Production · May 28, 2026 · 6 min read

Reducing Cloud Costs for Node.js Apps

Right-sizing containers, connection pooling, caching strategies, and the three billing line items that surprise most teams. How to cut your cloud bill by 40% with configuration changes and only small, targeted code edits.

Start with your bill, not your code

Most cost optimization efforts start in the wrong place. Engineers reach for code changes — micro-optimizations, refactors, new caching layers — without understanding where their money actually goes. Before writing a single line, open your cloud provider's cost explorer and break down the bill by service category.

For a typical Node.js SaaS application, costs break down roughly like this:

  • 40–60%: Compute (containers, VMs, serverless)
  • 20–35%: Database (managed Postgres, MySQL, or equivalent)
  • 10–20%: Data transfer (outbound bandwidth is expensive on every major cloud)
  • 5–10%: Storage, CDN, monitoring, and miscellaneous services

Optimize in this order. Reducing compute by 30% saves more than eliminating storage costs entirely.
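To make the ordering concrete, here is a minimal sketch of the arithmetic. The $10,000 monthly bill and its split are hypothetical, chosen to sit at the midpoints of the ranges above; substitute the figures from your own cost explorer.

```typescript
// Hypothetical $10,000/month bill, split at the midpoints of the ranges above.
type BillShare = { category: string; monthlyUsd: number };

const bill: BillShare[] = [
  { category: 'compute', monthlyUsd: 5000 },
  { category: 'database', monthlyUsd: 2750 },
  { category: 'data transfer', monthlyUsd: 1500 },
  { category: 'storage and other', monthlyUsd: 750 },
];

// Dollars saved per month by cutting one category by a fraction between 0 and 1.
function savingsFor(items: BillShare[], category: string, fraction: number): number {
  const item = items.find((i) => i.category === category);
  if (!item) throw new Error(`unknown category: ${category}`);
  return item.monthlyUsd * fraction;
}

const computeCut = savingsFor(bill, 'compute', 0.3);          // 30% off compute
const storageGone = savingsFor(bill, 'storage and other', 1); // storage eliminated
console.log({ computeCut, storageGone });
```

With these assumed numbers, a 30% compute reduction is worth about twice as much as removing the smallest category altogether.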

Right-size your containers: the biggest win

Container over-provisioning is the most common and most expensive mistake. Teams spin up 2 vCPU / 4GB containers "just in case" and the app uses 0.2 vCPU and 300MB in steady state. You're paying for 90% idle capacity.

The correct approach: measure first, size second.

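# Spot-check current CPU and memory usage (a point-in-time snapshot, not a trend)
# In Docker:
docker stats --no-stream

# For a full week of production traffic, query Prometheus/Grafana:
# rate(container_cpu_usage_seconds_total[5m])   (the raw metric is a cumulative counter)
# container_memory_usage_bytes                  (includes page cache; container_memory_working_set_bytes is often a closer guide)

# Target utilization thresholds:
# CPU: 40-60% at peak (leave headroom for traffic spikes)
# Memory: under 70% of container limit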

If your p95 CPU utilization over the past week is 25%, you can safely halve your CPU allocation: p95 then lands at about 50%, inside the target range. If memory usage peaks at 280MB on a 512MB container, that is about 55% of the limit, so you have headroom. Down-size one tier, run for a week, and down-size again if the metrics allow. The savings are direct: halving a service's compute allocation roughly halves its compute bill.
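The sizing rule can be sketched as a small calculation. This is an illustration rather than a tool: the function names and sample values are made up, the percentile uses the simple nearest-rank method, and in practice you would read p95 from your metrics system instead of an array.

```typescript
// Nearest-rank percentile of a list of samples (p between 0 and 100).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Allocation at which the observed p95 usage sits at the target utilization.
function recommendAllocation(usageSamples: number[], targetUtilization: number): number {
  if (targetUtilization <= 0 || targetUtilization > 1) {
    throw new Error('targetUtilization must be in (0, 1]');
  }
  return percentile(usageSamples, 95) / targetUtilization;
}

// Hypothetical samples: CPU in vCPUs, memory in MB.
const cpuSamples = [0.18, 0.2, 0.22, 0.19, 0.25, 0.3, 0.21, 0.5, 0.2, 0.23];
const memorySamplesMb = [240, 250, 255, 260, 262, 265, 270, 272, 275, 280];

const cpuAllocation = recommendAllocation(cpuSamples, 0.5);          // target 50% CPU at p95
const memoryAllocationMb = recommendAllocation(memorySamplesMb, 0.7); // target 70% of the limit
console.log({ cpuAllocation, memoryAllocationMb });
```

Round the result up to the nearest size your platform offers, and treat it as a starting point for the week-long observation step rather than a final answer.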

Database connection pooling: PgBouncer

Managed Postgres instances are priced by compute tier, and each tier caps how many connections it accepts. Without a connection pooler, your connection math looks like this: 10 container replicas × pool size of 10 = 100 permanent Postgres connections at all times. With 20 replicas, that's 200 connections, each backed by its own server process and consuming roughly 5MB of RAM on the database server whether or not it is doing any work.

PgBouncer multiplexes many application connections to a small number of real database connections. 100 app connections can be served by 10 real Postgres connections. This lets you run a smaller (cheaper) database tier while handling the same application load.
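The connection math is easy to check for your own replica counts. The sketch below takes the roughly 5MB-per-connection figure quoted above as an assumption; real per-connection memory depends on the workload and Postgres settings.

```typescript
// Assumed per-connection memory on the database server, from the estimate above.
const MB_PER_CONNECTION = 5;

// Connections held open when every replica keeps its own pool to Postgres.
function directConnections(replicas: number, poolSizePerReplica: number): number {
  return replicas * poolSizePerReplica;
}

// Approximate database-side memory consumed by a number of connections.
function connectionMemoryMb(connections: number): number {
  return connections * MB_PER_CONNECTION;
}

const unpooled = directConnections(20, 10); // 20 replicas, pool size 10
const pooled = 10;                          // real connections behind a pooler

console.log({
  unpooled,
  unpooledMemoryMb: connectionMemoryMb(unpooled),
  pooled,
  pooledMemoryMb: connectionMemoryMb(pooled),
});
```

Under these assumptions the pooler takes connection memory from about 1GB down to about 50MB, which is often the difference between two database tiers.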

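# pgbouncer.ini
# Comments must sit on their own lines: PgBouncer only treats "#" or ";" as a comment marker at the start of a line.
# listen_addr, auth_type, and auth_file are omitted here and must be set for a real deployment.
[databases]
myapp = host=localhost port=5432 dbname=myapp

[pgbouncer]
listen_port = 6432
# Transaction pooling is best for stateless apps; session state (SET, advisory locks, LISTEN) does not carry across transactions
pool_mode = transaction
max_client_conn = 1000
# Real connections to Postgres, per database/user pair
default_pool_size = 20
reserve_pool_size = 5
server_lifetime = 3600
server_idle_timeout = 600

# Application environment (not part of pgbouncer.ini):
# your app connects to PgBouncer:6432 instead of Postgres:5432.
# ?pgbouncer=true is a Prisma-specific flag; other clients can omit it.
DATABASE_URL=postgres://user:pass@pgbouncer:6432/myapp?pgbouncer=true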

Caching: the highest-ROI optimization

Every database query you avoid is money saved twice: once in database compute, once in database connection time. The math compounds fast: a query that runs 1,000 times per second behind a 5-minute cache hits the database once per 300,000 requests and serves the other 299,999 from cache, eliminating more than 99.99% of those queries.
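The hit-rate arithmetic generalizes to any request rate and TTL. This sketch assumes a steady request rate, a single shared cache entry, and exactly one database load per TTL window; `cacheHitRatio` is a name made up for the illustration.

```typescript
// Fraction of requests for one key that are served from cache.
// Assumes a steady request rate and exactly one loader call per TTL window.
function cacheHitRatio(requestsPerSecond: number, ttlSeconds: number): number {
  const requestsPerWindow = requestsPerSecond * ttlSeconds;
  if (requestsPerWindow < 1) return 0; // fewer than one request per window: every request misses
  return (requestsPerWindow - 1) / requestsPerWindow;
}

const hotKey = cacheHitRatio(1000, 300); // 1,000 req/s, 5-minute TTL
const coldKey = cacheHitRatio(0.01, 60); // one request every 100 s, 1-minute TTL
console.log({ hotKey, coldKey });
```

The cold-key case is the useful one to keep in mind: caching rarely requested data saves almost nothing, so spend the effort on the queries that run most often. With per-replica in-process caches, each replica also takes its own miss per window, so the real ratio is slightly lower than this estimate.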

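import { LRUCache } from 'lru-cache';
import { createClient } from 'redis';

// Shared Redis client (node-redis v4+); REDIS_URL is an assumed environment variable
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// In-process cache for truly static data (feature flags, plans, config)
// Emptied whenever the process restarts, so every deploy starts cold
const processCache = new LRUCache<string, any>({
  max: 500,
  ttl: 5 * 60 * 1000, // default TTL: 5 minutes
});

// Two-level lookup: in-process first, then Redis (shared across replicas), then the loader
async function getOrCache<T>(
  key: string,
  loader: () => Promise<T>,
  ttlSeconds: number
): Promise<T> {
  // Try L1 (in-process); compare against undefined so cached falsy values (0, '', false) still count as hits
  const l1 = processCache.get(key);
  if (l1 !== undefined) return l1;

  // Try L2 (Redis); get() resolves to null on a miss
  const l2 = await redis.get(key);
  if (l2 !== null) {
    const parsed = JSON.parse(l2) as T;
    processCache.set(key, parsed, { ttl: ttlSeconds * 1000 });
    return parsed;
  }

  // Load from database, then populate both cache levels with the caller's TTL
  const value = await loader();
  await redis.setEx(key, ttlSeconds, JSON.stringify(value));
  processCache.set(key, value, { ttl: ttlSeconds * 1000 });
  return value;
}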

// Usage:
const plans = await getOrCache('billing:plans', () => prisma.plan.findMany(), 300);

Data transfer costs: compression and selective fields

Outbound bandwidth is one of the most overlooked cost line items. Most cloud providers charge $0.08–$0.12 per GB outbound. An API that serves 1TB of data per month pays $80–$120 just in egress fees.

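The primary lever: compression. JSON typically compresses at ratios of 5:1 to 10:1 with gzip, so a 100KB API response becomes 10–20KB. Those ratios mean 80–90% fewer bytes for the JSON responses themselves; across a whole bill the saving is usually closer to 60–80%, because already-compressed payloads such as images do not shrink further. Enabling compression at the application or load balancer level costs little CPU.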
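You can measure the ratio for your own payloads with Node's built-in `zlib` module before changing any infrastructure. The records below are synthetic and more repetitive than most real responses, so treat the printed ratio as an upper-end illustration; `monthlyEgressUsd` is a helper invented for this sketch.

```typescript
import { gzipSync } from 'node:zlib';

// Synthetic payload: 500 project-like records. Real responses will compress differently.
const records = Array.from({ length: 500 }, (_, i) => ({
  id: `prj_${i}`,
  name: `Project ${i}`,
  status: i % 3 === 0 ? 'archived' : 'active',
  updatedAt: new Date(1_700_000_000_000 + i * 60_000).toISOString(),
}));

const raw = Buffer.from(JSON.stringify(records), 'utf8');
const gzipped = gzipSync(raw);
const ratio = raw.length / gzipped.length;

// Egress cost for a monthly volume at an assumed price per GB.
function monthlyEgressUsd(gbPerMonth: number, usdPerGb: number): number {
  return gbPerMonth * usdPerGb;
}

const before = monthlyEgressUsd(1000, 0.08);        // 1TB at $0.08/GB
const after = monthlyEgressUsd(1000 / ratio, 0.08); // same traffic, gzipped

console.log(`raw=${raw.length}B gzip=${gzipped.length}B ratio=${ratio.toFixed(1)}:1`);
console.log(`egress: $${before.toFixed(2)} -> $${after.toFixed(2)} per month`);
```

Run it against a captured response from your busiest endpoint to get a number you can put next to the egress line on your bill.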

The secondary lever: don't send unused fields. If your mobile client uses 8 of the 50 fields your API returns, you're paying to transfer and compress the 42 unused fields on every request.

// Only return fields the client actually needs
app.get('/projects', async (req, res) => {
  const projects = await prisma.project.findMany({
    where: { userId: req.user.id },
    select: {
      id: true,
      name: true,
      status: true,
      updatedAt: true,
      // NOT: all 20+ columns, large JSON metadata fields, etc.
    },
  });
  res.json(projects);
});

Auto-scaling for bursty traffic

If your traffic follows predictable patterns — busy during business hours, quiet at night and on weekends — you're paying for peak capacity 24/7. Most container platforms support metric-based auto-scaling that scales down to minimum instances during off-peak periods.

# Kubernetes HPA (example; "myapp" is a placeholder name)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2      # Never scale below 2 (for availability)
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # Percent of the pods' CPU requests; scale up above ~60%, down below
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 25                # Scale down by at most 25% at a time
        periodSeconds: 60        # ...per 60-second window
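To see what a utilization target does in practice, the core scaling rule from the Kubernetes documentation is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The sketch below implements only that rule plus the min/max clamp; it deliberately ignores the stabilization window and scale-down policies, which further limit how fast replicas are removed.

```typescript
// Core HPA sizing rule, clamped to the configured replica bounds.
// Simplified: no stabilization window, no scale-up/scale-down rate policies.
function desiredReplicas(
  currentReplicas: number,
  currentUtilization: number, // average CPU utilization across pods, in percent
  targetUtilization: number,  // averageUtilization from the HPA spec, in percent
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1
): number {
  const ratio = currentUtilization / targetUtilization;
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas; // close enough: no change
  const desired = Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

console.log(desiredReplicas(10, 90, 60, 2, 20)); // busy: scale up
console.log(desiredReplicas(10, 20, 60, 2, 20)); // quiet: scale down
console.log(desiredReplicas(4, 10, 60, 2, 20));  // very quiet: held at minReplicas
console.log(desiredReplicas(10, 62, 60, 2, 20)); // within tolerance: unchanged
```

Because utilization is measured against CPU requests, right-sizing those requests first makes the autoscaler's decisions far more meaningful.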

Database query cost: indexes and EXPLAIN ANALYZE

Slow queries on managed databases don't just affect latency — they consume more database CPU, hold locks longer, and can cause the database to autoscale up to a more expensive tier. A missing index is a billing problem as much as a performance problem.

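-- Find your slowest queries (requires the pg_stat_statements extension;
-- on PostgreSQL 12 and earlier the column is total_time, not total_exec_time):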
SELECT query, calls, total_exec_time / calls AS avg_ms, rows
FROM pg_stat_statements
ORDER BY avg_ms DESC
LIMIT 20;

-- Diagnose a slow query:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM projects
WHERE user_id = 'usr_123' AND status = 'active'
ORDER BY created_at DESC;

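-- If you see "Seq Scan" on a large table for this filter, you likely need an index.
-- CONCURRENTLY avoids blocking writes but cannot run inside a transaction block: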
CREATE INDEX CONCURRENTLY idx_projects_user_status_created
ON projects (user_id, status, created_at DESC);

Summary: the cost reduction checklist

  • Profile actual container utilization before changing anything
  • Right-size containers based on p95 CPU and memory over 7 days
  • Add PgBouncer if running more than 5 container replicas
  • Enable gzip/brotli compression on all JSON responses
  • Implement Redis caching for data that doesn't change on every request
  • Use select in ORM queries to avoid fetching unused columns
  • Enable auto-scaling with a minimum of 2 replicas
  • Run EXPLAIN ANALYZE on your top-10 slowest queries and add missing indexes
