Start with your bill, not your code
Most cost optimization efforts start in the wrong place. Engineers reach for code changes — micro-optimizations, refactors, new caching layers — without understanding where their money actually goes. Before writing a single line, open your cloud provider's cost explorer and break down the bill by service category.
For a typical Node.js SaaS application, costs break down roughly like this:
- 40–60%: Compute (containers, VMs, serverless)
- 20–35%: Database (managed Postgres, MySQL, or equivalent)
- 10–20%: Data transfer (outbound bandwidth is expensive on every major cloud)
- 5–10%: Storage, CDN, monitoring, and miscellaneous services
Optimize in this order. Reducing compute by 30% saves more than eliminating storage costs entirely.
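The claim holds even at the edges of the ranges above. Here is a quick check using only the list's rough percentages (not real billing data). `savingsShare` is a hypothetical helper written for this illustration.

```typescript
// Fraction of the total bill saved by cutting one category by `reduction` (0..1).
// The shares below are the rough ranges from the list above, not measured data.
function savingsShare(categoryShare: number, reduction: number): number {
  return categoryShare * reduction;
}

// Worst case for compute: it is only 40% of the bill, and we cut it by 30%.
const computeSavingsLow = savingsShare(0.40, 0.30);
// Best case for storage & misc: they are 10% of the bill, and we eliminate them entirely.
const storageSavingsHigh = savingsShare(0.10, 1.0);

console.log(`30% compute cut saves at least ${(computeSavingsLow * 100).toFixed(0)}% of the bill`);
console.log(`Eliminating storage/misc saves at most ${(storageSavingsHigh * 100).toFixed(0)}% of the bill`);
```

Even the smallest compute saving (12% of the bill) beats the largest possible storage saving (10%).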
Right-size your containers: the biggest win
Container over-provisioning is the most common and most expensive mistake. Teams spin up 2 vCPU / 4GB containers "just in case" while the app uses 0.2 vCPU and 300MB in steady state. That is roughly 90% idle capacity you are still paying for.
The correct approach: measure first, size second.
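```bash
# Check actual CPU and memory utilization over a week of production traffic.
# In Docker (a point-in-time snapshot only; sample it repeatedly or use a metrics system):
docker stats --no-stream
# With Prometheus/Grafana (cAdvisor metrics); container="app" is a placeholder label.
# p95 CPU cores used (container_cpu_usage_seconds_total is a counter, so take a rate first):
#   quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{container="app"}[5m])[7d:5m])
# Peak memory (working set excludes reclaimable page cache, unlike container_memory_usage_bytes):
#   max_over_time(container_memory_working_set_bytes{container="app"}[7d])
# Target utilization thresholds:
# CPU: 40-60% at peak (leave headroom for traffic spikes)
# Memory: under 70% of container limit
```

If your p95 CPU utilization over the past week is 25%, you can safely halve your CPU allocation: p95 then lands at 50%, inside the 40–60% target. If memory usage peaks at 280MB on a 512MB container (about 55%), you have headroom. Down-size one tier, run for a week, and down-size again if metrics allow. On platforms that bill for allocated rather than consumed resources, halving the allocation roughly halves that service's compute bill.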
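The sizing rule above can be sketched as a small function. It takes a week of CPU usage samples, computes the p95, and picks the smallest tier that keeps p95 at or below a target utilization. The tier list, the 50% default target, and the `recommendCpu`/`percentile` names are assumptions for illustration. Use your platform's actual sizes.

```typescript
// Hypothetical CPU tiers (vCPU); replace with your platform's real sizes.
const CPU_TIERS_VCPU = [0.25, 0.5, 1, 2, 4, 8];

// Nearest-rank percentile: sort ascending, take the value at rank ceil(p * n).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Smallest tier whose p95 utilization would be at or below the target.
function recommendCpu(samplesVcpu: number[], targetUtilization = 0.5): number {
  const p95 = percentile(samplesVcpu, 0.95);
  const needed = p95 / targetUtilization;
  const tier = CPU_TIERS_VCPU.find((t) => t >= needed);
  return tier ?? CPU_TIERS_VCPU[CPU_TIERS_VCPU.length - 1];
}

// Made-up week of samples (vCPU used) for a 2 vCPU container:
// p95 is 0.5 vCPU, i.e. 25% utilization, with a few short spikes above it.
const samples = [
  ...new Array<number>(80).fill(0.3),
  ...new Array<number>(18).fill(0.5),
  ...new Array<number>(2).fill(0.9),
];

console.log(recommendCpu(samples)); // 1
```

This matches the worked example: 25% p95 on 2 vCPU suggests 1 vCPU. Re-measure after each step rather than jumping several tiers at once.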
Database connection pooling: PgBouncer
Managed Postgres is priced by compute tier, and each tier supports only so many connections (on many managed services the default `max_connections` scales with instance memory). Without a connection pooler, your connection math looks like this: 10 container replicas × pool size of 10 = 100 permanent Postgres connections at all times. With 20 replicas, that's 200 connections. Postgres runs a separate backend process for each one, and each consumes roughly 5MB of RAM on the database server.
PgBouncer multiplexes many application connections onto a small number of real database connections. In transaction pooling mode, an app connection holds a server connection only for the duration of a transaction, so 100 app connections can often be served by 10 real Postgres connections. This lets you run a smaller (cheaper) database tier while handling the same application load.
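Here is the same arithmetic as a sketch. The replica counts, pool sizes, and ~5MB-per-backend figure come from the paragraphs above and are rough. The function names are hypothetical. The pooled figure reflects PgBouncer's documented limit: at most `default_pool_size` (plus `reserve_pool_size` when it kicks in) server connections per user/database pair.

```typescript
// Rough per-backend memory from the text above; real usage varies by workload.
const APPROX_MB_PER_BACKEND = 5;

// Without a pooler, every connection in every replica's pool is a real Postgres backend.
function directConnections(replicas: number, appPoolSize: number): number {
  return replicas * appPoolSize;
}

// With PgBouncer, Postgres sees at most default_pool_size + reserve_pool_size
// server connections per user/database pair, however many clients connect.
function pgbouncerServerConnections(defaultPoolSize: number, reservePoolSize = 0): number {
  return defaultPoolSize + reservePoolSize;
}

for (const replicas of [10, 20]) {
  const direct = directConnections(replicas, 10);
  const pooled = pgbouncerServerConnections(10);
  console.log(
    `${replicas} replicas: ${direct} direct backends (~${direct * APPROX_MB_PER_BACKEND}MB) ` +
      `vs ${pooled} pooled backends (~${pooled * APPROX_MB_PER_BACKEND}MB)`,
  );
}
```

Note that the pooled count stays flat as replicas grow. Only PgBouncer's client-side limit (`max_client_conn`) has to scale with the fleet.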
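```ini
# pgbouncer.ini
# PgBouncer treats only lines starting with ; or # as comments, so comments get their own lines.
[databases]
# host = the Postgres server as reachable from the PgBouncer host
myapp = host=localhost port=5432 dbname=myapp

[pgbouncer]
# Without listen_addr, PgBouncer accepts only Unix-socket connections
listen_addr = *
listen_port = 6432
# Example auth settings; the userlist path is a placeholder
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
# Best for stateless apps (no session state such as SET or advisory locks)
pool_mode = transaction
max_client_conn = 1000
# Real connections to Postgres per user/database pair
default_pool_size = 20
reserve_pool_size = 5
server_lifetime = 3600
server_idle_timeout = 600
```

```bash
# Your app connects to PgBouncer:6432 instead of Postgres:5432.
# pgbouncer=true is Prisma's connection-string flag for PgBouncer transaction mode.
DATABASE_URL=postgres://user:pass@pgbouncer:6432/myapp?pgbouncer=true
```

Caching: the highest-ROI optimization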
Every database query you avoid is money saved twice: once in database compute and once in connection time. The math compounds fast. A query that runs 1,000 times per second behind a 5-minute cache is requested 300,000 times per cache window but misses only once, so the cache serves 299,999 of every 300,000 requests and eliminates more than 99.99% of those queries.
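The hit-ratio arithmetic above can be written out. This sketch assumes a single shared cache (such as Redis) and ignores cache stampedes, so each TTL window costs exactly one database query. `cacheStats` is a hypothetical helper.

```typescript
// Per-key cache arithmetic: one miss per TTL window repopulates a shared cache.
function cacheStats(requestsPerSecond: number, ttlSeconds: number) {
  const requestsPerWindow = requestsPerSecond * ttlSeconds;
  const dbQueriesPerWindow = 1; // the single miss that reloads the value
  const hits = requestsPerWindow - dbQueriesPerWindow;
  return { requestsPerWindow, hits, hitRatio: hits / requestsPerWindow };
}

const s = cacheStats(1_000, 5 * 60);
console.log(s.requestsPerWindow); // 300000
console.log(s.hits); // 299999
console.log((s.hitRatio * 100).toFixed(4) + "%"); // 99.9997%
```

With only per-replica in-process caches, each replica misses once per window, so N replicas cost N queries per window. That is still tiny, and it is one reason the two-level design pairs an in-process cache with a shared one.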
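```typescript
import { LRUCache } from 'lru-cache';
import { createClient } from 'redis';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect(); // top-level await requires an ES module

// L1: in-process cache for near-static data (feature flags, plans, config).
// Emptied whenever the process restarts, e.g. on every deploy.
const processCache = new LRUCache<string, any>({
  max: 500,
  ttl: 5 * 60 * 1000, // default TTL: 5 minutes (in milliseconds)
});

// L2: Redis cache for data shared across replicas
async function getOrCache<T>(
  key: string,
  loader: () => Promise<T>,
  ttlSeconds: number
): Promise<T> {
  const ttlMs = ttlSeconds * 1000;

  // Try L1 (in-process). Compare with undefined so falsy values (0, '', false) still count as hits.
  const l1 = processCache.get(key);
  if (l1 !== undefined) return l1 as T;

  // Try L2 (Redis); get() resolves to null on a miss.
  // Values round-trip through JSON, so Date fields come back as ISO strings.
  const l2 = await redis.get(key);
  if (l2 !== null) {
    const parsed = JSON.parse(l2) as T;
    processCache.set(key, parsed, { ttl: ttlMs }); // keep L1 no fresher-lived than the caller asked
    return parsed;
  }

  // Miss in both layers: load from the database, then populate both caches
  const value = await loader();
  await redis.setEx(key, ttlSeconds, JSON.stringify(value));
  processCache.set(key, value, { ttl: ttlMs });
  return value;
}

// Usage:
const plans = await getOrCache('billing:plans', () => prisma.plan.findMany(), 300);
```

Data transfer costs: compression and selective fields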
Outbound bandwidth is one of the most overlooked cost line items. Most cloud providers charge $0.08–$0.12 per GB outbound. An API that serves 1TB of data per month pays $80–$120 just in egress fees.
The primary lever: compression. JSON compresses at ratios of 5:1 to 10:1 with gzip. A 100KB API response becomes 10–20KB. Enabling compression at the application or load balancer level can reduce bandwidth costs by 60–80% with minimal CPU overhead.
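A quick way to check the ratio on your own payloads is Node's built-in `zlib` module. This sketch compresses a synthetic list response with gzip and brotli and prints the sizes. The rows are made up for illustration, and real ratios depend on your data.

```typescript
import { gzipSync, brotliCompressSync } from "node:zlib";

// Synthetic list response: 500 rows with repetitive keys and values,
// the shape typical of API list endpoints. The data is invented.
const rows = Array.from({ length: 500 }, (_, i) => ({
  id: `prj_${i}`,
  name: `Project ${i}`,
  status: i % 3 === 0 ? "archived" : "active",
  updatedAt: new Date(Date.UTC(2024, 0, 1 + (i % 28))).toISOString(),
  description: "Internal project used for cost-optimization examples",
}));

const raw = Buffer.from(JSON.stringify(rows));
const gz = gzipSync(raw);
const br = brotliCompressSync(raw);

console.log(`raw:    ${raw.length} bytes`);
console.log(`gzip:   ${gz.length} bytes (${(raw.length / gz.length).toFixed(1)}:1)`);
console.log(`brotli: ${br.length} bytes (${(raw.length / br.length).toFixed(1)}:1)`);
```

In an Express app, the `compression` middleware package applies this per response via `app.use(compression())`. Alternatively, a reverse proxy or CDN in front of the app can compress at the edge.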
The secondary lever: don't send unused fields. If your mobile client uses 8 of the 50 fields your API returns, you're paying to transfer and compress the 42 unused fields on every request.
```typescript
// Only return fields the client actually needs
app.get('/projects', async (req, res) => {
  const projects = await prisma.project.findMany({
    where: { userId: req.user.id },
    select: {
      id: true,
      name: true,
      status: true,
      updatedAt: true,
      // NOT: all 20+ columns, large JSON metadata fields, etc.
    },
  });
  res.json(projects);
});
```

Auto-scaling for bursty traffic
If your traffic follows predictable patterns — busy during business hours, quiet at night and on weekends — you're paying for peak capacity 24/7. Most container platforms support metric-based auto-scaling that scales down to minimum instances during off-peak periods.
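It helps to know how Kubernetes turns a utilization target into a replica count. The HorizontalPodAutoscaler's documented core rule is `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`. The HPA skips the change when the ratio is within a default 10% tolerance of 1.0, and it clamps the result to the min/max bounds. This sketch models only that rule. Stabilization windows and scale-down rate policies are deliberately left out, and `desiredReplicas` is a hypothetical name.

```typescript
// Core HPA rule (per the Kubernetes docs), without stabilization or rate-limit policies.
function desiredReplicas(
  current: number,
  currentUtilization: number, // average CPU across pods, as % of their CPU requests
  targetUtilization: number,
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1,
): number {
  const ratio = currentUtilization / targetUtilization;
  // Within tolerance: keep the current count to avoid flapping
  const proposed = Math.abs(ratio - 1) <= tolerance ? current : Math.ceil(current * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, proposed));
}

console.log(desiredReplicas(4, 90, 60, 2, 20)); // 6 (busy hours: 4 * 1.5)
console.log(desiredReplicas(10, 15, 60, 2, 20)); // 3 (night: ceil(10 * 0.25))
console.log(desiredReplicas(4, 63, 60, 2, 20)); // 4 (within the 10% tolerance)
console.log(desiredReplicas(3, 5, 60, 2, 20)); // 2 (floored at minReplicas)
```

In a real cluster, a scale-down stabilization window and a percent-based scale-down policy would make the 10 → 3 drop happen gradually rather than in one step.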
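```yaml
# Kubernetes HPA (example)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa          # placeholder name
spec:
  scaleTargetRef:          # the workload to scale (placeholder Deployment name)
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2           # Never scale below 2 (for availability)
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Hold average CPU near 60% of the pods' CPU requests:
          # scale up when usage runs above it, down when it runs below
          averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25                    # Remove at most 25% of pods...
          periodSeconds: 60            # ...per 60-second period
```

Database query cost: indexes and EXPLAIN ANALYZE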
Slow queries on managed databases don't just affect latency — they consume more database CPU, hold locks longer, and can cause the database to autoscale up to a more expensive tier. A missing index is a billing problem as much as a performance problem.
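When deciding what to fix first for cost, ranking by average time per call and ranking by total time across all calls can point at different queries. Total execution time is a rough proxy for the load a query puts on the database. This sketch uses a few `pg_stat_statements` column names (PostgreSQL 13+, times in milliseconds). The sample numbers and function names are invented for illustration.

```typescript
// Subset of pg_stat_statements columns (PostgreSQL 13+ names; times in ms).
interface StatementStats {
  query: string;
  calls: number;
  total_exec_time: number;
}

// Total time across all calls: approximates the load the query puts on the database.
function byTotalTime(rows: StatementStats[]): StatementStats[] {
  return [...rows].sort((a, b) => b.total_exec_time - a.total_exec_time);
}

// Mean time per call: surfaces individually slow (latency-heavy) queries.
function byMeanTime(rows: StatementStats[]): StatementStats[] {
  const mean = (r: StatementStats) => r.total_exec_time / r.calls;
  return [...rows].sort((a, b) => mean(b) - mean(a));
}

// Invented example: a rare 2 s report vs a hot 2 ms lookup.
const stats: StatementStats[] = [
  { query: "SELECT ... FROM reports WHERE ...", calls: 10, total_exec_time: 20_000 },
  { query: "SELECT ... FROM projects WHERE user_id = $1", calls: 2_000_000, total_exec_time: 4_000_000 },
];

console.log(byMeanTime(stats)[0].query); // "SELECT ... FROM reports WHERE ..."
console.log(byTotalTime(stats)[0].query); // "SELECT ... FROM projects WHERE user_id = $1"
```

The 2 ms lookup accounts for 4,000 seconds of execution time against the report's 20 seconds. An index that makes the hot query cheaper is likely the bigger billing win.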
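```sql
-- Find your slowest queries (requires the pg_stat_statements extension;
-- times are in milliseconds, and total_exec_time is named total_time before PostgreSQL 13).
-- For cost, also try ORDER BY total_exec_time DESC to find the queries that consume the most time overall.
SELECT query, calls, total_exec_time, total_exec_time / calls AS avg_ms, rows
FROM pg_stat_statements
ORDER BY avg_ms DESC
LIMIT 20;
-- Diagnose a slow query (ANALYZE actually executes the statement):
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM projects
WHERE user_id = 'usr_123' AND status = 'active'
ORDER BY created_at DESC;
-- A "Seq Scan" on a large table usually means a missing index.
-- This composite index covers both the WHERE filter and the ORDER BY.
-- CONCURRENTLY avoids blocking writes but cannot run inside a transaction block:
CREATE INDEX CONCURRENTLY idx_projects_user_status_created
ON projects (user_id, status, created_at DESC);
```

Summary: the cost reduction checklist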
- Profile actual container utilization before changing anything
- Right-size containers based on p95 CPU and memory over 7 days
- Add PgBouncer if running more than 5 container replicas
- Enable gzip/brotli compression on all JSON responses
- Implement Redis caching for data that doesn't change on every request
- Use `select` in ORM queries to avoid fetching unused columns
- Enable auto-scaling with a minimum of 2 replicas
- Run EXPLAIN ANALYZE on your top-10 slowest queries and add missing indexes