Vertical scaling: more resources, same instance
Vertical scaling means giving your existing instances more compute resources — more vCPUs, more RAM, faster storage. It's the path of least resistance: no code changes, no architecture changes, no load balancer setup. You just upgrade the instance size and restart.
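For Node.js, there's a critical nuance: additional vCPUs help very little unless you run one process per core, for example with cluster mode. A single Node.js process executes JavaScript on one thread, so it can keep roughly one CPU core busy (the libuv thread pool and garbage collection use a little more). Moving from a 2 vCPU to an 8 vCPU machine without cluster mode gives you almost no extra throughput: you're paying for six additional cores that sit mostly idle.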
// Without cluster mode: 1 process, 1 CPU core used regardless of machine size
// With cluster mode: one process per core, all CPUs utilized
import cluster from 'cluster';
import { availableParallelism } from 'os';
if (cluster.isPrimary) {
  for (let i = 0; i < availableParallelism(); i++) cluster.fork();
  cluster.on('exit', () => cluster.fork()); // auto-restart on crash
} else {
  startServer(); // Each worker runs the full app
}
Vertical scaling is the right first choice when:
- Your database is the bottleneck (more RAM = larger buffer pool = fewer disk reads)
- You haven't implemented cluster mode yet (on a 4 to 8 core machine this can approach a 4–8x throughput improvement for CPU-bound workloads, so do it first)
- You need more memory per process for large working sets
- Simplicity matters more than cost efficiency at your current scale
Horizontal scaling: more instances, smaller each
Horizontal scaling adds more instances of your application behind a load balancer, distributing traffic across all of them. The ceiling is far higher than any single machine's: each added instance adds capacity roughly linearly, until a shared dependency such as the database or the load balancer itself becomes the bottleneck.
The prerequisite: your application must be stateless. Any request can hit any instance. If instance A holds state that instance B doesn't know about, requests will fail randomly.
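To make that failure mode concrete, here is a minimal simulation rather than production code. The `Instance` class, the `roundRobin` helper, and the plain `Map` standing in for an external store such as Redis are all hypothetical names introduced for illustration.

```typescript
// Hypothetical model: each "instance" keeps sessions in a Map.
type SessionStore = Map<string, string>;

class Instance {
  private readonly sessions: SessionStore;

  // With no argument, the instance gets its own private in-memory store.
  constructor(sessions: SessionStore = new Map()) {
    this.sessions = sessions;
  }

  login(sessionId: string, user: string): void {
    this.sessions.set(sessionId, user);
  }

  whoAmI(sessionId: string): string | undefined {
    return this.sessions.get(sessionId);
  }
}

// Round-robin balancer: request i is routed to instance i mod N.
function roundRobin(instances: Instance[]): () => Instance {
  let next = 0;
  return () => instances[next++ % instances.length];
}

// Stateful setup: two instances, each with its own map.
const statefulPick = roundRobin([new Instance(), new Instance()]);
statefulPick().login('s1', 'alice');                 // handled by instance 0
const statefulResult = statefulPick().whoAmI('s1');  // handled by instance 1

// Stateless setup: both instances delegate to one shared store.
const sharedStore: SessionStore = new Map();
const statelessPick = roundRobin([new Instance(sharedStore), new Instance(sharedStore)]);
statelessPick().login('s1', 'alice');
const statelessResult = statelessPick().whoAmI('s1');

console.log(statefulResult);  // undefined: instance 1 never saw the login
console.log(statelessResult); // alice
```

The second request fails only because the balancer happened to pick a different instance, which is why these bugs look random in production.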
Horizontal scaling is the right choice when:
- You need high availability — multiple instances means a single instance failure doesn't cause downtime
- Your traffic patterns require elastic capacity — scale out for daytime traffic spikes, scale in at night
- You've hit the vertical ceiling for your use case
- You need cost efficiency at scale, since many small instances are often cheaper per req/s than a few large ones
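Elastic capacity usually comes down to a proportional rule: scale the instance count by the ratio of observed load to target load, then clamp it. The sketch below follows the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler. The `ScalingPolicy` fields and every number here are assumptions chosen for illustration, not recommendations.

```typescript
interface ScalingPolicy {
  targetCpuPercent: number; // utilization the fleet should hover around
  minInstances: number;     // floor, kept for availability
  maxInstances: number;     // ceiling, kept for cost control
}

// desired = ceil(current * observed / target), clamped to [min, max]
function desiredInstances(
  current: number,
  observedCpuPercent: number,
  scaling: ScalingPolicy,
): number {
  const raw = Math.ceil(current * (observedCpuPercent / scaling.targetCpuPercent));
  return Math.min(scaling.maxInstances, Math.max(scaling.minInstances, raw));
}

const autoscalePolicy: ScalingPolicy = {
  targetCpuPercent: 60,
  minInstances: 2,
  maxInstances: 10,
};

console.log(desiredInstances(4, 90, autoscalePolicy)); // 6: daytime spike, scale out
console.log(desiredInstances(4, 15, autoscalePolicy)); // 2: quiet night, scale in to the floor
console.log(desiredInstances(8, 95, autoscalePolicy)); // 10: capped at the ceiling
```

Keeping `minInstances` at 2 or more preserves the availability benefit even when traffic alone would justify a single instance.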
Making your Node.js app stateless
This is the necessary prerequisite for horizontal scaling. Audit your application for these common statefulness issues:
// ✗ In-memory sessions — only instance A has this session
app.use(session({ store: new MemoryStore() }));
// ✓ Redis-backed sessions — any instance can read any session
app.use(session({ store: new RedisStore({ client: redis }) }));
// ✗ In-process rate limiting — users can bypass by hitting different instances
const counts = new Map<string, number>();
// ✓ Redis-backed rate limiting — shared state across all instances
const limiter = new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(100, '1m') });
// ✗ File uploads saved to local disk — only available on that instance
app.post('/upload', upload.single('file'), (req, res) => {
  fs.writeFileSync('/tmp/uploads/' + req.file.originalname, req.file.buffer);
});
// ✓ Files in object storage — available to all instances
app.post('/upload', upload.single('file'), async (req, res) => {
  const url = await uploadToS3(req.file.buffer, req.file.originalname);
  res.json({ url });
});
// ✗ Scheduled job assuming single-process execution
cron.schedule('0 * * * *', () => sendHourlyDigests()); // Runs N times with N replicas
// ✓ Distributed lock prevents duplicate execution
cron.schedule('0 * * * *', async () => {
  // TTL is shorter than the 1-hour interval, so the lock has expired before the next run starts
  const lock = await redis.set('cron:hourly-digest', '1', { NX: true, EX: 3300 });
  if (lock) await sendHourlyDigests(); // Only one instance wins the lock
});
The cost comparison
At lower scale, vertical scaling is almost always cheaper due to per-instance overhead (monitoring, logging agents, load balancer capacity units). At higher scale, horizontal scaling typically wins on cost per req/s because smaller instances have better price/performance ratios and unused capacity can scale in.
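A small worked calculation shows both effects. This is a sketch under stated assumptions: the instance prices mirror the rough figures in the cost model that follows, while the 2 cents/hour per-instance overhead and the 12 hours busy, 12 hours quiet traffic split are hypothetical values picked for illustration. Amounts are in integer cents to keep the arithmetic exact.

```typescript
interface Fleet {
  label: string;
  instances: number;
  centsPerInstanceHour: number; // illustrative price
}

// Hypothetical fixed overhead per running instance (agents, monitoring, LB registration).
const OVERHEAD_CENTS_PER_INSTANCE_HOUR = 2;

// Hourly cost for `running` instances of the fleet (defaults to the full fleet).
function hourlyCostCents(fleet: Fleet, running: number = fleet.instances): number {
  return running * (fleet.centsPerInstanceHour + OVERHEAD_CENTS_PER_INSTANCE_HOUR);
}

const largeFleet: Fleet = { label: '1x 16 vCPU', instances: 1, centsPerInstanceHour: 60 };
const smallFleet: Fleet = { label: '8x 2 vCPU', instances: 8, centsPerInstanceHour: 6 };

// Always on: overhead erases the small fleet's list-price advantage.
console.log(hourlyCostCents(largeFleet)); // 62
console.log(hourlyCostCents(smallFleet)); // 64

// Elastic: the small fleet runs 8 instances for 12 h and scales in to 2 for 12 h.
const largeDailyCents = 24 * hourlyCostCents(largeFleet);
const smallDailyCents =
  12 * hourlyCostCents(smallFleet, 8) + 12 * hourlyCostCents(smallFleet, 2);

console.log(largeDailyCents); // 1488
console.log(smallDailyCents); // 960
```

Under these assumptions the small fleet costs slightly more when it runs flat out all day, and about a third less once it can scale in during quiet hours.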
// Rough cost model (prices vary by provider):
// 1× 16 vCPU / 32GB instance: $0.60/hr
// 4× 4 vCPU / 8GB instances: $0.60/hr (same cost, better availability)
// 8× 2 vCPU / 4GB instances: $0.48/hr (cheaper, easier to scale elastically)
The practical decision framework
- Start with vertical scaling — If you haven't hit a ceiling, the simplest option is the right option.
- Add cluster mode before upgrading instance size — Free throughput increase before paying for more hardware.
- Move to horizontal scaling once you need high availability — Running multiple instances means a single failure doesn't cause downtime. This is often the first reason to scale out, even before traffic requires it.
- At 5+ instances, revisit instance size — Smaller instances with better auto-scaling policies often provide better cost/performance.
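The framework above can be read as a short chain of checks. The following is one possible encoding, not a definitive rule set: the `AppState` fields, the `nextStep` function, and the order of the checks are assumptions made for illustration.

```typescript
interface AppState {
  usesClusterMode: boolean;
  hitVerticalCeiling: boolean;    // no larger instance size is practical
  needsHighAvailability: boolean;
  instanceCount: number;
}

type NextStep =
  | 'enable cluster mode'
  | 'revisit instance size'
  | 'scale out horizontally'
  | 'scale up vertically';

function nextStep(app: AppState): NextStep {
  // Free throughput first, before paying for more hardware.
  if (!app.usesClusterMode) return 'enable cluster mode';
  // Already a sizeable fleet: check whether smaller instances would fit better.
  if (app.instanceCount >= 5) return 'revisit instance size';
  // A single instance cannot provide high availability.
  if (app.needsHighAvailability && app.instanceCount < 2) return 'scale out horizontally';
  if (app.hitVerticalCeiling) return 'scale out horizontally';
  // Default: the simplest option.
  return 'scale up vertically';
}

const singleBox: AppState = {
  usesClusterMode: true,
  hitVerticalCeiling: false,
  needsHighAvailability: false,
  instanceCount: 1,
};

console.log(nextStep({ ...singleBox, usesClusterMode: false }));     // enable cluster mode
console.log(nextStep(singleBox));                                    // scale up vertically
console.log(nextStep({ ...singleBox, needsHighAvailability: true })); // scale out horizontally
console.log(nextStep({ ...singleBox, instanceCount: 6 }));           // revisit instance size
```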