Horizontal vs Vertical Scaling for Node.js

Vertical scaling: more resources, same instance

Vertical scaling means giving your existing instances more compute resources — more vCPUs, more RAM, faster storage. It's the path of least resistance: no code changes, no architecture changes, no load balancer setup. You just upgrade the instance size and restart.

For Node.js, there's a critical nuance: additional vCPUs only help if you use cluster mode. A single Node.js process uses exactly one CPU core. Moving from a 2 vCPU to an 8 vCPU machine gives you zero benefit without cluster mode — you're paying for 6 idle cores.

// Without cluster mode: 1 process, 1 CPU core used regardless of machine size
// With cluster mode: one process per core, all CPUs utilized

import cluster from 'cluster';
import { availableParallelism } from 'os';

if (cluster.isPrimary) {
  for (let i = 0; i < availableParallelism(); i++) cluster.fork();
  cluster.on('exit', () => cluster.fork()); // auto-restart on crash
} else {
  startServer(); // Each worker runs the full app
}

Vertical scaling is the right first choice when:

Your database is the bottleneck (more RAM = larger buffer pool = fewer disk reads)
You haven't implemented cluster mode yet (easy 4–8x throughput improvement first)
You need more memory per process for large working sets
Simplicity matters more than cost efficiency at your current scale

Horizontal scaling: more instances, smaller each

Horizontal scaling adds more instances of your application behind a load balancer, distributing traffic across all of them. The ceiling is theoretically unlimited — add more instances and you add more capacity linearly.

The prerequisite: your application must be stateless. Any request can hit any instance. If instance A holds state that instance B doesn't know about, requests will fail randomly.

Horizontal scaling is the right choice when:

You need high availability — multiple instances means a single instance failure doesn't cause downtime
Your traffic patterns require elastic capacity — scale out for daytime traffic spikes, scale in at night
You've hit the vertical ceiling for your use case
Cost efficiency at scale — many small instances are often cheaper per req/s than few large ones

Making your Node.js app stateless

This is the necessary prerequisite for horizontal scaling. Audit your application for these common statefulness issues:

// ✗ In-memory sessions — only instance A has this session
app.use(session({ store: new MemoryStore() }));

// ✓ Redis-backed sessions — any instance can read any session
app.use(session({ store: new RedisStore({ client: redis }) }));

// ✗ In-process rate limiting — users can bypass by hitting different instances
const counts = new Map<string, number>();

// ✓ Redis-backed rate limiting — shared state across all instances
const limiter = new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(100, '1m') });

// ✗ File uploads saved to local disk — only available on that instance
app.post('/upload', upload.single('file'), (req, res) => {
  fs.writeFileSync('/tmp/uploads/' + req.file.originalname, req.file.buffer);
});

// ✓ Files in object storage — available to all instances
app.post('/upload', upload.single('file'), async (req, res) => {
  const url = await uploadToS3(req.file.buffer, req.file.originalname);
  res.json({ url });
});

// ✗ Scheduled job assuming single-process execution
cron.schedule('0 * * * *', () => sendHourlyDigests()); // Runs N times with N replicas

// ✓ Distributed lock prevents duplicate execution
cron.schedule('0 * * * *', async () => {
  const lock = await redis.set('cron:hourly-digest', '1', { NX: true, EX: 3600 });
  if (lock) await sendHourlyDigests(); // Only one instance wins the lock
});

The cost comparison

At lower scale, vertical scaling is almost always cheaper due to per-instance overhead (monitoring, logging agents, load balancer capacity units). At higher scale, horizontal scaling typically wins on cost per req/s because smaller instances have better price/performance ratios and unused capacity can scale in.

// Rough cost model (prices vary by provider):
// 1× 16 vCPU / 32GB instance:  $0.60/hr
// 4× 4 vCPU / 8GB instances:   $0.60/hr (same cost, better availability)
// 8× 2 vCPU / 4GB instances:   $0.48/hr (cheaper, easier to scale elastically)

The practical decision framework

Start with vertical scaling — If you haven't hit a ceiling, the simplest option is the right option.
Add cluster mode before upgrading instance size — Free throughput increase before paying for more hardware.
Move to horizontal before you need high availability — Running multiple instances means a single failure doesn't cause downtime. This is often the first reason to scale out, even before traffic requires it.
At 5+ instances, revisit instance size — Smaller instances with better auto-scaling policies often provide better cost/performance.