Software engineering, design, and psychology

Auto-Scaling Groups and Load Balancers | Microservice Architecture — Ep. 25

The purpose of microservice architecture is durability and scalability. Durability means a service continues to serve requests during failures or peak traffic. Scalability means achieving durability with the fewest resources possible, which keeps the service cost-efficient.

Modern microservices typically run on small virtual machines with limited vCPU and RAM. These instances are cheap but still powerful enough to handle thousands of requests per minute.

However, running a single instance of a service is never enough, as hardware failures are rare but inevitable — even in the best data centers. To maintain durability, at least two instances of a service must run in different data centers (“availability zones” in AWS terminology).

Another issue is traffic spikes. When load increases, a single instance can no longer handle all requests, so multiple instances become a necessity.

In practice, service load changes dynamically, e.g. low at night and ten times higher during business hours. For cost efficiency, services are usually placed in auto-scaling groups. A dedicated agent monitors metrics such as CPU usage or request rate and automatically adds instances when load rises above a predefined threshold and removes them when it falls below one. Scaling by changing the number of small instances is called horizontal scaling.
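As a rough illustration of the threshold logic described above, here is a minimal sketch in Python. The names ScalingPolicy and desired_instance_count, the CPU thresholds, and the instance limits are all assumptions made for this example; real auto-scaling services expose their own configuration and usually add cooldown periods on top.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are illustrative assumptions, not recommendations.
    scale_out_cpu: float = 70.0  # add an instance above this average CPU %
    scale_in_cpu: float = 30.0   # remove an instance below this average CPU %
    min_instances: int = 2       # keep at least two for redundancy
    max_instances: int = 10      # upper bound to cap cost


def desired_instance_count(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the instance count the group should move to."""
    if avg_cpu > policy.scale_out_cpu:
        target = current + 1
    elif avg_cpu < policy.scale_in_cpu:
        target = current - 1
    else:
        target = current
    # Clamp the result to the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, target))
```

The gap between the two thresholds keeps the group from flapping: a load that hovers around a single threshold would otherwise add and remove instances on every check.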

Horizontal scaling introduces another challenge: how do clients know which instance to talk to?

An API gateway routes requests to services, but it should not track which instances currently exist or which ones are healthy. That responsibility is delegated to load balancers — lightweight proxy services that:

  • track active service instances and their IP addresses
  • monitor instance health and availability
  • distribute load evenly across instances
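The three responsibilities above can be sketched as a small in-memory class. This is a simplified illustration, not how any particular load balancer is implemented: the class name, the method names, and the choice of round-robin distribution are assumptions made for the example.

```python
class LoadBalancer:
    """Toy load balancer: tracks instances, their health, and rotates traffic."""

    def __init__(self):
        self._instances = {}  # IP address -> True if healthy
        self._cursor = 0      # position used for round-robin rotation

    def register(self, ip: str) -> None:
        # Called when the auto-scaling group launches an instance.
        self._instances[ip] = True

    def deregister(self, ip: str) -> None:
        # Called when the auto-scaling group terminates an instance.
        self._instances.pop(ip, None)

    def report_health(self, ip: str, healthy: bool) -> None:
        # Result of a periodic health check; unknown IPs are ignored.
        if ip in self._instances:
            self._instances[ip] = healthy

    def next_instance(self) -> str:
        # Round-robin over healthy instances only.
        healthy = [ip for ip, ok in self._instances.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy instances available")
        ip = healthy[self._cursor % len(healthy)]
        self._cursor += 1
        return ip
```

An instance that fails a health check stays registered but stops receiving traffic until a later check marks it healthy again.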

Because load balancers expose a stable endpoint (often a static IP or DNS name), API gateways can simply route requests to them, offloading instance discovery and traffic distribution to the combined work of load balancers and auto-scaling groups.
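From the gateway's side, this can reduce to a static route table that maps request paths to load balancer endpoints. The sketch below assumes path-prefix routing; the host names and the function resolve_upstream are hypothetical and exist only for illustration.

```python
# Hypothetical route table: path prefix -> stable load balancer endpoint.
ROUTES = {
    "/orders": "orders-lb.internal.example",
    "/users": "users-lb.internal.example",
}


def resolve_upstream(path: str) -> str:
    """Return the load balancer endpoint that should receive this request."""
    # Check longer prefixes first so a more specific route wins.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    raise LookupError(f"no route for {path}")
```

The table holds no instance addresses, so it does not need to change when instances are added, removed, or replaced.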
