Auto Scaling and Elastic Load Balancing for Resilience

Lesson 47/50 | Study Time: 40 Min

Course: AI DevOps on AWS: Automation, CI/CD and Cloud Engineering

Traffic is never perfectly predictable. A marketing campaign drives a sudden spike. A viral post sends unexpected load. Night-time traffic drops to a fraction of daytime levels.

Building an application that handles all of these scenarios, without over-provisioning expensive resources during quiet periods or under-provisioning during peaks.

It requires two capabilities working together: Auto Scaling to adjust capacity automatically, and Elastic Load Balancing to distribute traffic across that capacity. Together they are the foundation of resilient, cost-efficient applications on AWS.

Elastic Load Balancing

A load balancer sits in front of your application and distributes incoming traffic across multiple instances, containers, or functions. No single resource gets overwhelmed, and if one becomes unhealthy, the load balancer stops sending traffic to it automatically.

The Three Load Balancer Types

1. Application Load Balancer — ALB

Operates at Layer 7 — the HTTP/HTTPS level. It understands the content of requests and can route based on URL path, hostname, HTTP headers, and query strings.

This makes it the right choice for web applications, REST APIs, and microservices where different paths route to different backend services.

1. Routes /api/* to one target group and /static/* to another.

2. Supports WebSockets and HTTP/2.

3. Integrates with AWS WAF for application-layer security.

4. The most commonly used load balancer for modern applications.

2. Network Load Balancer — NLB

Operates at Layer 4 — the TCP/UDP level. It handles millions of requests per second with ultra-low latency.

Use it when performance is the primary requirement and you do not need content-based routing. Common for real-time gaming, financial trading systems, and IoT applications.

3. Gateway Load Balancer — GWLB

Designed for deploying, scaling, and managing third-party network appliances — firewalls, intrusion detection systems. Primarily used by network security teams rather than application teams.

Target Groups

A target group is a collection of resources that a load balancer routes traffic to — EC2 instances, ECS tasks, Lambda functions, or IP addresses. The load balancer performs health checks on each target and only routes traffic to healthy ones.

Health check configuration matters. Set the health check path, interval, and threshold deliberately. A health check that is too aggressive removes healthy targets unnecessarily. A health check that is too lenient sends traffic to unhealthy targets for too long.

Auto Scaling

Auto Scaling automatically adjusts the number of running instances or tasks based on demand. When traffic increases, more capacity is added. When traffic decreases, excess capacity is removed. You pay for what you need, when you need it.

EC2 Auto Scaling

EC2 Auto Scaling manages a group of EC2 instances, called an Auto Scaling Group (ASG). You define three capacity values:

Scaling Policies

Auto Scaling uses policies to decide when and how to scale:

1. Target Tracking Scaling — Recommended: You define a target value for a metric — keep CPU utilisation at 60%, keep the number of requests per target at 1,000. Auto Scaling continuously adjusts capacity to maintain that target. This is the simplest and most effective policy for most workloads.

2. Step Scaling: Define thresholds that trigger different scaling actions. If CPU exceeds 70% add 2 instances. If CPU exceeds 90% add 5 instances. More granular than target tracking but requires more configuration.

3. Scheduled Scaling: Scale on a defined schedule, add capacity at 8am Monday to Friday when office workers come online, reduce it at 8pm. Use this when you know your traffic patterns in advance.

4. Predictive Scaling: Uses machine learning to forecast future traffic based on historical patterns and pre-provisions capacity before demand arrives. Eliminates the lag between a traffic spike arriving and reactive scaling completing.

ECS Auto Scaling

ECS services scale tasks, not instances using Application Auto Scaling. The same policy types apply — target tracking, step scaling, and scheduled scaling — but they control the number of running tasks rather than EC2 instances.

For Fargate, task scaling is all you need — AWS manages the underlying infrastructure. For EC2 launch type, you need both task scaling and cluster scaling via an ASG.

How They Work Together

ALB and Auto Scaling integrate natively. When Auto Scaling adds a new instance or ECS task, it registers it with the load balancer automatically. When it removes one, it deregisters it after completing in-flight requests before termination.

Traffic increases

│

▼

ALB distributes load across existing targets

│

▼

Auto Scaling detects metric threshold breached

│

▼

New instances or tasks launched

│

▼

ALB registers new targets — health checks pass

│

▼

Traffic distributed across expanded capacity

│

▼

Traffic decreases — Auto Scaling removes excess capacity

│

▼

ALB deregisters removed targets gracefully

The entire cycle is automatic. No human involvement needed.

Resilience Patterns

1. Multi-AZ deployment: Always deploy your Auto Scaling Group or ECS service across at least two Availability Zones. If one AZ has an outage, the load balancer routes traffic to instances in the remaining AZs automatically. Your application stays available.

2. Connection draining: When Auto Scaling removes an instance, the load balancer waits for existing connections to complete before the instance is terminated. This prevents in-flight requests from being cut off during scale-in events.

3. Warm-up period: Newly launched instances need time to initialise before they can handle traffic effectively. Configure a warm-up period in your scaling policy so new instances are not counted as healthy until they are ready — preventing premature traffic routing to cold instances.

Cost Efficiency Through Scaling

Auto Scaling does not just improve resilience, it directly reduces cost. Without it, you provision for peak load at all times — paying for maximum capacity even during off-peak hours. With Auto Scaling, you pay only for the capacity you currently need.

Combine Auto Scaling with:

1. Spot Instances for EC2-based workloads — up to 90% cheaper than On-Demand. Configure your ASG to use a mix of On-Demand and Spot Instances for cost-efficient resilience.

2. Savings Plans for the baseline capacity that always runs — commit to a minimum level of usage for significant discounts.

Previous Lesson Next Lesson

Drew Collins

Product Designer

Profile

Class Sessions

1- What is DevOps? Principles, Culture, and Practices 2- The DevOps Lifecycle 3- Introduction to Cloud Computing 4- AWS Global Infrastructure 5- Core AWS Services Overview 6- Git Fundamentals 7- Branching Strategies 8- Pull Requests and Code Review Best Practices 9- Integrating Git with AWS CodeCommit and GitHub 10- Managing Secrets and Sensitive Files in Repositories 11- What is CI/CD? 12- Building Pipelines with AWS CodePipeline and CodeBuild 13- Automated Testing in CI 14- Deployment Strategies 15- Using GitHub Actions and Jenkins on AWS 16- Why Infrastructure as Code (IaC)? 17- AWS CloudFormation 18- Terraform on AWS 19- AWS Cloud Development Kit (CDK) 20- IaC Best Practices 21- Docker Fundamentals 22- Amazon ECR 23- Deploying Containers with Amazon ECS 24- Kubernetes Basics and Amazon EKS 25- Integrating Containers into CI/CD Pipelines 26- Serverless Computing Concepts and Use Cases 27- Building and Deploying AWS Lambda Functions 28- Event-Driven Automation with Amazon EventBridge 29- Orchestrating Workflows with AWS Step Functions 30- API Gateway Integration for Serverless APIs 31- Introduction to MLOps 32- Training and Deploying Models with Amazon SageMaker 33- Automating ML Pipelines with SageMaker Pipelines 34- Using Amazon CodeWhisperer and AI Tools for Code Automation 35- AI-Powered Testing, Anomaly Detection, and Incident Prediction 36- Observability Fundamentals 37- Amazon CloudWatch 38- Distributed Tracing with AWS X-Ray 39- Centralised Logging with Amazon OpenSearch Service 40- Setting Up Automated Alerts and Incident Response Workflows 41- Shift-Left Security 42- IAM Roles, Policies, and Least-Privilege Access 43- Static Code Analysis and Vulnerability Scanning in CI/CD 44- AWS Security Hub, GuardDuty, and Config for Compliance 45- Secrets Management with AWS Secrets Manager and Parameter Store 46- AWS Well-Architected Framework 47- Auto Scaling and Elastic Load Balancing for Resilience 48- Cost Monitoring with AWS Cost Explorer and Budgets 49- Disaster Recovery Strategies 50- Preparing Your Project for Production