AWS Well-Architected Framework

Lesson 46/50 | Study Time: 30 Min

Course: AI DevOps on AWS: Automation, CI/CD and Cloud Engineering

Building something that works is one thing. Building something that works reliably, securely, efficiently, and cost-effectively at scale is another.

The AWS Well-Architected Framework is a set of guiding principles, organised into five pillars, that helps teams evaluate and improve their cloud architectures. AWS has since added a sixth pillar, Sustainability, which this lesson does not cover.

It is not a checklist you complete once. It is a continuous discipline that guides every architectural decision you make on AWS.

What is the Well-Architected Framework?

The Well-Architected Framework was developed by AWS based on years of reviewing thousands of customer architectures.

It codifies the patterns that lead to successful cloud systems and the anti-patterns that cause failures, security breaches, poor performance, and unnecessary cost.

AWS provides a Well-Architected Tool, a free service in the AWS Console that walks you through a series of questions about your architecture and identifies risks and improvement opportunities against the five pillars.

The Five Pillars

Pillar 1 — Operational Excellence

Operational excellence is about running and monitoring systems effectively and continuously improving processes and procedures.

Key Principles:

1. Perform operations as code — automate runbooks, deployments, and infrastructure management rather than relying on manual steps.

2. Make frequent, small, reversible changes — small deployments are easier to diagnose and roll back than large ones.

3. Anticipate failure — design for failure, test failure scenarios regularly, and learn from every incident.

4. Learn from operational events — every incident, near-miss, and alert is an opportunity to improve.

In Practice: A mature CI/CD pipeline, automated runbooks in Systems Manager, blameless post-mortems, and regular game days (where the team intentionally causes failures to test its response) are all signs of operational excellence.
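To make "operations as code" and "small, reversible changes" concrete, here is a minimal sketch of a runbook expressed as Python rather than as manual steps. It is not how Systems Manager runbooks are written. The step names and the `state` dictionary are hypothetical stand-ins for real infrastructure, and the second step is made to fail on purpose so the rollback path is visible.

```python
from typing import Callable, List, Tuple

# Each step pairs a name with an action and the function that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_runbook(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, apply, _undo = step
        try:
            apply()
            log.append(f"applied {name}")
            done.append(step)
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            for prev_name, _apply, undo in reversed(done):
                undo()
                log.append(f"rolled back {prev_name}")
            break
    return log


# Hypothetical deployment state, standing in for real infrastructure.
state = {"version": "v1", "traffic": "v1"}


def deploy_v2() -> None:
    state["version"] = "v2"


def undo_deploy() -> None:
    state["version"] = "v1"


def shift_traffic() -> None:
    # Simulated failure, as a game day might inject.
    raise RuntimeError("health check failed")


def undo_shift() -> None:
    state["traffic"] = "v1"


runbook_log = run_runbook([
    ("deploy v2", deploy_v2, undo_deploy),
    ("shift traffic", shift_traffic, undo_shift),
])
for line in runbook_log:
    print(line)
```

Because every step carries its own undo function, a failed change leaves the system where it started, which is what makes small deployments cheap to attempt.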

Pillar 2 — Security

Security is about protecting information, systems, and assets while delivering business value.

Key Principles:

1. Implement a strong identity foundation — use IAM roles, enforce least privilege, enable MFA, and never share credentials.

2. Enable traceability — log every action, monitor all resources, and respond automatically to security events.

3. Apply security at every layer — network, compute, application, and data all need independent security controls.

4. Protect data in transit and at rest — encrypt everything using KMS, TLS, and service-level encryption features.

5. Automate security best practices — use Config rules, Security Hub, and GuardDuty to enforce and monitor security continuously.

In Practice: Shift-left security, IAM least privilege, Security Hub, GuardDuty, and secrets management all directly serve this pillar.
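As an illustration of least privilege, the sketch below builds an IAM policy document as a Python dictionary and runs a simple check for wildcard grants. The bucket name is hypothetical, and `find_wildcards` is an illustrative helper, not an AWS API. It assumes `Statement` is a list, and a real review would rely on tools such as IAM Access Analyzer instead of a hand-written check.

```python
import json
from typing import Any, Dict, List

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-app-artifacts"

# Grants read access to objects in one bucket and nothing else.
least_privilege_policy: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}


def _as_list(value: Any) -> List[str]:
    """IAM allows a single string or a list; normalise to a list."""
    return [value] if isinstance(value, str) else list(value)


def find_wildcards(policy: Dict[str, Any]) -> List[str]:
    """Return a finding for each Allow statement with '*' actions or resources."""
    findings: List[str] = []
    for i, stmt in enumerate(policy["Statement"]):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {i}: wildcard action")
        if "*" in _as_list(stmt.get("Resource", [])):
            findings.append(f"statement {i}: wildcard resource")
    return findings


# An overly broad policy of the kind least privilege is meant to prevent.
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

print(json.dumps(least_privilege_policy, indent=2))
print(find_wildcards(broad_policy))
```

A check like this could run in a CI pipeline as one small shift-left control, failing the build when a policy grants more than it needs.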

Pillar 3 — Reliability

Reliability is about ensuring a system performs its intended function correctly and consistently, and that it recovers automatically when failures occur.

In Practice: Multi-AZ deployments, ECS service auto-recovery, RDS Multi-AZ, Route 53 health checks, and regular chaos engineering exercises all demonstrate a focus on reliability.
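The idea behind health-check-driven failover can be sketched in a few lines. This is a simplified simulation, not how Route 53 or RDS Multi-AZ work internally. The endpoint names and health-check functions are hypothetical, and the dictionary order is assumed to express failover priority.

```python
from typing import Callable, Dict, Optional


def pick_healthy(endpoints: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the first endpoint whose health check passes, in priority order."""
    for name, is_healthy in endpoints.items():
        try:
            if is_healthy():
                return name
        except Exception:
            # A health check that raises is treated as a failed check.
            continue
    return None


def timed_out() -> bool:
    # Simulates a health check that cannot reach its target.
    raise TimeoutError("no response")


# Hypothetical endpoints in two Availability Zones plus a third fallback.
endpoints = {
    "primary-az-a": timed_out,
    "standby-az-b": lambda: True,
    "standby-az-c": lambda: True,
}

print(pick_healthy(endpoints))
```

The point of the sketch is that recovery requires no human decision: traffic moves to the next healthy target as soon as the check fails.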

Pillar 4 — Performance Efficiency

Performance efficiency is about using computing resources efficiently to meet requirements and maintaining that efficiency as demand changes.

Key Principles:

1. Use the right tool for the job — choose the appropriate service and instance type for each workload. A memory-intensive workload belongs on an r-family EC2 instance, not a t-family.

2. Go serverless where possible — Lambda and Fargate remove infrastructure management overhead and scale automatically.

3. Use caching aggressively — Amazon ElastiCache, CloudFront, and API Gateway caching reduce latency and backend load.

4. Monitor performance continuously — use CloudWatch metrics and X-Ray traces to identify bottlenecks and optimise proactively.

5. Experiment with new services — AWS regularly releases new services and features that can improve performance. Stay current.

In Practice: Right-sizing EC2 instances, using CloudFront for content delivery, caching database results with ElastiCache, and using X-Ray to identify slow service calls all serve this pillar.
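The effect of caching database results can be shown with a small in-memory cache that expires entries after a time-to-live. This is only a sketch of the pattern. A real deployment would typically use ElastiCache, not a Python dictionary, and the `TTLCache` class, the `slow_query` function, and the 60-second TTL are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache values in memory and reload them once they are older than the TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the backend is not called
        value = loader()     # cache miss or expired entry: call the backend
        self._store[key] = (now + self.ttl, value)
        return value


db_calls = 0


def slow_query() -> dict:
    """Hypothetical stand-in for an expensive database query."""
    global db_calls
    db_calls += 1
    return {"user": "alice"}


fake_now = [0.0]  # manual clock for the demonstration
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

cache.get_or_load("user:1", slow_query)  # miss, calls the backend
cache.get_or_load("user:1", slow_query)  # hit, served from the cache
fake_now[0] = 61.0                       # advance past the TTL
cache.get_or_load("user:1", slow_query)  # expired, calls the backend again

print(f"backend calls: {db_calls}")
```

Three reads produce only two backend calls here. The TTL is the trade-off to tune: a longer value reduces load further but serves staler data.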

Pillar 5 — Cost Optimisation

Cost optimisation is about running systems to deliver business value at the lowest possible cost.

Key Principles:

1. Adopt a consumption model — pay only for what you use. Stop resources when they are not needed. Use serverless and spot instances where appropriate.

2. Measure overall efficiency — understand the cost per unit of business output and track it over time.

3. Avoid unnecessary expense — eliminate idle resources, right-size instances, use Reserved Instances or Savings Plans for predictable workloads.

4. Use managed services — offloading undifferentiated heavy lifting to AWS managed services is often cheaper than self-managing equivalent infrastructure.

5. Analyse and attribute expenditure — use AWS Cost Explorer and tagging to understand where money is being spent and hold teams accountable for their costs.

In Practice: Savings Plans for production workloads, Spot Instances for CI/CD build agents, S3 Intelligent-Tiering for storage, and regular Cost Explorer reviews prevent waste from accumulating silently.
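Two of the principles above, attributing expenditure by tag and measuring cost per unit of business output, reduce to simple arithmetic once the cost data is available. The sketch below uses made-up records, not real Cost Explorer output. The team names, amounts, and order count are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical daily cost records: (team tag, service, cost in USD).
# An empty tag represents a resource nobody has claimed.
records = [
    ("checkout", "EC2", 120.0),
    ("checkout", "RDS", 80.0),
    ("search", "OpenSearch", 150.0),
    ("", "EC2", 50.0),
]


def cost_by_team(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum cost per team tag, grouping untagged resources together."""
    totals: Dict[str, float] = defaultdict(float)
    for team, _service, cost in rows:
        totals[team or "untagged"] += cost
    return dict(totals)


def cost_per_unit(total_cost: float, units: int) -> float:
    """Cost per unit of business output, for example per order processed."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


totals = cost_by_team(records)
print(totals)

# Hypothetical output figure: the checkout team processed 10,000 orders.
print(cost_per_unit(totals["checkout"], 10_000))
```

The "untagged" bucket is worth watching in its own right, since spend that no team owns is the spend least likely to be reviewed.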

Using the Well-Architected Tool

The Well-Architected Tool in the AWS Console guides you through a structured review of your architecture against all five pillars.

You answer questions about how your system is built and operated, and it identifies high-risk issues and improvement recommendations.

Run a Well-Architected review at three key moments: when designing a new system, before building it; after a significant incident, to identify what the architecture could do better; and periodically for existing systems, to catch drift from best practices.
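Once a review is complete, its answers can be summarised per pillar to decide what to fix first. The sketch below works on a plain list of dictionaries. The field names (`PillarId`, `QuestionTitle`, `Risk`) are modelled on what the boto3 `wellarchitected` client is understood to return from `list_answers`, which is an assumption to verify against the current API reference, and the sample answers themselves are invented.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical review answers; field names are an assumption (see above).
sample_answers = [
    {"PillarId": "security", "QuestionTitle": "How do you manage identities?", "Risk": "HIGH"},
    {"PillarId": "security", "QuestionTitle": "How do you protect data at rest?", "Risk": "NONE"},
    {"PillarId": "reliability", "QuestionTitle": "How do you back up data?", "Risk": "MEDIUM"},
    {"PillarId": "costOptimization", "QuestionTitle": "How do you monitor usage?", "Risk": "HIGH"},
]


def summarise_risks(answers: Iterable[dict]) -> Dict[str, Counter]:
    """Count answers per risk level, grouped by pillar."""
    summary: Dict[str, Counter] = {}
    for answer in answers:
        summary.setdefault(answer["PillarId"], Counter())[answer["Risk"]] += 1
    return summary


def high_risk_questions(answers: Iterable[dict]) -> List[str]:
    """List the questions flagged as high risk, in review order."""
    return [a["QuestionTitle"] for a in answers if a["Risk"] == "HIGH"]


for pillar, counts in summarise_risks(sample_answers).items():
    print(pillar, dict(counts))
print(high_risk_questions(sample_answers))
```

Comparing these counts between periodic reviews gives a simple way to see whether the architecture is drifting from best practices or improving.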

Drew Collins

Product Designer
