Getting code to work in development is very different from running it reliably in production. Production systems handle real users, real data, and real consequences when things go wrong.
Before any system goes live, a structured review across every dimension — security, reliability, performance, cost, and operations — is essential.
What Production Readiness Means
A production-ready system is not just one that works. It is one that:
1. Handles failures gracefully without losing data or user trust.
2. Recovers automatically without requiring manual intervention at 3am.
3. Is observable enough that problems are detected before users report them.
4. Is secure enough that sensitive data is protected and access is controlled.
5. Is cost-efficient enough that it does not waste money on idle or oversized resources.
6. Is documented well enough that any team member — not just the person who built it — can operate and troubleshoot it.
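The first two points — graceful failure handling and automatic recovery — usually start with retry logic around calls to flaky dependencies. A minimal sketch of exponential backoff with full jitter; `retry_with_backoff` and `flaky` are illustrative names, not a specific SDK API:

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry a zero-argument callable with exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: sleep a random fraction of the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))

# Simulate a dependency that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_backoff(flaky))  # "ok" after two retried failures
```

Jitter matters here: without it, many clients retrying in lockstep can hammer a recovering dependency at the same instant.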
Production Readiness Checklist
Work through each area systematically before declaring a system production-ready.
Infrastructure and Architecture
1. Infrastructure is fully defined in IaC — Terraform, CloudFormation, or CDK. No manually created resources in production.
2. Resources are deployed across at least two Availability Zones.
3. No single points of failure exist in the architecture.
4. Auto Scaling is configured with appropriate minimum, maximum, and desired capacity values.
5. An Application Load Balancer distributes traffic across healthy targets with health checks configured.
6. All infrastructure has been reviewed against the AWS Well-Architected Framework.
7. Resources are tagged consistently with environment, team, and application tags.
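The tagging check in item 7 is easy to automate. A minimal sketch that flags resources missing required tag keys, assuming an `environment`/`team`/`application` standard; in a real audit the tag dict would come from an AWS describe/list API:

```python
REQUIRED_TAGS = {"environment", "team", "application"}  # assumed tagging standard

def missing_tags(resource_tags):
    """Return the required tag keys absent (or empty) on a resource."""
    present = {k for k, v in resource_tags.items() if v}  # ignore empty values
    return sorted(REQUIRED_TAGS - present)

print(missing_tags({"environment": "prod", "team": "platform"}))
# ['application']
```

Run as a scheduled job, a check like this catches untagged resources before they become unattributable line items on the bill.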
Security
1. No secrets, credentials, or API keys exist in source code, environment files, or container images.
2. All secrets are stored in AWS Secrets Manager or Parameter Store and retrieved at runtime.
3. IAM roles follow least-privilege — every role has only the permissions it specifically needs.
4. MFA is enabled for all IAM users with console access.
5. The root account has MFA enabled and is not used for daily operations.
6. Security groups follow least-privilege — no ports open to 0.0.0.0/0 that are not required.
7. All data at rest is encrypted using KMS or service-level encryption.
8. All data in transit uses TLS.
9. GuardDuty is enabled in every account and region.
10. AWS Config rules are active and compliance violations have been remediated.
11. Security Hub is enabled and critical findings have been resolved.
12. Container images have been scanned — no critical CVEs in production images.
13. SAST and SCA scans pass in the CI/CD pipeline.
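The least-privilege checks (items 3 and 6) can be partially linted in CI. A sketch that flags `Allow` statements with a bare wildcard action or resource — a local heuristic that complements, not replaces, IAM Access Analyzer:

```python
import json

def broad_statements(policy_json):
    """Return the Sids of Allow statements whose Action or Resource is '*'."""
    policy = json.loads(policy_json)
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Both fields may be a single string or a list in IAM policy JSON.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions or "*" in resources:
            findings.append(stmt.get("Sid", "<no Sid>"))
    return findings

policy = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "ReadBucket", "Effect": "Allow",
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my-bucket/*"},
    {"Sid": "TooBroad", "Effect": "Allow", "Action": "*", "Resource": "*"}
  ]
}"""
print(broad_statements(policy))  # ['TooBroad']
```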
CI/CD Pipeline
Observability
1. CloudWatch dashboards are set up for key metrics — error rates, latency, CPU, memory, and business metrics.
2. CloudWatch Alarms are configured for critical thresholds — error rate, latency, availability.
3. Alarms route to SNS and notify the right people through the right channels.
4. Composite alarms reduce noise — alerts only fire when meaningful combinations of conditions are true.
5. Log groups have retention policies set — logs are not stored indefinitely.
6. Structured JSON logging is in place — logs are searchable and filterable.
7. AWS X-Ray tracing is enabled for Lambda functions and API Gateway.
8. A runbook exists for every critical alarm — on-call engineers know what to do when it fires.
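Structured JSON logging (item 6) is the difference between grepping free text and filtering on fields in CloudWatch Logs Insights. A minimal sketch using the standard library; the `ctx` field name is an assumption of this example, not a logging-module convention:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so log queries can filter on fields."""
    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured context passed via `extra={"ctx": {...}}`.
        entry.update(getattr(record, "ctx", {}))
        return json.dumps(entry)

logger = logging.getLogger("checkout")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"ctx": {"order_id": "o-123", "latency_ms": 42}})
```

Each line is now a queryable document — `latency_ms > 1000` becomes a filter expression rather than a regex.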
Reliability and Recovery
1. RDS Multi-AZ is enabled for all relational databases, and DynamoDB global tables are configured where multi-region resilience is required.
2. Automated backups are configured for all stateful resources — RDS, DynamoDB, EBS.
3. Backups have been tested — a successful restore from backup has been verified.
4. A disaster recovery strategy has been defined — RTO and RPO are agreed with stakeholders.
5. Route 53 health checks and failover routing are configured for multi-region workloads.
6. Connection draining (the deregistration delay on ALB target groups) is configured — in-flight requests complete before instance termination.
7. The system has been tested under failure conditions — instance termination, AZ failure, database failover.
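Items 3 and 4 can be combined into a scheduled check: is the newest backup still within the agreed RPO? A sketch with supplied timestamps; in a real check they would come from an API such as `DescribeDBSnapshots`:

```python
from datetime import datetime, timedelta, timezone

def rpo_violated(snapshot_times, rpo, now=None):
    """Return True if the newest snapshot is older than the agreed RPO."""
    now = now or datetime.now(timezone.utc)
    if not snapshot_times:
        return True  # no backups at all is always a violation
    newest = max(snapshot_times)
    return now - newest > rpo

now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
snapshots = [now - timedelta(hours=30), now - timedelta(hours=6)]
print(rpo_violated(snapshots, rpo=timedelta(hours=24), now=now))  # False
```

Note that this only proves a backup exists — item 3 still requires actually restoring one to prove it is usable.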
Cost
1. AWS Budgets are configured with alerts at 50%, 80%, and 100% of expected monthly spend.
2. Cost Explorer has been reviewed — no unexpected cost categories or unexplained spikes.
3. Dev and staging environments use smaller, cheaper instance types than production.
4. Spot Instances or Savings Plans are used where appropriate for predictable workloads.
5. No idle resources exist — no running instances or NAT Gateways in unused environments.
6. Auto Scaling ensures capacity scales down during low-traffic periods.
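The budget alerts in item 1 reduce to a simple threshold comparison. A sketch of the logic AWS Budgets applies at the 50%/80%/100% marks, useful for a pre-flight sanity check with hypothetical figures:

```python
def breached_thresholds(expected_monthly, actual_spend, thresholds=(0.5, 0.8, 1.0)):
    """Return the alert thresholds the current spend has crossed."""
    return [t for t in thresholds if actual_spend >= expected_monthly * t]

print(breached_thresholds(expected_monthly=1000.0, actual_spend=850.0))
# [0.5, 0.8]
```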
Documentation and Operational Readiness
1. A system architecture diagram exists and is up to date.
2. A README covers how to deploy, how to roll back, and how to run the system locally.
3. Runbooks exist for all common operational scenarios — deployment, rollback, incident response, DR failover.
4. On-call responsibilities are clearly defined — who is paged for what.
5. A post-incident review process is defined and the team knows how to conduct a blameless post-mortem.
6. The team has practised at least one DR scenario — tabletop exercise or actual failover test.
The Day-One vs. Day-Two Mindset
Production readiness is not a one-time gate. It is a mindset.
Day One is getting the system to production correctly — infrastructure as code, CI/CD pipeline, security controls, monitoring, and DR strategy in place from the start.
Day Two is everything that happens after — keeping the system healthy, improving reliability, tightening security, optimising costs, and evolving the architecture as requirements change.
The checklists above are your Day One foundation. Day Two never ends.