The client was a Series A SaaS company with 40 engineers and a 4-year-old deployment process that had not changed since the company was five people. Releases happened bi-weekly and required one of two senior engineers to be available to run a 2-hour manual checklist. Both had taken on informal on-call responsibility, so neither could take vacation without worrying about who would cover a release or an incident.
AWS costs were growing 40 percent year-over-year despite flat headcount. The engineering team knew the infrastructure was inefficient but had no visibility into where the spend was going. Three production incidents in the previous six months had each taken four or more hours to resolve, in two cases requiring all-hands debugging on infrastructure components that only the most senior engineers understood.
We started with a two-week assessment that documented the full infrastructure state, identified the 11 AWS resources that accounted for 70 percent of the monthly bill, and mapped the deployment process end-to-end. The assessment became the roadmap.
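The cost side of that assessment is essentially a Pareto cut: sort resources by monthly cost and keep adding the largest until the chosen share of the bill is covered. A minimal sketch of the idea follows. The function name `top_spenders`, the resource names, and the dollar figures are all hypothetical illustrations, not the client's data or the tooling used on the engagement.

```python
def top_spenders(costs, threshold=0.70):
    """Return the largest-cost resources that together reach `threshold` of total spend.

    `costs` maps a resource name to its monthly cost. The result is a list of
    (name, cost) pairs, largest first.
    """
    total = sum(costs.values())
    if total <= 0:
        return []

    selected = []
    running = 0.0
    # Walk resources from most to least expensive.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        if running / total >= threshold:
            break  # the resources picked so far already cover the target share
        selected.append((name, cost))
        running += cost
    return selected


# Hypothetical monthly costs in dollars, for illustration only.
example_costs = {
    "rds-primary": 50,
    "eks-node-group": 30,
    "nat-gateway": 15,
    "s3-logs": 5,
}
heavy_hitters = top_spenders(example_costs)
```

In practice the input would come from a billing export grouped by resource or tag, which is often the harder part of the exercise.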
Over the following eight weeks, we containerised the application with Docker, moved workloads to Kubernetes on AWS EKS, and rebuilt the deployment pipeline in GitHub Actions with full staging and production environments. We also wrote Terraform for all infrastructure that had previously existed only as manual AWS console configuration.
Prometheus and Grafana replaced the ad-hoc monitoring setup. PagerDuty was configured with tiered alerting so on-call incidents went to the right person rather than waking the whole team. AWS cost optimisation ran in parallel. Right-sizing instances, moving non-critical workloads to Spot, and implementing auto-scaling for traffic peaks cut the monthly bill by 38 percent within 90 days.
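The logic behind tiered alerting can be sketched in a few lines: each alert is routed to the owner of the affected service, and only the most severe tier pages someone. The sketch below is plain Python and does not use the PagerDuty API. The service names, team names, severity levels, and channels are assumptions made for illustration, not the client's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    severity: str  # assumed levels: "critical", "warning", "info"


# Hypothetical ownership map: which on-call rotation owns which service.
SERVICE_OWNERS = {
    "api": "backend-oncall",
    "database": "platform-oncall",
}
DEFAULT_OWNER = "platform-oncall"


def route(alert):
    """Decide who is notified and how, based on service ownership and severity."""
    owner = SERVICE_OWNERS.get(alert.service, DEFAULT_OWNER)
    if alert.severity == "critical":
        channel = "page"    # wakes the owning on-call engineer only
    elif alert.severity == "warning":
        channel = "chat"    # visible to the team, no page
    else:
        channel = "ticket"  # reviewed during working hours
    return {"target": owner, "channel": channel}


decision = route(Alert(service="api", severity="critical"))
```

The design choice that matters is that routing depends on ownership as well as severity, so a critical alert reaches one rotation instead of the whole team.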