Selected Case Studies


Here are a few examples of 'DevOps Done Right' in the real world — long-lived, auditable and stable platforms.

Financial Services Client

Overview

My client was a financial services company providing compliance services to banking clients using third-party software. They wanted to expand their offerings by developing their own services in-house. I was engaged to provide greenfield DevOps and Azure infrastructure services to support a new development team and the new services they'd be building.

Challenges

The client was relatively new to Azure, Linux and Java-based microservices. My job was to design and implement a toolset and set of practices to support this startup-like activity, as well as future expansion of the team, the services and geographic availability. All this was within the context of SLAs, compliance, deployment windows and the reality of clients operating within a strict regulatory framework.

Solution

The key to my approach was to deliver DevOps fast, so the new team could start working, while building guardrails into every stage of the process. I designed a modified GitOps development and deployment process and implemented it in Azure DevOps. I used Terraform and Ansible for IaC, delivering self-building infrastructure for multiple platforms across different environments and regions. I installed and managed the Kafka, MongoDB and Elasticsearch clusters required to support the new services. I implemented observability for services and infrastructure using Azure Monitor, Fluentd, Prometheus, Grafana and Kowl. I developed dashboards for the operations team, designed for early error detection and drill-down to root causes.

Outcomes

  • Development of runbooks, guardrails, standards
  • Architecture design for multiple platforms
  • Fast builds — < 5 minutes — for fast feedback on commits
  • Fast deploys — < 10 minutes — for bug fixes and new features
  • Lowest number of findings in ISO 27001 / SOC 2 audits across the company
  • Security-first approach: RBAC, code / container scans, SSL-everywhere, Linux hardening
  • Approval-based automation to support SLAs
  • Average MTTD < 10 minutes
  • Average MTTR < 60 minutes; rollback / restore support
  • Average MTTF > 6 months
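For readers unfamiliar with the detection and recovery figures above, here is a minimal sketch of how such averages can be computed from incident records. It is illustrative only: the `Incident` record, its field names and the sample timestamps are hypothetical and not taken from the client's tooling, and it assumes MTTR is measured from detection to resolution (some teams measure it from the start of the fault instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record layout, for illustration only.
    started: datetime   # when the fault began
    detected: datetime  # when monitoring raised it
    resolved: datetime  # when service was restored


def mean_minutes(deltas):
    """Average a collection of timedeltas and return the result in minutes."""
    deltas = list(deltas)
    if not deltas:
        raise ValueError("no incidents to average")
    total = sum(deltas, timedelta())
    return total.total_seconds() / 60 / len(deltas)


def mttd_minutes(incidents):
    """Mean time to detect: fault start to detection."""
    return mean_minutes(i.detected - i.started for i in incidents)


def mttr_minutes(incidents):
    """Mean time to recover, assumed here to run from detection to resolution."""
    return mean_minutes(i.resolved - i.detected for i in incidents)


# Made-up sample data.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 6), datetime(2024, 1, 1, 10, 46)),
    Incident(datetime(2024, 2, 1, 9, 0), datetime(2024, 2, 1, 9, 10), datetime(2024, 2, 1, 9, 40)),
]

print(mttd_minutes(incidents))  # 8.0
print(mttr_minutes(incidents))  # 35.0
```

With the sample data, detection takes 6 and 10 minutes (average 8) and recovery takes 40 and 30 minutes (average 35).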

Startup Client

Overview

This startup client provided an operating system-like platform for data management in biology labs. This platform was built as a monolithic application using Consul and Nomad. GitHub was used for CI/CD.

Challenges

The core application build was extremely fragile and took hours to finish. Tests were incomplete, particularly when it came to testing the interaction amongst multiple nodes running different operating systems within a cluster. I was brought on to help with these issues.

Solution

I first cleaned up the build, revamping the GitHub pipelines and bringing the build time down. Next I developed a Packer-based process for building Ubuntu images containing the application and all supporting services. My main task was then to build a pipeline that PXE-booted a cluster of machines running the Ubuntu image; the booted machines were then meant to run a series of integration tests.

Outcomes

  • Build time reduced to < 1 hour
  • Mature GitHub pipelines
  • Automated Packer builds
  • Pipeline to PXE-build a bare-metal cluster for integration testing
  • Design input to streamline deliverables for customers

Looking for results like these?

Check out my IaC Bare Metal Bootstrap + CI/CD Bootstrap packages.

Book a 15-min triage to find out more

Location Data Company - HERE Technologies

Overview

This Fortune-500 client had a global AWS-based PaaS that provided location data services. The underlying technology was custom Kubernetes clusters running virtual machines. HERE wanted to move to a multicloud model, where both the PaaS and overlying services could be deployed to Azure as well as AWS, and eventually to other cloud vendors like Alibaba. I was brought on for six months to help lead the design of an automated, standardized multicloud deployment approach, and to implement IaC for the Azure infrastructure.

Challenges

There were numerous challenges in meeting these objectives. We had to make the build system cloud-agnostic, overcome various limitations in Azure, adhere to stringent security requirements, and standardize service deployments across global development teams. There was no automated way of provisioning a complete cluster into different environments and no standardized deployment process, and the deployment scripts contained cyclical dependencies and numerous AWS-specific steps. Even worse, some service deployments took up to 2 months and required manual intervention at some stages.

Solution

I defined a phased project plan and architecture for this work and engaged various teams across the company. I designed a custom, automated "One-click Deployment" approach to provision services in a production-ready state, using GitLab, Jenkins and custom components. This process applied to all layers of the stack, from the base Kubernetes cluster through to services and applications. I created the Terraform for all Azure components and used the new deployment approach to provision the base cluster and some services into various environments in an automated fashion. I worked on a global data routing design with external vendors that also included running HA services across multiple clouds.
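To illustrate one part of this problem, here is a minimal sketch of how a layered deployment order can be derived, and cyclical dependencies rejected, before anything is provisioned. It is not the implementation used at HERE: it uses Python's standard `graphlib` module, and the component names and the `deploy_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def deploy_order(dependencies):
    """Return components in an order where each one follows its dependencies.

    `dependencies` maps a component name to the set of components it needs.
    Raises ValueError if the graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # The second element of exc.args is the list of nodes forming the cycle.
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"cyclical dependency: {cycle}") from exc


# Hypothetical stack, from the base cluster up to a service.
layers = {
    "base-cluster": set(),
    "ingress": {"base-cluster"},
    "location-service": {"ingress", "base-cluster"},
}

print(deploy_order(layers))
# ['base-cluster', 'ingress', 'location-service']
```

Failing fast on a cycle keeps the ordering problem out of the deployment scripts themselves, which is the kind of check a one-click process depends on.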

Outcomes

  • Robust multicloud deployment design with fast deploy times
  • Full IaC for Azure for the base cluster and various services
  • Tooling for governance and for cost and usage visibility for clients
  • My architectural, technical and process designs still in use today

Looking for results like these?

Check out my IaC VM Cloud Enterprise + CI/CD Scale up packages.

Book a 15-min triage to find out more