FinOps in Practice: Cutting Cloud Costs Without Slowing Teams
Rightsizing, Savings Plans, allocation tags and governance: a pragmatic FinOps playbook for AWS, GCP and Azure that engineering teams will actually adopt.
Cloud bills rarely explode because of a single bad decision. They drift upward through hundreds of small ones: an oversized RDS instance kept "just in case", a forgotten GKE cluster in a sandbox project, a Log Analytics workspace ingesting verbose debug logs at 2.30 €/GB. FinOps is the discipline that turns this drift into a feedback loop between Finance, Engineering and Product. Done well, it typically lets our clients cut their AWS/GCP/Azure spend by 20–35% in the first six months without freezing roadmaps.
Here is the playbook we apply at DCT, structured around the FinOps Foundation's three phases: Inform, Optimize, Operate.
1. Inform: you can't optimize what you can't allocate
The first blocker is almost always allocation. If 40% of your spend lands in an untagged bucket, no team will own it.
Enforce a minimal tagging contract from day one — three tags are usually enough:
- `cost-center` (who pays)
- `environment` (prod / staging / dev)
- `application` (what it runs)
Make it non-negotiable using policy-as-code:
```hcl
# Terraform + AWS provider default_tags
provider "aws" {
  region = "eu-west-3"

  default_tags {
    tags = {
      "cost-center" = var.cost_center
      "environment" = var.env
      "application" = var.app
      "managed-by"  = "terraform"
    }
  }
}
```
Block non-compliant resources with AWS Service Control Policies, Azure Policy (Require a tag and its value) or GCP Organization Policies combined with Config Validator. On the visualisation side, a handful of tools cover most needs:
| Tool | Strengths | Typical use |
|---|---|---|
| AWS Cost Explorer + CUR | Native, hourly granularity | Single-cloud AWS shops |
| GCP Billing export to BigQuery | SQL-native, cheap | Custom dashboards |
| OpenCost / Kubecost | Pod-level allocation on K8s | Multi-tenant clusters |
| CloudHealth, Vantage, Finout | Multi-cloud, anomaly alerts | Hybrid AWS/GCP/Azure |
For Kubernetes specifically, OpenCost (CNCF Sandbox) is now the de-facto standard to attribute pod and namespace costs back to teams — including idle capacity and shared overhead.
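To make the tagging contract blocking on AWS, the Service Control Policy route mentioned above can be sketched in Terraform. This is a minimal example, assuming an AWS Organizations setup; the policy name, the EC2-only scope and the attachment target are illustrative and would be broadened in practice:

```hcl
# Sketch: SCP denying EC2 launches that omit the cost-center tag.
# Assumes AWS Organizations is enabled; names are illustrative.
resource "aws_organizations_policy" "require_cost_center" {
  name = "require-cost-center-tag"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyRunInstancesWithoutCostCenter"
      Effect   = "Deny"
      Action   = "ec2:RunInstances"
      Resource = "arn:aws:ec2:*:*:instance/*"
      Condition = {
        # "Null = true" matches requests where the tag is absent
        "Null" = { "aws:RequestTag/cost-center" = "true" }
      }
    }]
  })
}
```

The same pattern extends to other taggable actions (RDS, EBS volumes, S3 buckets) by widening the `Action` and `Resource` lists.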
2. Optimize: rightsizing before commitments
A mistake we still see in 2025: companies buying 3-year Savings Plans on top of an over-provisioned fleet. You lock in waste for 36 months.
The correct order is rightsize first, commit second.
Rightsizing
- Compute: AWS Compute Optimizer, Azure Advisor, GCP Recommender. Target P95 CPU > 40% and memory > 50% on prod workloads. Anything chronically below 20% is a candidate for downsizing or Graviton/ARM migration (typically -20% on instance price for equivalent performance).
- Databases: don't trust averages. Look at peak connections, IOPS and buffer cache hit ratio. RDS gp3 instead of gp2 alone saves ~20% on storage with better baseline IOPS.
- Storage: lifecycle policies on S3/GCS/Blob are still the lowest-effort, highest-impact action. Moving cold logs to S3 Glacier Instant Retrieval or the Azure Cool tier cuts storage cost by 3–5×.
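The storage lever above can be sketched as a Terraform lifecycle rule — a minimal example in which the bucket reference, prefix and retention periods are illustrative assumptions, not a recommendation for every log store:

```hcl
# Sketch: move log objects to Glacier Instant Retrieval after 30 days,
# expire them after a year. Bucket, prefix and durations are illustrative.
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "cold-logs"
    status = "Enabled"

    filter {
      prefix = "logs/"
    }

    transition {
      days          = 30
      storage_class = "GLACIER_IR"
    }

    expiration {
      days = 365
    }
  }
}
```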
Commitments
Once the baseline is stable, layer commitments:
- AWS Compute Savings Plans (1 year, no upfront) cover EC2, Fargate and Lambda — start here, ~27% discount with full flexibility.
- GCP Committed Use Discounts (CUDs), now available as flexible spend-based commitments.
- Azure Reserved Instances + Savings Plans for Compute.
Rule of thumb: commit 70–80% of your stable baseline, leave the rest on-demand or on Spot. Re-evaluate every quarter.
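The 70–80% rule above can be turned into a small sizing calculation. A minimal sketch, assuming `hourly_spend` is a list of hourly on-demand cost samples (e.g. pulled from the CUR or a billing export) and approximating the stable baseline by a low percentile of the distribution; the percentile and coverage ratio are illustrative defaults:

```python
# Sketch: size an hourly commitment from on-demand spend history.
# Baseline = a low percentile of hourly spend (a level exceeded ~90% of
# the time); we then commit to 75% of that baseline.
def recommended_commitment(hourly_spend, baseline_percentile=0.10, coverage=0.75):
    """Return an hourly commitment covering `coverage` of the stable baseline."""
    if not hourly_spend:
        return 0.0
    ordered = sorted(hourly_spend)
    idx = int(baseline_percentile * (len(ordered) - 1))
    baseline = ordered[idx]
    return round(coverage * baseline, 2)

# Example: a month with a ~100 €/h floor, some spikes and a few quiet hours
history = [100.0] * 700 + [180.0] * 30 + [95.0] * 14
print(recommended_commitment(history))  # → 75.0
```

Re-running this quarterly against fresh usage data is what keeps commitments tracking the real baseline rather than last year's.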
Spot / Preemptible
For stateless workloads, batch jobs and CI runners, Spot remains the biggest single lever (up to -90%). Karpenter on EKS or GKE Autopilot Spot makes this almost transparent — Karpenter consolidates nodes automatically and replaces interrupted instances within seconds.
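On EKS, the Karpenter setup above boils down to a NodePool restricted to Spot capacity. A minimal sketch against the Karpenter v1 API; the pool name, architectures and the `default` EC2NodeClass are illustrative assumptions:

```yaml
# Sketch: Karpenter NodePool limited to Spot, with consolidation enabled.
# Names and architecture list are illustrative.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-batch
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64", "amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```

Keeping the requirements broad (multiple instance families and architectures) is what lets Karpenter find cheap Spot capacity and ride out interruptions.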
3. Operate: governance that survives Mondays
Optimisation is a one-shot effort. Governance is what compounds.
Anomaly detection: enable AWS Cost Anomaly Detection, GCP Budget alerts with Pub/Sub triggers, or Azure Cost Management alerts. Route alerts to the team's Slack channel — not to a generic finops@ inbox no one reads.
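On AWS, the anomaly-alerting setup above can be expressed in Terraform. A minimal sketch assuming an existing SNS topic (`aws_sns_topic.cost_alerts`) that a Slack integration such as AWS Chatbot subscribes to; names and the 100 € threshold are illustrative:

```hcl
# Sketch: per-service anomaly monitor with immediate SNS notification.
# Topic, names and threshold are illustrative assumptions.
resource "aws_ce_anomaly_monitor" "services" {
  name              = "per-service-monitor"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

resource "aws_ce_anomaly_subscription" "slack" {
  name      = "engineering-slack"
  frequency = "IMMEDIATE"

  monitor_arn_list = [aws_ce_anomaly_monitor.services.arn]

  subscriber {
    type    = "SNS"
    address = aws_sns_topic.cost_alerts.arn
  }

  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      match_options = ["GREATER_THAN_OR_EQUAL"]
      values        = ["100"]
    }
  }
}
```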
Showback before chargeback: publish a weekly per-team cost dashboard before invoicing internally. Visibility alone reduces spend by ~10% in the first months (the "observer effect").
Unit economics: track cost per active user, cost per transaction, cost per ML inference. Absolute spend going up is fine if unit cost goes down. This is the metric to bring to the board.
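The point about unit economics is easiest to see with numbers. A tiny sketch with invented figures: spend grows 25% quarter over quarter, yet cost per transaction falls, which is exactly the signal the board should see:

```python
# Sketch: absolute spend up, unit cost down — illustrative figures only.
def unit_cost(total_spend, transactions):
    """Cost per transaction for a period."""
    return total_spend / transactions

q1 = unit_cost(120_000, 8_000_000)    # 0.015 € per transaction
q2 = unit_cost(150_000, 12_500_000)   # 0.012 € per transaction
print(q1, q2, q2 < q1)  # spend rose 25%, unit cost fell 20%
```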
FinOps as code: integrate Infracost in pull requests to surface cost diffs before merge:
```yaml
# .github/workflows/infracost.yml (steps excerpt)
- uses: infracost/actions/setup@v3
  with:
    api-key: ${{ secrets.INFRACOST_API_KEY }}
- run: infracost breakdown --path=. --format=json --out-file=/tmp/infracost.json
- run: infracost comment github --path=/tmp/infracost.json --repo=$GITHUB_REPOSITORY --github-token=${{ secrets.GITHUB_TOKEN }} --pull-request=${{ github.event.pull_request.number }}
```
A Terraform PR adding a db.r6g.4xlarge will then show +1,247 €/month directly in the review — engineers self-correct.
A 30-day starter checklist
- [ ] Enforce 3 mandatory tags via policy-as-code
- [ ] Activate Cost Explorer / BigQuery billing export / Cost Management exports
- [ ] Deploy OpenCost on every Kubernetes cluster
- [ ] Run rightsizing recommendations, action top 20 by savings
- [ ] Apply S3/GCS/Blob lifecycle rules on logs and backups
- [ ] Set anomaly alerts routed to engineering Slack channels
- [ ] Add Infracost to the Terraform CI pipeline
- [ ] Define unit cost KPIs with Product
Key takeaways
- Allocation is the foundation — without clean tags, every other FinOps action is guesswork.
- Rightsize before you commit: Savings Plans on bloated infrastructure lock in waste.
- Make cost visible inside engineering workflows (PR comments, Slack alerts, team dashboards), not just in monthly Finance reports.
- Track unit economics, not just total spend — it's the only metric that scales with the business.
- FinOps is a loop, not a project: Inform → Optimize → Operate, repeated every quarter.