What's in this article
- CloudWatch Logs retention — the infinite bin
- Secrets Manager vs SSM Parameter Store
- ECR image lifecycle rules
- VPC Endpoints — Gateway (free) and Interface (not free)
- Public IPv4 costs — more places than you think
- S3 Lifecycle policies — versions, transitions, and zombie uploads
- EBS gp2 to gp3 — a free upgrade hiding in plain sight
Every item in this article is something AWS provides natively. You do not need a FinOps platform, a third-party scanner, or a consulting engagement to act on any of these. They are settings inside your AWS console that are either off by default, or set to a default that costs you more than necessary.
None of these are exotic. Most engineers have heard of all of them. The problem is not awareness — it is that nobody goes back and configures them after the initial setup. This article is the nudge to do that.
01. CloudWatch Logs — the infinite bin
When you create a CloudWatch Log Group — whether manually, through a Lambda function, an ECS service, an API Gateway stage, or a VPC Flow Log — the default retention is Never expire. Logs accumulate forever.
The fix is simple: set a retention policy on every log group. Common choices are 7 days for noisy debug logs, 30 days for application logs, and 90 days for audit or compliance-adjacent logs. CloudWatch gives you options from 1 day to 10 years. Pick the shortest period that satisfies your debugging and compliance needs.
Run aws logs describe-log-groups --query 'logGroups[?retentionInDays==`null`]' to list every log group with no retention set.
The second place logs accumulate invisibly is Lambda function log groups. Every Lambda function gets its own log group created automatically on first invocation. In a microservices account with dozens of functions, this silently produces dozens of never-expiring log groups. Retention is applied per log group (aws logs put-retention-policy), so automate it: a scheduled sweep or an AWS Config rule can apply a default to new log groups without manual action. One pass of such a sweep is sketched below.
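A minimal sweep, assuming default CLI credentials and region, that applies 30 days to every log group currently missing a policy:

```bash
# Find every log group with no retention policy and set it to 30 days.
# Pick a longer value (e.g. 90) for audit or compliance-adjacent groups.
for lg in $(aws logs describe-log-groups \
    --query 'logGroups[?retentionInDays==`null`].logGroupName' \
    --output text); do
  aws logs put-retention-policy --log-group-name "$lg" --retention-in-days 30
done
```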
02. Secrets Manager vs SSM Parameter Store
AWS Secrets Manager and SSM Parameter Store both store sensitive values. They are not interchangeable — but they are also not equally priced, and many teams use Secrets Manager for everything without realising the cost difference.
At $0.40/month per secret, the cost sounds negligible. But multiply it across environments. A single application storing 20 secrets in each of dev, staging, and production accounts is 60 secrets — $24/month. Ten such microservices and you are at $240/month, just for secret storage. That is before replication to a second region.
SSM Parameter Store's SecureString type uses AWS KMS encryption, has full CloudTrail audit logging, and integrates with IAM — the same baseline security, at no storage cost for Standard-tier parameters.
Use Secrets Manager when you actually need its features
Secrets Manager earns its cost when you need automatic rotation — particularly for RDS, Aurora, Redshift, and DocumentDB credentials. It manages the Lambda rotation function, handles version staging, and keeps credentials live during the rotation window. That capability is genuinely valuable and hard to replicate manually.
For everything else — database hostnames, feature flag values, API endpoint URLs, environment names, queue URLs, non-rotating API keys — SSM Parameter Store Standard is free, encrypted, audited, and fully functional.
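A minimal sketch of the free path; the parameter name and value here are placeholders:

```bash
# Store a non-rotating value as an encrypted Standard-tier parameter (free).
# Omitting --key-id uses the default aws/ssm KMS key.
aws ssm put-parameter \
  --name /myapp/prod/db-hostname \
  --type SecureString \
  --value "db.internal.example.com"

# Read it back, decrypted.
aws ssm get-parameter --name /myapp/prod/db-hostname --with-decryption \
  --query Parameter.Value --output text
```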
One more cost trap inside Secrets Manager: Lambda functions that fetch a secret on every invocation. A function processing 1 million requests/month generates 1 million API calls at $0.05 per 10,000 — that is $5/month per secret, just in retrieval. Use the AWS Parameters and Secrets Lambda extension, which caches both SSM parameters and secrets in memory with a configurable TTL, reducing actual API calls to both services by 90–99%.
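With the extension layer attached, a cached read is a local HTTP call to the extension rather than an SDK call to the service. A sketch, assuming a hypothetical secret named prod/db-credentials:

```bash
# The extension listens on localhost:2773 and caches responses in memory
# (TTL configurable via SECRETS_MANAGER_TTL / SSM_PARAMETER_STORE_TTL).
curl -s "http://localhost:2773/secretsmanager/get?secretId=prod/db-credentials" \
  -H "X-Aws-Parameters-Secrets-Token: ${AWS_SESSION_TOKEN}"
```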
03. ECR image lifecycle rules
Amazon ECR (Elastic Container Registry) charges $0.10/GB/month for image storage. That is not expensive per image — but container images accumulate fast. Every CI/CD pipeline push creates a new image. Without a lifecycle rule, your registry grows without bound.
ECR lifecycle policies let you define rules based on image count or image age. The two most useful patterns:
- Keep the last N tagged images: retain the 10 most recent images with a specific tag prefix (e.g., v* for version tags), delete older ones automatically.
- Expire untagged images: delete any image that has no tag and is older than 1 day. Untagged images are almost always intermediate build artifacts — they serve no purpose after the tagged image is pushed. A policy combining both patterns is sketched below.
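A minimal sketch; the repository name my-app and the v tag prefix are placeholders:

```bash
# Two rules: expire untagged images after 1 day, keep only the 10 newest v* images.
cat > /tmp/ecr-lifecycle.json <<'EOF'
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 1 day",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 1
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the 10 most recent v* images",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["v"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF
aws ecr put-lifecycle-policy --repository-name my-app \
  --lifecycle-policy-text file:///tmp/ecr-lifecycle.json
```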
A team running 20 repositories with active CI/CD can easily accumulate 500 GB+ of images within a year without lifecycle rules — $50/month in storage that provides zero value. With rules, the same repositories typically stabilise at under 50 GB.
04. VPC Endpoints — one is free, the other charges even when idle
If your workloads run in private subnets and talk to AWS services (S3, DynamoDB, ECR, CloudWatch, Secrets Manager, SSM, STS), that traffic goes somewhere. By default, it goes through your NAT Gateway. Understanding how VPC Endpoints work — and where they save money versus where they cost money — is one of the most underused configurations in AWS networking.
Gateway Endpoints — genuinely free, no reason not to use them
Gateway Endpoints exist for exactly two services: S3 and DynamoDB. They work by adding a route to your VPC route table that sends traffic for these services directly over the AWS private network, bypassing the NAT Gateway entirely. There is no hourly charge and no data processing fee. They are free.
If you have EC2, ECS, EKS, or Lambda in private subnets talking to S3 or DynamoDB — and most workloads do — you are paying $0.045/GB in NAT Gateway processing fees on that traffic right now. One team discovered a $650 surprise on their bill from S3 transfers alone, caused purely by the absence of a Gateway Endpoint. The fix takes five minutes.
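Creating one is a single call. A sketch, with the region, VPC ID, and route table ID as placeholders:

```bash
# One Gateway Endpoint for S3; repeat with ...dynamodb for DynamoDB.
# No hourly charge and no data processing fee applies.
aws ec2 create-vpc-endpoint \
  --vpc-endpoint-type Gateway \
  --vpc-id vpc-xxxx \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-xxxx
```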
Interface Endpoints — useful, but understand the cost model first
Interface Endpoints (powered by AWS PrivateLink) let your private subnet resources reach other AWS services — ECR, CloudWatch Logs, Secrets Manager, SSM, STS, KMS, and many more — without going through a NAT Gateway. They are the right tool when your NAT Gateway is processing large volumes of traffic to these services.
But Interface Endpoints are not free, and their cost model has a trap: you pay $0.01/hour per endpoint per Availability Zone, regardless of whether any traffic flows through it, plus a $0.01/GB data processing fee on the traffic that does.
- ECR (container pulls): requires two endpoints, ecr.api and ecr.dkr, plus an S3 Gateway Endpoint for image layers. Total: 2 Interface + 1 Gateway.
- CloudWatch Logs: one endpoint, logs. Every Lambda, ECS task, or EC2 instance sending logs needs this if you want to avoid NAT.
- Secrets Manager / SSM: one endpoint each, secretsmanager and ssm. Often overlooked — and the hourly charge applies from the moment you create them.
- STS / KMS: needed if your workloads call AssumeRole or use KMS for encryption frequently. Each is a separate endpoint and a separate hourly charge.
A full set of Interface Endpoints for a typical workload — ECR (×2), CloudWatch Logs, Secrets Manager, SSM, STS — is 6 endpoints. At 3 AZs, that is roughly $130/month in fixed endpoint costs (6 endpoints × 3 AZs × $0.01/hour × 730 hours ≈ $131), whether the endpoints are busy or idle. This is worth it if you are paying $500+/month through NAT for these services. It is not worth it if your NAT traffic to these services is $40/month.
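If the maths does work out, creating an Interface Endpoint looks like this (a sketch with placeholder IDs; every subnet you attach is another AZ on the hourly meter):

```bash
# One Interface Endpoint for ECR's Docker registry API, spread across 3 AZs.
aws ec2 create-vpc-endpoint \
  --vpc-endpoint-type Interface \
  --vpc-id vpc-xxxx \
  --service-name com.amazonaws.us-east-1.ecr.dkr \
  --subnet-ids subnet-aaaa subnet-bbbb subnet-cccc \
  --security-group-ids sg-xxxx \
  --private-dns-enabled
```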
05. Public IPv4 costs — more places than you think
As of February 1, 2024, AWS charges $0.005 per hour for every public IPv4 address — attached to a running resource or sitting idle. That is $3.65/month per address. Before this change, in-use public IPs were free. Most teams acknowledged the change, nodded, and moved on. Many have not actually looked at how many public IPs they are running.
- EC2 instances in the default VPC: instances launched there get a public IP automatically. That is the default behaviour — most teams do not turn it off.
- Load Balancers (ALB / NLB): an internet-facing ALB or NLB in 3 AZs holds 3+ public IPs. That is $10.95+/month in IP charges alone, before bandwidth.
- NAT Gateways: each NAT Gateway holds one Elastic IP. 3 NAT Gateways (one per AZ) = 3 EIPs = $10.95/month, on top of the hourly NAT charge.
- RDS instances: instances with "Publicly accessible = Yes" hold a public IP. Multi-AZ RDS has a standby too. Both are charged.
- EKS worker nodes: nodes launched in public subnets or with public IP assignment enabled each hold a public IP. A cluster with 20 nodes = $73/month in IP charges.
- Idle Elastic IPs: EIPs not attached to a running resource cost the same $3.65/month. One stopped instance, one detached EIP — still billing.
Where you can avoid public IPs
Most production workloads do not need a public IP on every resource. The load balancer faces the internet. Everything behind it can be private.
- EC2 and EKS nodes: Place them in private subnets. They reach the internet through a NAT Gateway or not at all. No public IP, no per-IP charge.
- RDS: Set "Publicly accessible" to No. Your application connects from within the VPC. There is almost never a good reason for an RDS instance to have a public IP in production.
- Dev and test instances: Use Systems Manager Session Manager for shell access instead of SSH over a public IP. This removes the need for a public IP entirely on instances you only need to administer — no bastion, no exposed port 22.
- Elastic IPs on stopped instances: If the instance is stopped for more than a few days, either start it or release the EIP. An EIP sitting on a stopped instance is paying the same rate as one on a running instance.
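Every billable public IPv4 in a region sits on a network interface, which makes taking an inventory straightforward. A sketch:

```bash
# List every network interface that currently holds a public IPv4 address.
aws ec2 describe-network-interfaces \
  --query 'NetworkInterfaces[?Association.PublicIp!=`null`].[NetworkInterfaceId,InterfaceType,Association.PublicIp]' \
  --output table

# And every allocated Elastic IP, attached or not.
aws ec2 describe-addresses \
  --query 'Addresses[*].[PublicIp,AssociationId]' --output table
```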
Refer to these videos to learn about IPv6 — the longer-term fix, since AWS does not charge for IPv6 addresses:
- AWS - IPv6 concepts | Dual Stack on AWS VPC | IPv6 comparison with IPv4
- AWS - VPC with IPv6 DEMO | Egress Only Internet Gateway | Easy Setup
06. S3 Lifecycle policies — versions, transitions, and zombie uploads
S3 looks cheap at $0.023/GB/month for Standard storage. It is — until you multiply it across years of data, versioning enabled without an expiry rule, and incomplete multipart uploads that nobody cleaned up.
Versioning without an expiry rule doubles (or more) your storage
S3 Versioning is a good feature. Enabling it is usually the right call. But it only protects you if you also tell S3 what to do with old versions. Without a lifecycle rule, every object overwrite keeps the previous version. Every delete creates a delete marker. Both are stored at the full Standard rate. A bucket where objects change frequently can hold 10x the data it appears to hold when you look at the "current" objects.
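A minimal rule that caps version sprawl, assuming a placeholder bucket name. Note that put-bucket-lifecycle-configuration replaces the bucket's entire lifecycle configuration, so merge with any existing rules first:

```bash
# Expire noncurrent versions 30 days after they are superseded.
cat > /tmp/versions.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket --lifecycle-configuration file:///tmp/versions.json
```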
Incomplete multipart uploads — invisible dead weight
S3 Multipart Upload lets large files be uploaded in chunks. If the upload is cancelled, fails, or the client crashes, the partial chunks remain in the bucket — stored at full S3 rates, not visible in the console as objects, not accessible, not deletable through the normal console view. The only way to find them is via the ListMultipartUploads API or the S3 Storage Lens dashboard.
The fix is a single lifecycle rule: AbortIncompleteMultipartUpload with a days value of 7 (or less). This tells S3 to automatically clean up any upload that has been incomplete for more than 7 days. No code required. No monitoring required. It is a policy, not a script.
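To check whether a bucket is carrying partial uploads, and to add the cleanup rule (same replace-the-whole-config caveat and placeholder bucket name as above):

```bash
# Any output here is partial-upload data you are paying for.
aws s3api list-multipart-uploads --bucket my-bucket

# Abort anything still incomplete after 7 days.
cat > /tmp/abort-mpu.json <<'EOF'
{
  "Rules": [
    {
      "ID": "abort-incomplete-mpu",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket --lifecycle-configuration file:///tmp/abort-mpu.json
```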
Storage class transitions — understand the full cost before configuring
S3 Standard costs $0.023/GB. S3 Standard-IA (Infrequent Access) costs $0.0125/GB — about 45% cheaper — for objects you access rarely. S3 Glacier Instant Retrieval costs $0.004/GB for cold data you retrieve once a month or less.
Transitioning objects via lifecycle rules is not free. AWS charges a per-1,000-objects fee for each transition — $0.01 per 1,000 objects to Standard-IA, and more for the Glacier classes. For a bucket with millions of small files, this transition cost alone can cancel out months of storage savings: a million 100 KB objects (100 GB) save about $1.05/month in Standard-IA but cost $10 in transition fees, a ten-month payback before the move earns anything. The maths only works in your favour when objects are large enough that the per-GB storage saving outweighs the per-object transition cost.
There are also minimum storage duration charges to be aware of. Standard-IA has a 30-day minimum — if an object is deleted or transitioned out before 30 days, you are still charged for the full 30 days. Glacier Instant Retrieval has a 90-day minimum. Transitioning short-lived objects into these tiers will cost you more, not less.
Good candidates for transitions are large, long-lived, infrequently accessed objects — compliance archives, database backups you retain for months, CloudTrail logs beyond 90 days, ML training datasets you are done iterating on. Poor candidates are small files, objects with unpredictable access patterns, or anything retained for less than the minimum storage duration of the target tier.
07. EBS gp2 to gp3 — a free upgrade hiding in plain sight
When EC2 launched, the default general-purpose EBS volume type was gp2. In late 2020, AWS introduced gp3 as its successor. gp3 is cheaper, has better baseline performance, and allows performance to be configured independently of volume size. Despite being available for several years, a large number of volumes in production AWS accounts are still gp2 — because the default was gp2 when they were created and nobody changed them.
What makes gp3 different
gp2 ties performance to volume size through a credit-based burst model. A 100 GB gp2 volume has a baseline of 300 IOPS, with the ability to burst to 3,000 IOPS using accumulated credits. If credits run out under sustained load, performance drops back to baseline. A 1 TB volume gets 3,000 IOPS baseline. So teams would overprovision volume size just to get the IOPS they needed — paying for storage capacity they did not use.
gp3 decouples this entirely. Every gp3 volume, regardless of size, gets 3,000 IOPS and 125 MB/s throughput included in the base price. No burst model. No credit depletion. Consistent performance. If you need more — up to 16,000 IOPS — you pay an additional IOPS fee. But for the majority of workloads running at 3,000 IOPS or below, gp3 delivers the same performance as a much larger gp2 volume, at 20% less cost per GB.
How to migrate — with zero downtime
Converting a gp2 volume to gp3 is an in-place operation through the AWS console or CLI. There is no snapshot required, no instance stop, no data migration. AWS handles the conversion live while the volume remains attached and fully accessible.
- In the console: EC2 → Volumes → select volume → Actions → Modify volume → change type to gp3 → confirm.
- Via CLI: aws ec2 modify-volume --volume-id vol-xxxx --volume-type gp3
- The modification takes effect immediately for pricing. Performance specs take effect as the conversion progresses (usually minutes to hours depending on volume size).
Run aws ec2 describe-volumes --filters Name=volume-type,Values=gp2 --query 'Volumes[*].[VolumeId,Size,State]' --output table in each region to see what you have. In large accounts, the savings from a bulk gp2-to-gp3 migration are often in the hundreds of dollars per month.
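A bulk sweep for one region, as a sketch rather than a hardened batch job; review the volume list before running it:

```bash
# Convert every gp2 volume in the current region to gp3, in place.
# Note: EBS allows only one modification per volume every 6 hours.
for vol in $(aws ec2 describe-volumes \
    --filters Name=volume-type,Values=gp2 \
    --query 'Volumes[*].VolumeId' --output text); do
  aws ec2 modify-volume --volume-id "$vol" --volume-type gp3
done
```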
The common thread across all seven
None of these require a new tool, a new service, or new architecture. They are configurations inside services you are already running and already paying for. The default settings favour simplicity and maximum capability at setup time — which is often the right tradeoff when you are moving fast. But defaults are not permanent. Going back and tuning them is not technical debt. It is just part of operating in the cloud.