AWS Databases & Storage

AWS RDS & Aurora: A Deep Dive into Managed Relational Databases

RDS and Aurora sit under the same service umbrella, but they are fundamentally different beasts — in architecture, pricing, performance, and the problems they solve. This article walks through all of it: engine types, CLI groupings, HA options, cross-region replicas, backup and restore mechanics, architecture trade-offs, security controls, and the tricks that keep your bill sane.


What's in this article

  1. The RDS landscape — one umbrella, two worlds
  2. CLI and API groupings by engine type
  3. Common features across all engines
  4. Where engines diverge — pricing, licensing, responsibility
  5. Multi-AZ, read replicas, and cross-region replicas
  6. How cross-region replicas enable region failover
  7. Backup mechanics and restore behaviour
  8. Architecture difference: RDS vs Aurora
  9. Key tricks to save cost
  10. Security — network, encryption, IAM, and audit
01

The RDS landscape — one umbrella, two worlds

When AWS says "Amazon RDS," it refers to two distinct product lines that are billed and architected very differently but share a common console tab and a partially overlapping CLI namespace.

Classic RDS takes a standard community or commercial database engine, installs it on a managed EC2 instance, and wraps it in automation for patching, backups, and failover. You are still essentially running a single database process on a single machine. AWS handles the undifferentiated heavy lifting — OS patching, minor version upgrades, storage provisioning — but the fundamental server-on-disk model is unchanged.

Aurora is AWS's own cloud-native relational database. It retains wire-protocol compatibility with MySQL and PostgreSQL, so your application drivers and queries work without modification, but the engine beneath has been rebuilt from scratch. Storage is a distributed, self-healing, six-way replicated volume that is completely decoupled from the compute instances that read and write to it.

🐬
RDS MySQL

Community MySQL 8.0 / 5.7. Familiar, portable, BYOL not applicable — community licence.

🐘
RDS PostgreSQL

Community PostgreSQL 16 / 15 / 14 / 13. Fully open-source, no licence cost.

🏢
RDS Oracle

Oracle EE / SE2. Pay-per-use licence included, or Bring Your Own Licence (BYOL).

🪟
RDS SQL Server

Microsoft SQL Server EE / SE / Web / Express. Licence-included or BYOL.

🐋
RDS MariaDB

Community MariaDB 10.x. Drop-in MySQL fork with no extra licence cost.

🗄️
RDS IBM Db2

IBM Db2 for LUW (Linux, Unix, Windows). Enterprise database with licence-included or BYOL options.

Aurora MySQL

MySQL-compatible, AWS-built engine. Up to 5× faster than community MySQL.

Aurora PostgreSQL

PostgreSQL-compatible, AWS-built engine. Up to 3× faster than community PostgreSQL.

☁️
Aurora Serverless v2

Aurora MySQL or PostgreSQL engine that auto-scales compute in fine-grained ACU increments.

The key mental model: all Aurora variants are RDS in the console, but in the API and CLI they split into two distinct resource types — db-instance for classic RDS and db-cluster for Aurora. This split has practical consequences for every CLI command you run.

02

CLI and API groupings by engine type

The AWS CLI separates classic RDS instances and Aurora clusters into different commands even though both live under the aws rds namespace. Understanding which commands apply to which resource type saves a lot of confusion.

Classic RDS — working with DB instances

Classic RDS has one primary resource: the DB instance. Every CLI operation targets a specific --db-instance-identifier.

# Create an RDS MySQL instance
aws rds create-db-instance \
  --db-instance-identifier prod-mysql \
  --db-instance-class db.t3.medium \
  --engine mysql \
  --engine-version 8.0.36 \
  --master-username admin \
  --master-user-password S3cr3tP@ss \
  --allocated-storage 100 \
  --storage-type gp3 \
  --multi-az

# Create a read replica of a classic RDS instance
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-mysql-replica \
  --source-db-instance-identifier prod-mysql

# Promote a read replica to standalone (for failover)
aws rds promote-read-replica \
  --db-instance-identifier prod-mysql-replica

# Modify instance class (causes a brief restart in single-AZ)
aws rds modify-db-instance \
  --db-instance-identifier prod-mysql \
  --db-instance-class db.r6g.large \
  --apply-immediately

# Reboot a specific instance
aws rds reboot-db-instance \
  --db-instance-identifier prod-mysql

Aurora — working with clusters and instances

Aurora has two resource layers: the DB cluster (the logical database, its storage, and its endpoint) and one or more DB instances attached to the cluster (the compute nodes). You create the cluster first, then add instances. Most high-level operations target the cluster.

# Step 1 — Create the Aurora cluster (storage + logical database)
aws rds create-db-cluster \
  --db-cluster-identifier prod-aurora \
  --engine aurora-mysql \
  --engine-version 8.0.mysql_aurora.3.04.0 \
  --master-username admin \
  --master-user-password S3cr3tP@ss \
  --db-subnet-group-name my-subnet-group \
  --vpc-security-group-ids sg-0abc123

# Step 2 — Add the writer instance to the cluster
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-writer \
  --db-instance-class db.r6g.large \
  --engine aurora-mysql \
  --db-cluster-identifier prod-aurora

# Add a reader instance (same cluster)
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-reader-1 \
  --db-instance-class db.r6g.large \
  --engine aurora-mysql \
  --db-cluster-identifier prod-aurora

# Add a cross-region read replica CLUSTER (Aurora Global Database approach)
aws rds create-db-cluster \
  --db-cluster-identifier prod-aurora-replica-ap \
  --engine aurora-mysql \
  --replication-source-identifier arn:aws:rds:us-east-1:123456789:cluster:prod-aurora \
  --region ap-southeast-2

# Failover within a cluster (promote a reader)
aws rds failover-db-cluster \
  --db-cluster-identifier prod-aurora \
  --target-db-instance-identifier prod-aurora-reader-1

# Modify the cluster (e.g. enable deletion protection)
aws rds modify-db-cluster \
  --db-cluster-identifier prod-aurora \
  --deletion-protection
Rule of thumb: If you are touching storage, backups, cluster endpoints, replication, or engine version — use *-db-cluster commands. If you are touching compute sizing or instance-level parameters — use *-db-instance commands (even for Aurora instances).

Aurora Serverless v2

Serverless v2 is still a cluster with instances — but the instance class is db.serverless and you set ACU min/max on the cluster instead of choosing a fixed instance size.

# Create an Aurora Serverless v2 cluster
aws rds create-db-cluster \
  --db-cluster-identifier prod-serverless \
  --engine aurora-postgresql \
  --engine-version 15.4 \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=64 \
  --master-username admin \
  --master-user-password S3cr3tP@ss

# Attach a Serverless v2 instance to the cluster
aws rds create-db-instance \
  --db-instance-identifier prod-serverless-instance \
  --db-instance-class db.serverless \
  --engine aurora-postgresql \
  --db-cluster-identifier prod-serverless

You can have a single Amazon Aurora cluster that contains both provisioned (server type) and serverless (v2) instances. This allows you to run a mixed environment where some workloads use fixed-size instances while others benefit from the auto-scaling capabilities of serverless v2, all within the same cluster and sharing the same storage layer.

03

Common features across all engines

Regardless of which engine you choose, the following capabilities are part of the managed RDS service and work in the same way.

Automated backups

Every RDS and Aurora instance takes automated daily snapshots during a configurable backup window and retains transaction logs continuously, enabling point-in-time recovery (PITR) to any second within the retention window, up to the LatestRestorableTime (typically within the last five minutes). The retention period for automated backups can be set to up to 35 days. Backup storage up to 100% of your provisioned storage is free.

Manual snapshots

You can take manual snapshots at any time. Unlike automated backups, manual snapshots persist indefinitely until you explicitly delete them. They survive instance deletion, which is useful for archiving end-of-life databases.

Parameter groups and option groups

Engine configuration is managed through parameter groups. Changing a dynamic parameter takes effect immediately; static parameters require a reboot. Option groups (applicable mainly to Oracle and SQL Server) let you attach engine-specific features such as Oracle APEX or SQL Server TDE.

Maintenance windows

AWS applies OS and minor engine patches within a weekly maintenance window you define. For Multi-AZ deployments the patch is applied to the standby first, a failover happens, then the old primary is patched — minimising downtime to a brief failover event (typically under 60 seconds).
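Both windows are ordinary instance attributes. A sketch of setting them with the CLI (times are UTC, the two windows must not overlap, and the identifier reuses the earlier example):

```shell
# Set a 30-minute weekly maintenance window and a daily backup window.
# Window changes take effect on the next occurrence; no restart needed.
aws rds modify-db-instance \
  --db-instance-identifier prod-mysql \
  --preferred-maintenance-window sun:14:00-sun:14:30 \
  --preferred-backup-window 12:00-12:30
```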

Enhanced monitoring and Performance Insights

Enhanced Monitoring collects OS-level metrics (CPU, memory, disk) at up to 1-second granularity from an agent inside the instance. Performance Insights provides a query-level load view and is free for 7 days of retention, with longer retention charged. It is a very useful tool for pinpointing exactly why a database is slow by visualizing DB Load.

Encryption at rest

All engines support AES-256 encryption using KMS keys. Encryption must be enabled at creation time and cannot be added to an existing unencrypted instance (you must restore a snapshot to a new encrypted instance).
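The standard workaround for an existing unencrypted instance is snapshot, encrypted copy, restore. A sketch using the example instance from earlier (the KMS alias shown is the AWS-managed default; substitute a customer-managed key as needed):

```shell
# 1. Snapshot the unencrypted instance
aws rds create-db-snapshot \
  --db-instance-identifier prod-mysql \
  --db-snapshot-identifier prod-mysql-unencrypted

# 2. Copy the snapshot, encrypting the copy with a KMS key
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier prod-mysql-unencrypted \
  --target-db-snapshot-identifier prod-mysql-encrypted \
  --kms-key-id alias/aws/rds

# 3. Restore the encrypted copy to a new instance, then cut over
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-mysql-enc \
  --db-snapshot-identifier prod-mysql-encrypted \
  --db-instance-class db.t3.medium
```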

VPC placement and security groups

All instances live inside a VPC, in a DB subnet group you define. Access is controlled by security groups — there is no public routing unless you explicitly enable PubliclyAccessible: true (which you almost never should).

04

Where engines diverge — pricing, licensing, responsibility

| Feature | RDS MySQL / PostgreSQL / MariaDB | RDS Oracle | RDS SQL Server | Aurora MySQL / PostgreSQL |
|---|---|---|---|---|
| Licence model | Community (no licence cost) | Licence-included or BYOL | Licence-included or BYOL | AWS proprietary (MySQL/PG compatible) — no separate licence |
| Pricing basis | Instance-hour + storage (gp2/gp3/io1) | Instance-hour + licence fee + storage | Instance-hour + licence fee + storage | Instance-hour + storage I/O (per million requests) + storage GB |
| Licence cost example (r6g.2xl) | ~$0.48/hr compute only | ~$1.20/hr (LI Oracle SE2) | ~$1.50/hr (LI SQL Server SE) | ~$0.58/hr compute only (storage billed separately) |
| BYOL available | No | Yes | Yes (Software Assurance) | No |
| AWS responsibility | OS, engine patches, hardware, HA failover | OS & patching only — Oracle licence management is yours | OS & patching only — SQL Server licence compliance is yours if BYOL | OS, engine, storage durability, cluster management |
| Customer responsibility | Schema, queries, parameters, backup policy, security groups | Schema + Oracle licence compliance (RAC not supported) | Schema + SQL Agent jobs (SSRS/SSAS/SSIS not included) | Schema, queries, cluster topology, ACU sizing (serverless) |
| Multi-AZ HA | Yes — synchronous standby (same engine) | Yes — but not Real Application Clusters | Yes — uses SQL Server Mirroring / Always On | Yes — cluster-native, any reader can be promoted |
| Read replicas (in-region) | Yes (up to 5) | No | No | Yes (up to 15) |
| Cross-region replicas | Yes (up to 5, async) | No | No | Yes — Aurora Global Database (sub-1s lag) |
| Max storage | 64 TiB (gp3/io1) | 64 TiB | 16 TiB | 128 TiB (auto-grows in 10 GiB increments) |
| Storage auto-scaling | Yes (opt-in) | Yes | Yes | Always on — automatic |
Oracle and SQL Server licence-included pricing can be 2–4× the cost of an equivalent open-source engine. If you are migrating to AWS for the first time and are not locked into Oracle features, this is the moment to evaluate re-platforming onto PostgreSQL — AWS DMS and Schema Conversion Tool (SCT) make it increasingly practical.

05

Multi-AZ, read replicas, and cross-region replicas

These three mechanisms are often confused because they all involve "copies" of your data. They solve completely different problems.

Multi-AZ — availability, not scale

Multi-AZ creates a synchronous standby replica in a different Availability Zone within the same region. It is not readable — the standby is invisible to your application and exists purely as a failover target. AWS handles the failover automatically in roughly 60–120 seconds by flipping the DNS endpoint to point at the standby.

For classic RDS, Multi-AZ doubles your instance cost (you pay for two instances). For Aurora, all reader instances in the cluster serve as Multi-AZ failover targets at no additional HA tax — you already pay for them as read replicas, and they double as failover candidates.

Classic RDS Multi-AZ
Primary instance (AZ-a) — readable/writable
⬇ synchronous replication
Standby instance (AZ-b) — NOT readable

DNS flips on failover. App reconnects to same endpoint. ~60–120s RTO.

Aurora Multi-AZ
Writer instance (AZ-a)
⬇ shared storage (6-way replicated)
Reader 1 (AZ-b) — readable
Reader 2 (AZ-c) — readable + failover target

Any reader can be promoted. RTO typically <30s. Readers serve live traffic.
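Converting an existing single-AZ classic RDS instance to Multi-AZ is a single modify call. A sketch (identifier reused from the earlier examples); the conversion runs online, though the initial snapshot-and-sync can add I/O load while the standby is built:

```shell
# Enable Multi-AZ on an existing classic RDS instance
aws rds modify-db-instance \
  --db-instance-identifier prod-mysql \
  --multi-az \
  --apply-immediately
```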

Read replicas — scale, with an HA bonus

Read replicas are asynchronous copies of your database that can serve read traffic. They reduce load on the primary for reporting, analytics, or read-heavy workloads. The replication lag is typically milliseconds but can grow under write-heavy load or cross-region latency.

Classic RDS: up to 5 read replicas per source instance. Each replica is a full EC2-backed instance billed at the same rate as the primary. Supported for MySQL, PostgreSQL, and MariaDB — not for Oracle or SQL Server.

Aurora: up to 15 reader instances per cluster. Because they share the same underlying storage volume as the writer, there is no data copying overhead — readers just point at different positions in the distributed log. Replication lag is typically in single-digit milliseconds.

Cross-region read replicas

Cross-region read replicas extend asynchronous replication to another AWS region. This serves two purposes: local reads for users in a distant geography, and disaster recovery if the primary region becomes unavailable.

For classic RDS MySQL and PostgreSQL, cross-region replicas are created the same way as in-region replicas — you just specify a different region in the API call. Data transfer costs apply.
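A sketch of the cross-region call for classic RDS (account ID and identifiers are placeholders). Note that it runs against the destination region and must reference the source by its full ARN:

```shell
# Create a cross-region read replica. Run in the DESTINATION region,
# referencing the source instance by full ARN.
# If the source is encrypted, also pass --kms-key-id with a key that
# exists in the destination region.
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-mysql-replica-ap \
  --source-db-instance-identifier arn:aws:rds:us-east-1:123456789:db:prod-mysql \
  --region ap-southeast-2
```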

For Aurora, the preferred mechanism is Aurora Global Database, which provides sub-second replication lag across regions by using a dedicated storage-level replication layer rather than the binlog.

06

How cross-region replicas enable region failover

A cross-region replica is your insurance policy against a regional failure — a full AWS region going down or becoming impaired to the point where you need to shift traffic. Here is how that plays out in practice.

Classic RDS — promote the replica

A cross-region read replica for classic RDS is a standalone instance in the secondary region receiving asynchronous binlog replication from the primary region. When a failover decision is made:

  1. Stop writes to the primary (or accept that some data in-flight may be lost).
  2. Allow replication to catch up if the primary is still accessible.
  3. Promote the replica to a standalone instance — it is now writable. You have to do this step!
  4. Update your application's database endpoint to the new instance in the secondary region.
  5. Rebuild a new replica back to the original region once it recovers.
RPO is not zero. Asynchronous replication means that transactions acknowledged by the primary but not yet replicated will be lost. For MySQL with sync_binlog=1 and innodb_flush_log_at_trx_commit=1 you reduce this gap, but it never reaches zero. Size your acceptable RPO before choosing CRR as your DR strategy.
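The promote-and-wait portion of the steps above can be sketched as follows (the replica identifier is a placeholder matching the earlier cross-region example):

```shell
# Check replication status before promoting (step 2)
aws rds describe-db-instances \
  --db-instance-identifier prod-mysql-replica-ap \
  --region ap-southeast-2 \
  --query 'DBInstances[0].StatusInfos'

# Promote to standalone (step 3). This is irreversible:
# the instance stops replicating and becomes writable.
aws rds promote-read-replica \
  --db-instance-identifier prod-mysql-replica-ap \
  --region ap-southeast-2

# Block until the promoted instance is available (before step 4)
aws rds wait db-instance-available \
  --db-instance-identifier prod-mysql-replica-ap \
  --region ap-southeast-2
```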

Aurora Global Database — managed failover

Aurora Global Database is designed specifically for this scenario. Replication happens at the storage layer with typical lag under 1 second. The secondary cluster has its own set of reader instances that can serve local reads in the secondary region before any failover occurs.

When you initiate a region failover (remember, this is a manual decision; AWS does not trigger it for you), either as a "managed planned failover" or an "unplanned detach and promote":

  1. The secondary cluster is detached from the global database and promoted to a full read-write cluster.
  2. The promotion takes 1–2 minutes and involves the secondary cluster catching up on the replication buffer.
  3. A new cluster endpoint is available in the secondary region. Point your application there.
  4. When the primary region recovers, you can re-add it as a secondary region to restore the global database topology.
# Initiate a managed failover for Aurora Global Database
aws rds failover-global-cluster \
  --global-cluster-identifier my-global-db \
  --target-db-cluster-identifier arn:aws:rds:ap-southeast-2:123456:cluster:prod-aurora-replica-ap

# Remove the old primary from the global cluster (after regional recovery)
aws rds remove-from-global-cluster \
  --global-cluster-identifier my-global-db \
  --db-cluster-identifier arn:aws:rds:us-east-1:123456:cluster:prod-aurora
RTO vs RPO summary: Aurora Global Database gives you RTO ~1–2 minutes and RPO typically <1 second. Classic RDS cross-region replicas give you RTO 5–15 minutes (manual steps) and RPO that depends on replication lag at the moment of failure — often 30–60 seconds under normal conditions, worse under heavy load.
07

Backup mechanics and restore behaviour

How automated backups work

RDS takes a storage-level snapshot during a daily backup window (you configure the window, AWS executes it). For the rest of the day, transaction logs are streamed to S3 in near-real time. Together these enable PITR — you can restore to any point within your retention window (up to 35 days).

For Aurora, the backup is continuous and automatic. Aurora never takes a "snapshot" in the traditional sense — the distributed storage layer keeps a log of every storage I/O, and any point in the retention window is always restorable. This is one of the architectural advantages of Aurora: backups impose essentially zero performance impact on the database.

For classic RDS, the daily snapshot causes a brief I/O suspension on single-AZ instances (seconds to minutes, depending on database size). On Multi-AZ instances the snapshot is taken from the standby, so the primary is unaffected.

Manual snapshots

Manual snapshots are initiated by you, stored in S3 (in the same region by default), and retained indefinitely. You are charged for storage beyond the free tier (100% of provisioned storage). Snapshots can be copied to another region for DR purposes.

# Take a manual snapshot
aws rds create-db-snapshot \
  --db-instance-identifier prod-mysql \
  --db-snapshot-identifier prod-mysql-before-migration

# Copy a snapshot to another region
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier arn:aws:rds:us-east-1:123456:snapshot:prod-mysql-before-migration \
  --target-db-snapshot-identifier prod-mysql-dr-copy \
  --region ap-southeast-2 \
  --kms-key-id arn:aws:kms:ap-southeast-2:123456:key/mrk-xxxxx

# For Aurora — manual cluster snapshot
aws rds create-db-cluster-snapshot \
  --db-cluster-identifier prod-aurora \
  --db-cluster-snapshot-identifier prod-aurora-pre-deploy

Backup charges

Free tier: Automated backup storage equal to 100% of your total provisioned database storage is free. A 500 GiB RDS instance gets 500 GiB of automated backup storage at no charge.

Paid: Storage beyond 100% of provisioned size, and all manual snapshot storage, is charged at roughly $0.095/GB-month (varies by region). A retention window of 35 days on a busy database can easily exceed the free tier and become a meaningful cost line.
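As a worked example of the charge model (the rate is an assumed figure, not a quote): a 500 GiB instance that has accumulated 800 GiB of automated backup storage pays only for the 300 GiB above the free tier:

```shell
# Back-of-envelope backup cost: billable GB = backup GB above provisioned GB
provisioned_gb=500
backup_gb=800
rate=0.095   # assumed $/GB-month; varies by region

billable=$(( backup_gb > provisioned_gb ? backup_gb - provisioned_gb : 0 ))
cost=$(awk -v gb="$billable" -v r="$rate" 'BEGIN { printf "%.2f", gb * r }')
echo "Billable: ${billable} GiB -> \$${cost}/month"
# -> Billable: 300 GiB -> $28.50/month
```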

Restore behaviour — always a new instance

This is a frequently misunderstood behaviour: restoring a snapshot or performing PITR always creates a brand-new DB instance or cluster. AWS does not restore in-place over an existing instance.

This is actually a safety feature. Your production database continues to run while the restore creates a parallel instance. You can verify the restored instance, run queries, validate data integrity, then either cut over by pointing your application to the new endpoint, or delete it if it was only for forensics.

# Restore a snapshot to a new RDS instance
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-mysql-restored \
  --db-snapshot-identifier prod-mysql-before-migration \
  --db-instance-class db.t3.medium

# Point-in-time restore — new instance at a specific time
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier prod-mysql \
  --target-db-instance-identifier prod-mysql-pitr \
  --restore-time 2026-04-26T14:30:00Z

# Restore Aurora cluster snapshot to new cluster
aws rds restore-db-cluster-from-snapshot \
  --db-cluster-identifier prod-aurora-restored \
  --snapshot-identifier prod-aurora-pre-deploy \
  --engine aurora-mysql

# After cluster restore, attach a new instance to it
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-restored-writer \
  --db-instance-class db.r6g.large \
  --engine aurora-mysql \
  --db-cluster-identifier prod-aurora-restored
Tip: After restoring an Aurora cluster snapshot, you must attach at least one DB instance before the cluster becomes accessible. This is a common gotcha — the cluster endpoint exists but returns a connection error until an instance is present. A restored cluster is essentially just the virtual storage volume where your data lives; storage alone cannot process SQL queries or handle network connections. Attach a DB instance to provide the compute layer, and the cluster becomes fully operational via the cluster endpoint.
08

Architecture difference: RDS vs Aurora

The performance and availability characteristics of Aurora are a direct consequence of its storage architecture, which is radically different from classic RDS.

Classic RDS — single-server model with hot standby

Classic RDS runs a standard database server process on an EBS-backed EC2 instance. Writes go to the local EBS volume. EBS itself replicates within an AZ transparently, but the database sees a single block device. Multi-AZ adds a synchronous replica by replicating the EBS writes to a second instance in another AZ via a dedicated AWS-managed replication channel.

Classic RDS storage model

Application
⬇ TCP / SQL protocol
RDS Instance (EC2 + database engine)
⬇ EBS writes (block I/O)
EBS Volume — gp3 or io1 (per-AZ)
⬇ synchronous mirror (Multi-AZ only)
Standby EBS Volume (different AZ)

The consequence of this model: every write is sequential — it must be acknowledged by the local EBS volume (and by the standby EBS volume for Multi-AZ) before the database can move on. Read replicas receive binlog events and must replay them, which means there is always replication lag proportional to write throughput.

Aurora — shared distributed storage

Aurora separates compute entirely from storage. All compute instances (writers and readers) attach to the same shared distributed storage volume. That volume spans 3 AZs and maintains 6 copies of every 10 GiB storage segment (2 copies per AZ). Writes are considered durable once 4 of the 6 copies acknowledge the write — AWS calls this the "quorum write" model.

Aurora distributed storage model

Application (Writer endpoint)
Writer Instance (compute only — no local storage)
⬇ redo log records only (4/6 quorum write)
Aurora Distributed Storage Volume
6 copies across 3 AZs · self-healing · auto-grows
⬆ read from shared volume (no replication lag)
Reader Instances (compute only — share same storage)
Application (Reader endpoint)

This architecture produces several important differences compared to classic RDS:

Why Aurora is faster for writes

Classic RDS sends a full page write to EBS (16 KiB blocks). Aurora only sends the redo log record — the minimal description of the change — to the storage layer. The storage nodes apply the redo log themselves. This reduces write amplification dramatically and lowers write latency.

Why Aurora readers have near-zero lag

Aurora readers share the same storage volume as the writer. There is no data to replicate. The reader's lag is just the time it takes for it to receive the in-memory page cache invalidation notification from the writer — typically single-digit milliseconds even under high write load, compared to seconds or more for classic RDS read replicas under heavy load.

Why Aurora failover is faster

In classic RDS Multi-AZ, failover relies on synchronous physical replication between two separate EBS volumes. During a failover, the standby must perform crash recovery to reconcile its state with the primary's logs before it can accept writes, typically resulting in a 60–120 second downtime.

In Amazon Aurora, all instances share a single, distributed storage volume. Because a reader is already "attached" to the data, promotion doesn't require data syncing or volume remapping; the reader simply transitions to the writer role. This allows Aurora to complete failovers much faster, typically in under 30 seconds and often in less than 15 seconds.

Why Aurora auto-scales storage without performance impact

Classic RDS storage expansion involves an online modification of the EBS volume — a background process that can take hours on large volumes and may throttle I/O. Aurora's distributed volume grows in 10 GiB increments automatically, invisibly, and with no performance impact.

Write latency
Aurora ~35% lower
Reader replica lag
Aurora <10ms vs RDS 1–10s
Failover RTO
Aurora <30s vs RDS 60–120s
09

Key tricks to save cost

1. Reserved Instances (1-year or 3-year)

RDS Reserved Instances provide 40–60% savings over On-Demand for instances you plan to run continuously. The commitment is per instance class and engine. For Aurora, reservations apply to the instance class — not the storage, which is always pay-per-use. If you have a stable baseline workload, reservations are the single biggest lever.
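Reservations are purchased through the CLI in two steps: find an offering matching your class, engine, and term, then buy it. A sketch (the offering ID is a placeholder for the value returned by the first call):

```shell
# 1. List 1-year No Upfront offerings for the class and engine you run
aws rds describe-reserved-db-instances-offerings \
  --db-instance-class db.r6g.large \
  --product-description mysql \
  --duration 31536000 \
  --offering-type "No Upfront"

# 2. Purchase using the offering ID from the response
aws rds purchase-reserved-db-instances-offering \
  --reserved-db-instances-offering-id <offering-id-from-above>
```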

2. Aurora Serverless v2 for variable workloads

If your database has predictable peaks and quiet periods (nightly batch, business-hours-only SaaS), Aurora Serverless v2 scales in 0.5 ACU increments. At minimum scale (0.5 ACU) it costs roughly $0.06/hr — much less than the cheapest provisioned instance. The catch: cold-start latency is real if you scale to near zero, and minimum ACU 0.5 still keeps some memory warm.

ACU cost example: 1 ACU ≈ 2 GiB RAM. At $0.12/ACU-hour (us-east-1), a cluster idling at 2 ACU costs $0.24/hr ($175/mo). The same workload on db.t3.medium On-Demand (2 vCPU, 4 GiB) costs $0.068/hr ($50/mo) — so Serverless v2 is only cheaper if you genuinely scale down significantly for long periods. Do the maths for your actual traffic shape.
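You can make that maths concrete with a quick break-even sketch (rates are the assumed figures above; real prices vary by region and the idle floor assumes the cluster scales to 0.5 ACU when quiet):

```shell
# Hours/month below which Serverless v2 beats an always-on db.t3.medium
acu_rate=0.12    # assumed $/ACU-hour
busy_acu=2       # ACUs while active
idle_acu=0.5     # floor while idle
t3_rate=0.068    # assumed db.t3.medium on-demand $/hour
hours_month=730

# Solve h*busy*rate + (730-h)*idle*rate <= 730*t3_rate for h
h=$(awk -v r="$acu_rate" -v b="$busy_acu" -v i="$idle_acu" \
       -v t="$t3_rate" -v m="$hours_month" \
  'BEGIN { printf "%.0f", (m*t - m*i*r) / ((b - i) * r) }')
echo "Serverless v2 is cheaper if busy < ${h} hours/month"
# -> Serverless v2 is cheaper if busy < 32 hours/month
```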

3. Stop dev/test instances outside business hours

RDS instances can be stopped for up to 7 days at a time. A stopped instance pays for storage and automated backup storage but not for compute. A db.r6g.large stopped overnight and on weekends (≈128 hrs per week off vs 40 hrs on) saves over 75% of the compute cost. Automate this with EventBridge Scheduler calling aws rds stop-db-instance.

Gotcha: AWS automatically restarts stopped instances after 7 days to apply maintenance. If you want longer-term stop, you need to re-stop it each week (or script the restart + re-stop cycle).
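One way to automate the schedule is EventBridge Scheduler's universal target for the RDS API. A sketch, assuming a pre-created IAM role with rds:StopDBInstance permission (role ARN and instance name are placeholders; the Input parameter casing follows the RDS API, so verify it against your CLI):

```shell
# Stop a dev instance every weeknight at 19:00 UTC
aws scheduler create-schedule \
  --name stop-dev-mysql-nightly \
  --schedule-expression "cron(0 19 ? * MON-FRI *)" \
  --flexible-time-window Mode=OFF \
  --target '{
    "Arn": "arn:aws:scheduler:::aws-sdk:rds:stopDBInstance",
    "RoleArn": "arn:aws:iam::123456789:role/rds-scheduler-role",
    "Input": "{\"DBInstanceIdentifier\": \"dev-mysql\"}"
  }'
```

A mirror schedule targeting startDBInstance at 07:00 completes the pattern.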

4. Rightsize before you reserve

Performance Insights and Enhanced Monitoring will tell you if your instance is consistently at 20% CPU. Downsize first, then reserve. An r6g.xlarge reserved at 1-year No Upfront is roughly $220/month. An r6g.large reserved is roughly $110/month. Use the minimum class that keeps CPU below 60–70% at peak.

5. Switch from gp2 to gp3 storage

If you created your RDS instance before gp3 was available, you may still be on gp2. gp3 is 20% cheaper per GB and provides 3,000 IOPS and 125 MB/s throughput as the baseline free of charge (vs gp2 where baseline IOPS scale with storage size). Switching is a live storage modification with no downtime.

# Migrate storage type to gp3 (live, no downtime)
aws rds modify-db-instance \
  --db-instance-identifier prod-mysql \
  --storage-type gp3 \
  --iops 3000 \
  --storage-throughput 125 \
  --apply-immediately

6. Reduce backup retention for non-critical databases

Automated backup retention can be set from 1 to 35 days. Every additional day of retention beyond the free 100% tier costs ~$0.095/GB-month. For dev/test databases with a small dataset this is negligible, but for a 2 TiB production database with 35-day retention and moderate change rate, the backup storage bill can easily exceed $200/month. Set retention to the minimum your RPO requires.

7. Aurora I/O-Optimised for high-throughput workloads

Aurora standard pricing charges per million storage I/O operations. For write-heavy workloads (high transactions, bulk inserts), I/O costs can dominate the bill. Aurora I/O-Optimised (a cluster configuration switch) eliminates per-I/O charges in exchange for a ~25% higher storage GB rate. If your I/O cost exceeds ~25% of your storage cost, switching pays off. AWS provides a cost estimator for this in the console.
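The switch itself is a cluster-level storage-type change. A sketch (I believe the storage type value is aurora-iopt1, and reverting to standard is limited to once per 30 days; verify both in the console estimator first):

```shell
# Switch an Aurora cluster from standard (per-I/O billing) to I/O-Optimized
aws rds modify-db-cluster \
  --db-cluster-identifier prod-aurora \
  --storage-type aurora-iopt1 \
  --apply-immediately
```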

8. Delete idle read replicas

Each Aurora reader instance or RDS read replica is a full billable instance. If you created replicas for a load test last quarter and forgot about them, they are running and billing. Review your instance list regularly.

10

Security — network, encryption, IAM, and audit

RDS and Aurora have a layered security model. A breach typically involves multiple layers being misconfigured simultaneously — understanding each layer is what prevents that.

🔒
VPC & Subnets

DB instances live in a DB subnet group — private subnets in your VPC. Never place RDS in a public subnet unless strictly required (public-facing read replicas for a CDN, etc.).

🛡️
Security Groups

Act as instance-level firewalls. Allow inbound on port 3306 (MySQL), 5432 (PostgreSQL), etc. only from specific app tier security groups — never from 0.0.0.0/0.

🔑
KMS Encryption at Rest

All data, backups, read replicas, and snapshots inherit the encryption key of the source. Use customer-managed keys (CMK) for full control over key rotation and access policy.

🔐
TLS in Transit

RDS supports TLS for all engines. Enforce it via parameter group (require_secure_transport=1 for MySQL) or connection string. Certificates rotate automatically; use RDS root CA bundles.

👤
IAM Database Auth

MySQL and PostgreSQL (RDS and Aurora) support IAM authentication. The app obtains a short-lived token via the AWS SDK instead of a long-lived password. No password rotation needed.

🗝️
Secrets Manager

Store and rotate DB master credentials automatically via Secrets Manager. The rotation Lambda updates the password in Secrets Manager and on the database. Zero downtime rotation.
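If you prefer fully managed rotation, RDS can own the master credential directly via the managed master password integration. A sketch (identifier from the earlier examples); the existing master password is replaced by a Secrets Manager secret that RDS rotates automatically:

```shell
# Hand the master password over to Secrets Manager management
aws rds modify-db-instance \
  --db-instance-identifier prod-mysql \
  --manage-master-user-password \
  --apply-immediately
```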

📋
Audit Logging

Enable CloudWatch Logs export for error, slow query, general, and audit logs. For compliance workloads, Aurora Audit enables per-table or per-user auditing streamed to CloudWatch.
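Log export is a modify call per instance. A sketch for RDS MySQL (the audit log type additionally requires the MARIADB_AUDIT_PLUGIN option group, so it is omitted here; available log types vary by engine):

```shell
# Export error and slow query logs to CloudWatch Logs
aws rds modify-db-instance \
  --db-instance-identifier prod-mysql \
  --cloudwatch-logs-export-configuration '{"EnableLogTypes":["error","slowquery"]}' \
  --apply-immediately
```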

🌐
Network Isolation

Use VPC peering or PrivateLink if accessing RDS from another VPC or account. Never expose RDS over public internet — use a bastion host or Systems Manager Session Manager instead.

IAM database authentication in practice

IAM auth eliminates long-lived database passwords from your application configuration. Here is the pattern:

# 1. Enable IAM auth on the instance
aws rds modify-db-instance \
  --db-instance-identifier prod-mysql \
  --enable-iam-database-authentication \
  --apply-immediately

# 2. Create a database user mapped to IAM (run inside MySQL)
# CREATE USER 'app_user'@'%' IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS';

# 3. Generate an auth token (valid 15 minutes) — done in application code
aws rds generate-db-auth-token \
  --hostname prod-mysql.xxxx.us-east-1.rds.amazonaws.com \
  --port 3306 \
  --region us-east-1 \
  --username app_user
Tip: Pair IAM auth with an EC2 instance profile or ECS task role. The application never has a static password. The IAM policy attached to the role grants rds-db:connect on the specific database resource ARN. Rotation is automatic via IAM token expiry — no Lambda rotation function needed.
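Putting the token to use from a shell looks like the following sketch (hostname and user mirror the example above; the CA bundle URL is the standard RDS trust store):

```shell
# Fetch the RDS CA bundle once, then connect with the token as the password.
# The token is valid for 15 minutes; TLS is mandatory for IAM auth.
curl -sO https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem

TOKEN=$(aws rds generate-db-auth-token \
  --hostname prod-mysql.xxxx.us-east-1.rds.amazonaws.com \
  --port 3306 --region us-east-1 --username app_user)

mysql --host=prod-mysql.xxxx.us-east-1.rds.amazonaws.com \
  --user=app_user \
  --password="$TOKEN" \
  --ssl-ca=global-bundle.pem \
  --enable-cleartext-plugin
```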

Encryption key considerations for cross-region copies

When you copy a snapshot to another region, you must re-encrypt it with a KMS key in the destination region. AWS KMS keys are regional. Use multi-region KMS keys (MRKs) if you want the same key material in both regions — this simplifies the copy process and ensures that the same key policy governs both copies.

Deletion protection

Always enable deletion protection on production databases. It prevents accidental delete-db-instance or delete-db-cluster calls regardless of the caller's IAM permissions. You must first disable deletion protection via a modify call before a delete succeeds. Combined with the final-snapshot requirement on deletion (you must either take a final snapshot or explicitly skip it), this gives you two checkpoints against accidental data loss.

# Enable deletion protection
aws rds modify-db-instance \
  --db-instance-identifier prod-mysql \
  --deletion-protection \
  --apply-immediately

# Same for Aurora cluster
aws rds modify-db-cluster \
  --db-cluster-identifier prod-aurora \
  --deletion-protection

Quick reference — which resource type to use when:

Use Aurora when you need: sub-30s failover, <10ms read replica lag, >5 readers, 128 TiB storage, Global Database for DR, or Serverless v2 autoscaling.

Use Classic RDS PostgreSQL/MySQL when you need: strict open-source portability, the lowest entry price for small workloads, or Aurora's I/O cost model doesn't suit your write pattern.

Use Classic RDS Oracle/SQL Server when you have: existing enterprise licences (BYOL) or applications with hard Oracle/SQL Server dependencies that cannot be re-platformed yet.

I hope you found this useful. Please share it!


About the Author

Mayank Pandey

AWS Community Hero and Cloud Architect with 15+ years of experience. AWS Solutions Architect Professional, FinOps Practitioner, and AWS Authorized Instructor. Creator of the KnowledgeIndia YouTube channel (80,000+ subscribers). Based in Melbourne, Australia.