Amazon Aurora Cheat Sheet

A fully managed relational database engine that’s compatible with MySQL and PostgreSQL.
With some workloads, Aurora can deliver up to five times the throughput of MySQL and up to three times the throughput of PostgreSQL.
Aurora includes a high-performance storage subsystem. The underlying storage grows automatically as needed, up to 128 TiB. The minimum storage is 10 GB.
Aurora will keep your database up-to-date with the latest patches.
Aurora supports quick, efficient cloning operations.
- You can share your Amazon Aurora DB clusters with other AWS accounts for quick and efficient database cloning.

Aurora is fault-tolerant and self-healing.

DB Clusters

- An Aurora DB cluster consists of one or more DB instances and a cluster volume that manages the data for those DB instances.
- An Aurora cluster volume is a virtual database storage volume that spans multiple AZs, with each AZ having a copy of the DB cluster data.
- DB instance types in a cluster:
  - Primary DB instance – Supports read and write operations, and performs all of the data modifications to the cluster volume. Each Aurora DB cluster has one primary DB instance.
  - Aurora Replica – Connects to the same storage volume as the primary DB instance and supports only read operations. Each Aurora DB cluster can have up to 15 Aurora Replicas in addition to the primary DB instance. Aurora automatically fails over to an Aurora Replica in case the primary DB instance becomes unavailable. You can specify the failover priority for Aurora Replicas. Aurora Replicas can also offload read workloads from the primary DB instance.
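The failover priority mentioned above can be pictured with a small simulation. This is only a sketch, assuming the documented scheme in which each replica has a promotion tier from 0 to 15, lower tiers are promoted first, and ties are broken in favor of the larger instance. The `Replica` class, the `memory_gib` field (used here as a stand-in for instance size), and `pick_failover_target` are hypothetical names for illustration, not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    promotion_tier: int  # 0 (promoted first) to 15 (promoted last)
    memory_gib: int      # hypothetical stand-in for "instance size"


def pick_failover_target(replicas):
    """Return the replica that would be promoted first, or None if there are none."""
    if not replicas:
        return None
    # Lowest tier number wins; among equal tiers, the larger instance wins.
    return min(replicas, key=lambda r: (r.promotion_tier, -r.memory_gib))


replicas = [
    Replica("replica-a", promotion_tier=1, memory_gib=32),
    Replica("replica-b", promotion_tier=0, memory_gib=16),
    Replica("replica-c", promotion_tier=0, memory_gib=64),
]
print(pick_failover_target(replicas).name)  # replica-c
```

Here `replica-b` and `replica-c` share tier 0, so the larger of the two is chosen ahead of `replica-a` in tier 1.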

Aurora Endpoints

- When you connect to an Aurora cluster, the host name and port that you specify point to an intermediate handler called an endpoint.
- Cluster endpoint – connects to the current primary DB instance for a DB cluster. This endpoint is the only one that can perform write operations. Each Aurora DB cluster has one cluster endpoint and one primary DB instance.
- Reader endpoint – connects to one of the available Aurora Replicas for that DB cluster. Each Aurora DB cluster has one reader endpoint. The reader endpoint provides load-balancing support for read-only connections to the DB cluster. Use the reader endpoint for read operations, such as queries. You can’t use the reader endpoint for write operations.
- Custom endpoint – represents a set of DB instances that you choose. When you connect to the endpoint, Aurora performs load balancing and chooses one of the instances in the group to handle the connection. You define which instances this endpoint refers to, and you decide what purpose the endpoint serves.
- Instance endpoint – connects to a specific DB instance within an Aurora cluster. The instance endpoint provides direct control over connections to the DB cluster. The main way that you use instance endpoints is to diagnose capacity or performance issues that affect one specific instance in an Aurora cluster.
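As a rough illustration of the four endpoint types, the sketch below guesses the type from the host name. It assumes the naming pattern used in the AWS documentation examples (`cluster-`, `cluster-ro-`, and `cluster-custom-` prefixes in the second DNS label); the host names themselves are placeholders and `classify_endpoint` is a hypothetical helper.

```python
def classify_endpoint(hostname):
    """Guess the Aurora endpoint type from the second DNS label of the host name."""
    labels = hostname.split(".")
    if len(labels) < 2:
        raise ValueError(f"not an endpoint host name: {hostname!r}")
    marker = labels[1]
    # Check the longer prefixes first, since all three start with "cluster-".
    if marker.startswith("cluster-ro-"):
        return "reader"
    if marker.startswith("cluster-custom-"):
        return "custom"
    if marker.startswith("cluster-"):
        return "cluster"
    return "instance"


# Placeholder host names modeled on the documentation examples.
examples = [
    "mydbcluster.cluster-c7tj4example.us-east-1.rds.amazonaws.com",
    "mydbcluster.cluster-ro-c7tj4example.us-east-1.rds.amazonaws.com",
    "myendpoint.cluster-custom-c7tj4example.us-east-1.rds.amazonaws.com",
    "mydbinstance.c7tj4example.us-east-1.rds.amazonaws.com",
]
for host in examples:
    print(classify_endpoint(host), host)
```

An application would typically send writes to the host classified as `cluster` and read-only queries to the `reader` or a `custom` host.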

Storage and Reliability

- Aurora data is stored in the cluster volume, which is designed for reliability. A cluster volume consists of copies of the data across multiple Availability Zones in a single AWS Region.
- Aurora automatically detects failures in the disk volumes that make up the cluster volume. When a segment of a disk volume fails, Aurora immediately repairs the segment. When Aurora repairs the disk segment, it uses the data in the other volumes that make up the cluster volume to ensure that the data in the repaired segment is current.
- Aurora preloads the buffer pool with the pages for known common queries that are stored in an in-memory page cache when a database starts up after it has been shut down or restarted after a failure.
- Aurora is designed to recover from a crash almost instantaneously and continue to serve your application data without the binary log. Aurora performs crash recovery asynchronously on parallel threads, so that your database is open and available immediately after a crash.
- Amazon Aurora Auto Scaling works with Amazon CloudWatch to automatically add and remove Aurora Replicas in response to changes in performance metrics that you specify. This feature is available for both Aurora MySQL and Aurora PostgreSQL. There is no additional cost to use Aurora Auto Scaling beyond what you already pay for Aurora and CloudWatch alarms.
- Dynamic resizing automatically decreases the allocated storage space from your Aurora database cluster when you delete data.
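To make the Aurora Auto Scaling item more concrete, here is a minimal sketch that builds the two request payloads a target-tracking setup would need. It assumes the Application Auto Scaling API (`register_scalable_target` and `put_scaling_policy`) with the `rds` namespace; the function name, the policy name, and the cluster identifier are hypothetical, and the payloads are only built here, not sent.

```python
def replica_autoscaling_requests(cluster_id, min_replicas, max_replicas, target_cpu_percent):
    """Build the scalable-target and scaling-policy requests for Aurora Replica scaling."""
    if not 0 <= min_replicas <= max_replicas <= 15:
        raise ValueError("need 0 <= min_replicas <= max_replicas <= 15")
    common = {
        "ServiceNamespace": "rds",
        "ResourceId": f"cluster:{cluster_id}",
        "ScalableDimension": "rds:cluster:ReadReplicaCount",
    }
    target = dict(common, MinCapacity=min_replicas, MaxCapacity=max_replicas)
    policy = dict(
        common,
        PolicyName=f"{cluster_id}-reader-cpu",  # hypothetical policy name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "RDSReaderAverageCPUUtilization"
            },
        },
    )
    return target, policy


target, policy = replica_autoscaling_requests("my-cluster", 1, 5, 60)
print(target)
print(policy["TargetTrackingScalingPolicyConfiguration"])
```

With boto3 installed and credentials configured, these dictionaries could be passed as keyword arguments to an `application-autoscaling` client, for example `client.register_scalable_target(**target)` followed by `client.put_scaling_policy(**policy)`.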

High Availability and Fault Tolerance

- When you create Aurora Replicas across Availability Zones, RDS automatically provisions and maintains them synchronously. The primary DB instance is synchronously replicated across Availability Zones to Aurora Replicas to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups.
- An Aurora DB cluster is fault tolerant by design. If the primary instance in a DB cluster fails, Aurora automatically fails over to a new primary instance in one of two ways:
  - By promoting an existing Aurora Replica to the new primary instance
  - By creating a new primary instance
- Aurora storage is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically.
- Aurora backs up your cluster volume automatically and retains restore data for the length of the backup retention period, from 1 to 35 days.
- Aurora automatically maintains 6 copies of your data across 3 Availability Zones and will automatically attempt to recover your database in a healthy AZ with no data loss.
- Aurora has a Backtrack feature that rewinds or restores the DB cluster to the time you specify. However, take note that the Amazon Aurora Backtrack feature is not a total replacement for fully backing up your DB cluster since the limit for a backtrack window is only 72 hours.
- With Aurora MySQL, you can set up cross-region Aurora Replicas using either logical or physical replication. Aurora PostgreSQL does not currently support cross-region replicas.
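The backup retention and Backtrack limits above lend themselves to a quick sanity check. The sketch below encodes only the two limits stated in this section (1 to 35 days of retention, a backtrack window of at most 72 hours); the function names and return strings are hypothetical.

```python
from datetime import timedelta

MAX_BACKTRACK_WINDOW = timedelta(hours=72)


def validate_recovery_settings(retention_days, backtrack_window):
    """Return a list of problems; an empty list means both settings are within limits."""
    problems = []
    if not 1 <= retention_days <= 35:
        problems.append("backup retention must be between 1 and 35 days")
    if backtrack_window < timedelta(0):
        problems.append("backtrack window cannot be negative")
    elif backtrack_window > MAX_BACKTRACK_WINDOW:
        problems.append("backtrack window cannot exceed 72 hours")
    return problems


def recovery_method(age, backtrack_window, retention_days):
    """Pick how a point `age` in the past could be reached with these settings."""
    if age <= backtrack_window:
        return "backtrack"
    if age <= timedelta(days=retention_days):
        return "point-in-time restore"
    return "outside the automated recovery window"


print(validate_recovery_settings(7, timedelta(hours=24)))
print(recovery_method(timedelta(days=3), timedelta(hours=24), 7))
```

The second function shows why Backtrack does not replace backups: anything older than the backtrack window has to come from the retained restore data instead.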

Aurora Global Database

- An Aurora global database spans multiple AWS Regions, enabling low latency global reads and disaster recovery from region-wide outages.
- Consists of one primary AWS Region where your data is mastered, and one or more read-only secondary AWS Regions.
- Aurora global databases use dedicated infrastructure to replicate your data.
- Aurora global databases introduce a higher level of failover capability than a default Aurora cluster.
- An Aurora cluster can recover in less than 1 minute even in the event of a complete regional outage. This provides your application with an effective Recovery Point Objective (RPO) of 5 seconds and a Recovery Time Objective (RTO) of less than 1 minute.
- Has managed planned failover capability, which lets you change which AWS Region hosts the primary cluster while preserving the physical topology of your global database and avoiding unnecessary application changes.

DB Cluster Configurations

Aurora supports two types of instance classes:
- Memory Optimized
- Burstable Performance

Aurora Serverless

An on-demand, autoscaling configuration for Amazon Aurora (supports both MySQL and PostgreSQL).
1. Aurora Serverless V2
  - Architecture: Aurora Serverless v2 uses a distributed, multi-tenant architecture that allows it to scale instantly from hundreds to hundreds of thousands of transactions in a fraction of a second. Rather than creating or removing database instances, it scales the compute and memory capacity of the existing database cluster in fine-grained increments as demand changes. This eliminates the need for manual provisioning and resizing.
  - Capacity: Measured in Aurora Capacity Units (ACUs), which represent a specific amount of compute and memory resources.
    - Minimum: 0.5 ACU (approx. 1 GiB RAM).
    - Maximum: 128 ACUs (approx. 256 GiB RAM).
  - Key Improvements over V1:
    - Aurora Serverless v2 addresses v1’s limitations by enabling faster, more granular scaling, support for additional enterprise features, and improved availability and reliability.
    - Supports Multi-AZ clusters for high availability, allowing automatic failover to another Availability Zone without manual intervention.
    - Supports Global Database for cross-region disaster recovery, enabling low-latency global reads and fast recovery from region-wide outages.
    - Supports multiple Aurora Replicas to scale read traffic and improve fault tolerance.
    - Supports IAM Database Authentication, Performance Insights, and other advanced monitoring and security features.
  - Use Cases:
    - Ideal for variable workloads, unpredictable or intermittent usage, development and test environments, SaaS and multi-tenant applications, and any scenario requiring enterprise-grade features and scalability without manual capacity management.
2. Aurora Serverless V1
  - Note: Aurora Serverless v1 is a legacy offering and is no longer recommended for new applications (Starting January 8th, 2025). New deployments should use Aurora Serverless v2. The following information applies only to v1.
  - Aurora Serverless v1 is an on-demand, autoscaling configuration for Amazon Aurora that supports both MySQL and PostgreSQL. The cluster automatically starts up, shuts down, and adjusts capacity in response to application demand. However, scaling is performed in larger increments and is slower compared to v2. Aurora Serverless v1 is suitable mainly for infrequent, variable, or unpredictable workloads, but has several limitations compared to v2.
  - A non-Serverless Aurora DB cluster is called a provisioned DB cluster.
  - Instead of provisioning and managing database servers, you specify Aurora Capacity Units (ACUs). Each ACU is a combination of processing and memory capacity.
  - Pause and Resume: You can pause your Aurora Serverless DB cluster after a specified period of inactivity. The DB cluster automatically resumes and services the connection requests after receiving requests.
  - Failover: Aurora Serverless does not support fast failover, but it supports automatic multi-AZ failover.
  - Encryption: The cluster volume for an Aurora Serverless cluster is always encrypted. You can choose the encryption key, but not turn off encryption.
  - You can set the following specific values:
    - Minimum Aurora capacity unit – Aurora Serverless can reduce capacity down to this capacity unit.
    - Maximum Aurora capacity unit – Aurora Serverless can increase capacity up to this capacity unit.
    - Pause after inactivity – The amount of time with no database traffic to scale to zero processing capacity.
  - Pricing: You pay per second, only when the database is in use.
  - Snapshots: You can share Aurora Serverless DB cluster snapshots with other AWS accounts or publicly. You also have the ability to copy Aurora Serverless DB cluster snapshots across AWS regions.
  - Limitations of Aurora Serverless v1:
    - Aurora Serverless supports specific MySQL and PostgreSQL versions only.
    - The port number for connections must be:
      - 3306 for Aurora MySQL
      - 5432 for Aurora PostgreSQL
    - You can’t assign a public IP address to an Aurora Serverless DB cluster. You can access it only from within a VPC.
    - Each Aurora Serverless DB cluster requires two AWS PrivateLink endpoints.
    - A DB subnet group used by Aurora Serverless can’t have more than one subnet in the same Availability Zone.
    - Changes to a subnet group used by an Aurora Serverless DB cluster are not applied to the cluster.
    - Unsupported Features in v1:
      - Loading or saving data to S3
      - Invoking Lambda functions (Aurora MySQL)
      - Aurora Replicas
      - Backtrack
      - Multi-master clusters
      - Database cloning
      - IAM database authentication
      - Restoring a snapshot from a MySQL DB instance
      - Amazon RDS Performance Insights
      - Several advanced Aurora features introduced in later versions
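The Aurora Serverless v2 capacity figures given earlier (0.5 ACU at roughly 1 GiB, 128 ACUs at roughly 256 GiB) imply about 2 GiB of memory per ACU. The sketch below applies that ratio in both directions. The 0.5 ACU step size is an assumption about scaling granularity, and the function names are hypothetical.

```python
import math

MIN_ACU = 0.5
MAX_ACU = 128.0
GIB_PER_ACU = 2.0  # implied by the figures above: 0.5 ACU ~ 1 GiB, 128 ACUs ~ 256 GiB


def acu_to_memory_gib(acus):
    """Approximate memory, in GiB, for a capacity within the v2 range."""
    if not MIN_ACU <= acus <= MAX_ACU:
        raise ValueError(f"capacity must be between {MIN_ACU} and {MAX_ACU} ACUs")
    return acus * GIB_PER_ACU


def min_acus_for_memory(memory_gib, step=0.5):
    """Smallest capacity, in `step` increments (assumed), with at least memory_gib."""
    acus = max(math.ceil(memory_gib / GIB_PER_ACU / step) * step, MIN_ACU)
    if acus > MAX_ACU:
        raise ValueError("requested memory exceeds the maximum capacity")
    return acus


print(acu_to_memory_gib(8))     # 16.0
print(min_acus_for_memory(10))  # 5.0
```

A calculation like this can help when choosing the minimum and maximum capacity values for a cluster whose working set size is roughly known.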

Cluster Maintenance & Operations
- Rebooting Behavior:
  - When you reboot the primary instance of an Aurora DB cluster, RDS also automatically restarts all of the Aurora Replicas in that DB cluster.
  - When you reboot the primary instance, no failover occurs.
  - When you reboot an Aurora Replica, no failover occurs.
- Deletion Protection:
  - Enabled by default when you create a production DB cluster using the AWS Management Console.
  - Disabled by default if you create a cluster using the AWS CLI or API.
- Deletion Restrictions (Aurora MySQL):
  - You can’t delete a DB instance in a DB cluster if both of the following conditions are true:
    - The DB cluster is a Read Replica of another Aurora DB cluster.
    - The DB instance is the only instance in the DB cluster.

Aurora Multi-Master

- The feature is available on Aurora MySQL 5.6.
- Allows you to create multiple read-write instances of your Aurora database across multiple Availability Zones, which enables uptime-sensitive applications to achieve continuous write availability through instance failure.
- In the event of instance or Availability Zone failures, Aurora Multi-Master enables the Aurora database to maintain read and write availability with zero application downtime. There is no need for database failovers to resume write operations.

Aurora PostgreSQL Limitless Database

An automated, horizontal scaling (sharding) capability designed to enable Aurora PostgreSQL to scale beyond previous storage and throughput limits by partitioning data across multiple writer nodes.
Lets you scale beyond the existing Aurora limits for write throughput and storage by distributing a database workload over multiple Aurora writer instances while maintaining a single logical database endpoint.
Applications continue to connect using a single endpoint, with Aurora automatically routing queries and managing data placement across shards, so applications do not need to be aware of the underlying sharding.
Supports Amazon CloudWatch and Performance Insights to monitor and analyze performance across distributed shards and writer nodes.

Amazon RDS Data API

An HTTPS API endpoint that allows you to run SQL queries against your Aurora Serverless cluster without requiring persistent database connections or specialized client libraries.
Eliminates the need to manage database drivers, connection pooling, or persistent network connections, making it ideal for AWS Lambda, containerized applications, and other serverless or stateless workloads.
Uses AWS IAM permissions for authentication and fine-grained access control. Returns data in JSON format, which simplifies integration with modern applications.
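Because the Data API exchanges JSON rather than driver-level values, query parameters have to be wrapped in typed fields. The following is a minimal sketch of that conversion, assuming the field names used by the Data API's `ExecuteStatement` operation (`stringValue`, `longValue`, `doubleValue`, `booleanValue`, `blobValue`, `isNull`). The helper name `to_data_api_parameters` and the sample values are hypothetical.

```python
def to_data_api_parameters(values):
    """Convert a dict of Python values into the Data API's typed parameter list."""
    params = []
    for name, value in values.items():
        if value is None:
            field = {"isNull": True}
        elif isinstance(value, bool):  # check bool before int: bool is an int subclass
            field = {"booleanValue": value}
        elif isinstance(value, int):
            field = {"longValue": value}
        elif isinstance(value, float):
            field = {"doubleValue": value}
        elif isinstance(value, bytes):
            field = {"blobValue": value}
        else:
            field = {"stringValue": str(value)}
        params.append({"name": name, "value": field})
    return params


params = to_data_api_parameters({"id": 42, "active": True, "note": None})
print(params)
```

With boto3, a list like this would be passed as the `parameters` argument of the `rds-data` client's `execute_statement` call, alongside the cluster ARN, a secret ARN, and a SQL string that uses named placeholders such as `:id`.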

Amazon Aurora Zero-ETL Integrations

Aurora to Amazon Redshift:
- Enables near real-time analytics on transactional data without building complex ETL pipelines. Data written to Aurora is propagated to Redshift within seconds, enabling business intelligence and analytical workloads with minimal lag.
- Eliminates the need for manual data movement, ETL job development, or ongoing orchestration, reducing operational complexity and accelerating access to insights.

Amazon Aurora Monitoring

- Subscribe to Amazon RDS events to be notified when changes occur with a DB instance, DB cluster, DB cluster snapshot, DB parameter group, or DB security group.
- Database log files
- RDS Enhanced Monitoring – Look at metrics in real time for the operating system.
- RDS Performance Insights monitors your Amazon RDS DB instance load so that you can analyze and troubleshoot your database performance.
- CloudWatch Database Insights – Monitor performance metrics for Limitless Database shards.
- Use CloudWatch Metrics, Alarms and Logs

Amazon Aurora Security

- Use IAM to control access.
- To control which devices and EC2 instances can open connections to the endpoint and port of the DB instance for Aurora DB clusters in a VPC, you use a VPC security group.
- You can make endpoint and port connections using Transport Layer Security (TLS) / Secure Sockets Layer (SSL). In addition, firewall rules can control whether devices running at your company can open connections to a DB instance.
- Use RDS encryption to secure your RDS instances and snapshots at rest.
- You can authenticate to your DB cluster using AWS IAM database authentication. IAM database authentication works with Aurora MySQL and Aurora PostgreSQL. With this authentication method, you don’t need to use a password when you connect to a DB cluster. Instead, you use an authentication token, which is a unique string of characters that Amazon Aurora generates on request.
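The token-based flow for IAM database authentication can be sketched as follows. The sketch assumes boto3's `generate_db_auth_token` method on the RDS client; so that it runs without AWS access, it uses a hypothetical `FakeRdsClient` stand-in that returns a dummy string. The host name, user name, and the shape of the returned settings dictionary are illustrative assumptions as well.

```python
def iam_connection_settings(rds_client, host, port, user, region):
    """Request an auth token and return settings that use it in place of a password."""
    token = rds_client.generate_db_auth_token(
        DBHostname=host, Port=port, DBUsername=user, Region=region
    )
    # The token goes where the password normally would; connect over TLS/SSL.
    return {"host": host, "port": port, "user": user, "password": token}


class FakeRdsClient:
    """Hypothetical stand-in for boto3.client("rds"); returns a dummy token."""

    def generate_db_auth_token(self, DBHostname, Port, DBUsername, Region=None):
        return f"token-for-{DBUsername}@{DBHostname}:{Port}"


settings = iam_connection_settings(
    FakeRdsClient(),
    host="mydbcluster.cluster-c7tj4example.us-east-1.rds.amazonaws.com",  # placeholder
    port=5432,
    user="app_user",  # hypothetical database user
    region="us-east-1",
)
print(settings["password"])
```

In a real application the fake client would be replaced by `boto3.client("rds")`, and a fresh token would be requested for each new connection because tokens are short-lived.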

Aurora for MySQL
- Performance Enhancements
  - Push-Button Compute Scaling
  - Storage Auto-Scaling
  - Low-Latency Read Replicas
  - Serverless Configuration
  - Custom Database Endpoints
  - Fast insert accelerates parallel inserts sorted by primary key.
  - Aurora MySQL parallel query is an optimization that parallelizes some of the I/O and computation involved in processing data-intensive queries.
  - You can use the high-performance Advanced Auditing feature in Aurora MySQL to audit database activity. To do so, you enable the collection of audit logs by setting several DB cluster parameters.
- Scaling
  - Instance scaling – scale your Aurora DB cluster by modifying the DB instance class for each DB instance in the DB cluster.
  - Read scaling – as your read traffic increases, you can create additional Aurora Replicas and connect to them directly to distribute the read load for your DB cluster.

Feature	Amazon Aurora Replicas	MySQL Replicas
Number of Replicas	Up to 15	Up to 5
Replication type	Asynchronous (milliseconds)	Asynchronous (seconds)
Performance impact on primary	Low	High
Act as failover target	Yes (no data loss)	Yes (potentially minutes of data loss)
Automated failover	Yes	No
Support for user-defined replication delay	No	Yes
Support for different data or schema vs. primary	No	Yes

Aurora for PostgreSQL
- Performance Enhancements
  - Push-button Compute Scaling
  - Storage Auto-Scaling
  - Low-Latency Read Replicas
  - Custom Database Endpoints
- Scaling
  - Instance scaling
  - Read scaling
- Amazon Aurora PostgreSQL now supports logical replication. With logical replication, you can replicate data changes from your Aurora PostgreSQL database to other databases using native PostgreSQL replication slots, or data replication tools such as the AWS Database Migration Service.
- Rebooting the primary instance of an Amazon Aurora DB cluster also automatically reboots the Aurora Replicas for that DB cluster, in order to re-establish an entry point that guarantees read/write consistency across the DB cluster.
- You can import data (supported by the PostgreSQL COPY command) stored in an Amazon S3 bucket into a PostgreSQL table.

Amazon Aurora Pricing

- You are charged for DB instance hours, I/O requests, Backup storage and Data transfer.
- You can purchase On-Demand Instances and pay by the hour for the DB instance hours that you use, or Reserved Instances to reserve a DB instance for a one-year or three-year term and receive a significant discount compared to the on-demand DB instance pricing.
- Aurora PostgreSQL support for Kerberos and Microsoft Active Directory provides the benefits of single sign-on and centralized authentication of Aurora PostgreSQL database users. In addition to password-based and IAM-based authentication methods, you can also authenticate using AWS Managed Microsoft AD.

Validate Your Knowledge

Question 1

An online shopping platform is hosted on an Auto Scaling group of Amazon EC2 Spot instances and utilizes Amazon Aurora PostgreSQL as its database. It is required to optimize database workloads in the cluster by directing the production traffic to high-capacity instances and routing the reporting queries from the internal staff to the low-capacity instances.

Which is the most suitable configuration for the application as well as the Aurora database cluster to achieve this requirement?

1. Configure your application to use the reader endpoint for both production traffic and reporting queries, which will enable your Aurora database to automatically perform load-balancing among all the Aurora Replicas.
2. In your application, use the instance endpoint of your Aurora database to handle the incoming production traffic and use the cluster endpoint to handle reporting queries.
3. Create a custom endpoint in Aurora based on the specified criteria for the production traffic and another custom endpoint to handle the reporting queries.
4. Do nothing since by default, Aurora will automatically direct the production traffic to your high-capacity instances and the reporting queries to your low-capacity instances.

Correct Answer: 3

Amazon Aurora typically involves a cluster of DB instances instead of a single instance. Each connection is handled by a specific DB instance. When you connect to an Aurora cluster, the host name and port that you specify point to an intermediate handler called an endpoint. Aurora uses the endpoint mechanism to abstract these connections. Thus, you don’t have to hardcode all the hostnames or write your own logic for load-balancing and rerouting connections when some DB instances aren’t available.

For certain Aurora tasks, different instances or groups of instances perform different roles. For example, the primary instance handles all data definition language (DDL) and data manipulation language (DML) statements. Up to 15 Aurora Replicas handle read-only query traffic.

Using endpoints, you can map each connection to the appropriate instance or group of instances based on your use case. For example, to perform DDL statements you can connect to whichever instance is the primary instance. To perform queries, you can connect to the reader endpoint, with Aurora automatically performing load-balancing among all the Aurora Replicas. For clusters with DB instances of different capacities or configurations, you can connect to custom endpoints associated with different subsets of DB instances. For diagnosis or tuning, you can connect to a specific instance endpoint to examine details about a specific DB instance.

The custom endpoint provides load-balanced database connections based on criteria other than the read-only or read-write capability of the DB instances. For example, you might define a custom endpoint to connect to instances that use a particular AWS instance class or a particular DB parameter group. Then you might tell particular groups of users about this custom endpoint. For example, you might direct internal users to low-capacity instances for report generation or ad hoc (one-time) querying, and direct production traffic to high-capacity instances.
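The two-endpoint setup described here could be expressed as a pair of request payloads. This is a sketch only, assuming the parameter names of the RDS `CreateDBClusterEndpoint` operation (`DBClusterIdentifier`, `DBClusterEndpointIdentifier`, `EndpointType`, `StaticMembers`). The cluster name, instance identifiers, and grouping are hypothetical, and the payloads are built but not sent.

```python
def custom_endpoint_requests(cluster_id, groups, endpoint_type="READER"):
    """Build one create-endpoint request per named group of DB instances."""
    requests = []
    for group_name in sorted(groups):
        members = sorted(groups[group_name])
        if not members:
            raise ValueError(f"group {group_name!r} has no instances")
        requests.append(
            {
                "DBClusterIdentifier": cluster_id,
                "DBClusterEndpointIdentifier": f"{cluster_id}-{group_name}",
                "EndpointType": endpoint_type,
                "StaticMembers": members,
            }
        )
    return requests


# Hypothetical grouping: large instances for production, a small one for reports.
groups = {
    "production": ["app-db-large-2", "app-db-large-1"],
    "reporting": ["app-db-small-1"],
}
for request in custom_endpoint_requests("shop-cluster", groups):
    print(request["DBClusterEndpointIdentifier"], request["StaticMembers"])
```

With boto3, each dictionary could be passed to the RDS client as `client.create_db_cluster_endpoint(**request)`; the application would then point production connections at one endpoint and reporting connections at the other.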

Hence, the correct answer is: Create a custom endpoint in Aurora based on the specified criteria for the production traffic and another custom endpoint to handle the reporting queries.

The option that says: Configuring your application to use the reader endpoint for both production traffic and reporting queries, which will enable your Aurora database to automatically perform load-balancing among all the Aurora Replicas is incorrect. Although it is true that a reader endpoint enables your Aurora database to automatically perform load-balancing among all the Aurora Replicas, it is quite limited to doing read operations only. You still need to use a custom endpoint to load-balance the database connections based on the specified criteria.

The option that says: In your application, use the instance endpoint of your Aurora database to handle the incoming production traffic and use the cluster endpoint to handle reporting queries is incorrect because a cluster endpoint (also known as a writer endpoint) for an Aurora DB cluster simply connects to the current primary DB instance for that DB cluster. This endpoint can perform write operations such as DDL statements, which suits production traffic, but it is a poor fit for reporting queries, which are read-only. Moreover, the cluster endpoint does not distinguish between low-capacity and high-capacity instances, as the requirement demands. A better solution is to use custom endpoints.

The option that says: Do nothing since by default, Aurora will automatically direct the production traffic to your high-capacity instances and the reporting queries to your low-capacity instances is incorrect because Aurora does not do this by default. You have to create custom endpoints in order to accomplish this requirement.

References:

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.Endpoints.html
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Endpoints.Custom.html

Note: This question was extracted from our AWS Certified Solutions Architect Associate Practice Exams.

Question 2

A Database Specialist is adding new indexes and altering large tables of an Amazon Aurora PostgreSQL database that uses a medium instance type with a default configuration. The application suddenly crashes with the following error message when loading a large dataset:

ERROR: could not write block 18980612 of temporary file: No space left on device

Which of the following can the Database Specialist do to resolve this issue? (Select TWO.)

1. Modify the Aurora database to use a DB instance class with more local SSD storage.
2. Enable local storage scaling on the Amazon Aurora database, which is disabled by default.
3. Add an Auto Scaling policy to the Aurora database which will automatically increase the local storage.
4. Wait for a few minutes until Amazon Aurora automatically scales out the cluster volume storage and then reload the datasets once again.
5. Lessen the database workload to reduce the amount of temporary storage required.

Correct Answers: 1, 5

Each DB instance in an Amazon Aurora DB cluster uses local solid-state drive (SSD) storage to store temporary tables for a session. This local storage for temporary tables doesn’t automatically grow like the Aurora cluster volume. Instead, the amount of local storage is limited. The limit is based on the DB instance class for DB instances in your DB cluster.

Instances in Aurora clusters have two types of storage:

– Storage for persistent data (called the cluster volume). This storage type increases automatically when more space is required.

– Local storage for each Aurora instance in the cluster, based on the instance class. This storage type and size is bound to the instance class, and can be changed only by moving to a larger DB instance class. Aurora for MySQL uses local storage for storing error logs, general logs, slow query logs, audit logs, and non-InnoDB temporary tables.

To show the amount of storage available for temporary tables and logs, you can use the CloudWatch metric FreeLocalStorage. This metric is for per-instance temporary volumes, not the cluster volume. In some cases, you can’t modify your workload to reduce the amount of temporary storage required. If so, you have to modify your DB instances to use a DB instance class that has more local SSD storage.
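Checking the FreeLocalStorage metric could look roughly like the sketch below. It assumes the CloudWatch `GetMetricStatistics` parameter names and the `AWS/RDS` namespace with a `DBInstanceIdentifier` dimension. The instance identifier, threshold, and sample datapoints are made up for illustration, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def free_local_storage_query(instance_id, now=None, lookback_minutes=60):
    """Build a GetMetricStatistics request for one instance's FreeLocalStorage."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "FreeLocalStorage",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": now - timedelta(minutes=lookback_minutes),
        "EndTime": now,
        "Period": 300,  # seconds per datapoint
        "Statistics": ["Minimum"],
    }


def is_running_low(datapoints, threshold_bytes):
    """True if any datapoint's minimum free space fell below the threshold."""
    return any(dp["Minimum"] < threshold_bytes for dp in datapoints)


query = free_local_storage_query("mydbinstance")  # placeholder identifier
# Made-up datapoints in the shape CloudWatch returns under "Datapoints".
sample = [{"Minimum": 12e9}, {"Minimum": 0.4e9}]
print(is_running_low(sample, threshold_bytes=1e9))  # True
```

With boto3, the request could be sent as `cloudwatch.get_metric_statistics(**query)`, and the `Datapoints` list from the response passed to `is_running_low`.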

Hence, the correct answers are:

– Modify the Aurora database to use a DB instance class with more local SSD storage.

– Lessen the database workload to reduce the amount of temporary storage required.

The option that says: Enable local storage scaling on the Amazon Aurora database, which is disabled by default is incorrect because the amount of local storage is limited and can’t be automatically scaled. Amazon Aurora also does not have a feature that you can enable to scale the local storage automatically.

The option that says: Add an Auto Scaling policy to the Aurora database, which will automatically increase the local storage is incorrect because the root cause of this issue is the lack of free local storage of the Aurora database and not the lack of an Auto Scaling policy. By default, the Amazon Aurora storage automatically scales with the data in your cluster volume, which is why it doesn’t need to have a new Auto Scaling policy. What it needs is a larger instance type to increase its local SSD storage.

The option that says: Wait for a few minutes until Amazon Aurora automatically scales out the cluster volume storage and then reload the datasets once again is incorrect because the error message pertains to the lack of available disk space in the local storage and not on the cluster volume. The cluster volume is a type of storage that increases automatically when more space is required.

References:
https://aws.amazon.com/premiumsupport/knowledge-center/postgresql-aurora-storage-issue/
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_Troubleshooting.html#CHAP_Troubleshooting.Aurora.NoSpaceLeft
https://docs.amazonaws.cn/en_us/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.BestPractices.html#AuroraPostgreSQL.BestPractices.TroubleshootingStorage

Note: This question was extracted from our AWS Certified Database Specialty Practice Exams.
