Last updated on August 30, 2023
AWS Auto Scaling Cheat Sheet
- Configure automatic scaling for your AWS resources quickly through a scaling plan that uses dynamic scaling and predictive scaling.
- Optimize for availability, for cost, or a balance of both.
- Scaling in means decreasing the size of a group while scaling out means increasing the size of a group.
- Useful for
- Cyclical traffic such as high use of resources during regular business hours and low use of resources overnight
- On and off traffic patterns, such as batch processing, testing, or periodic analysis
- Variable traffic patterns, such as software for marketing campaigns with periods of spiky growth
- It is a region-specific service.
Features
- Launch or terminate EC2 instances in an Auto Scaling group.
- Launch or terminate instances from an EC2 Spot Fleet request, or automatically replace instances that get interrupted for price or capacity reasons.
- Adjust the ECS service desired count up or down in response to load variations.
- Enable a DynamoDB table or a global secondary index to increase or decrease its provisioned read and write capacity to handle increases in traffic without throttling.
- Dynamically adjust the number of Aurora read replicas provisioned for an Aurora DB cluster to handle changes in active connections or workload.
- Use Dynamic Scaling to add and remove capacity for resources to maintain resource utilization at the specified target value.
- Use Predictive Scaling to forecast your future load demands by analyzing your historical records for a metric. It also allows you to schedule scaling actions that proactively add and remove resource capacity to reflect the load forecast, and control maximum capacity behavior. Only available for EC2 Auto Scaling groups.
- AWS Auto Scaling scans your environment and automatically discovers the scalable cloud resources underlying your application, so you don’t have to manually identify these resources one by one through individual service interfaces.
- You can suspend and resume any of your AWS Application Auto Scaling actions.
- A warm pool allows you to decrease latency for applications that have exceptionally long boot times. This will help avoid over-provisioning your Auto Scaling groups in order to manage latency and improve application performance.
Amazon EC2 Auto Scaling
- Ensures you have the correct number of EC2 instances available to handle your application load, using Auto Scaling groups.
- An Auto Scaling group contains a collection of EC2 instances that share similar characteristics and are treated as a logical grouping for the purposes of instance scaling and management.
- You specify the minimum, maximum and desired number of instances in each Auto Scaling group.
- Key Components
| Component | Description |
| --- | --- |
| Groups | Your EC2 instances are organized into groups so that they are treated as a logical unit for scaling and management. When you create a group, you can specify its minimum, maximum, and desired number of EC2 instances. |
| Configuration templates | Your group uses a launch template as a template for its EC2 instances. When you create a launch template, you can specify information such as the AMI ID, instance type, key pair, security groups, and block device mapping for your instances. |
| Scaling options | How to scale your Auto Scaling groups. |
- Auto Scaling Lifecycle
- You can add a lifecycle hook to your Auto Scaling group to perform custom actions when instances launch or terminate.
- Applies to instances launched or terminated
- Maximum instance lifetime
- Instance refresh
- Capacity rebalancing
- Warm pools
- Scaling Options
- Scale to maintain current instance levels at all times
- Manual Scaling
- Scale based on a schedule
- Scale based on demand
- Use predictive scaling
- Scaling Policy Types
- Target tracking scaling—Increase or decrease the current capacity of the group based on a target value for a specific metric.
- Step scaling—Increase or decrease the current capacity of the group based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.
- Simple scaling—Increase or decrease the current capacity of the group based on a single scaling adjustment.
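The three policy types differ in how they translate an alarm breach into a capacity change. As a rough sketch of step scaling (the step boundaries and adjustment sizes below are made-up example values, not AWS defaults, and the real service also applies warmup and alarm-evaluation logic):

```python
# Illustrative sketch of step scaling: the adjustment applied depends on
# how far the metric has breached the alarm threshold. The thresholds and
# adjustments below are hypothetical example values, not AWS defaults.

def step_scaling_adjustment(metric, threshold, steps):
    """Return the capacity adjustment for a given breach size.

    steps is a list of (lower_bound, upper_bound, adjustment) tuples,
    where bounds are offsets from the alarm threshold (None = open-ended).
    """
    breach = metric - threshold
    if breach < 0:
        return 0  # alarm not in breach; no scaling action
    for lower, upper, adjustment in steps:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0

# Example: alarm threshold at 50% CPU, with two step adjustments.
steps = [
    (0, 20, 1),     # breach of 0-20 points above threshold: add 1 instance
    (20, None, 3),  # breach of 20+ points above threshold: add 3 instances
]
print(step_scaling_adjustment(65, 50, steps))  # small breach -> 1
print(step_scaling_adjustment(85, 50, steps))  # large breach -> 3
```

Larger breaches mapping to larger adjustments is what distinguishes step scaling from simple scaling, which applies a single fixed adjustment regardless of breach size.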
- The size of your Auto Scaling group is constrained by its capacity limits; the desired capacity can be adjusted anywhere between the minimum and maximum size limits.
- The cooldown period is a configurable setting that helps ensure Auto Scaling doesn't launch or terminate additional instances before previous scaling activities take effect.
- EC2 Auto Scaling supports cooldown periods when using simple scaling policies, but not when using target tracking policies, step scaling policies, or scheduled scaling.
- You can use the default instance warmup to improve CloudWatch metrics used for dynamic scaling. This feature lets your EC2 instances finish warming up before they contribute the usage data.
- Dynamic scaling can better react to the demand curve of your application if you utilize a target tracking scaling policy based on a custom Amazon SQS queue metric.
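The SQS pattern typically target-tracks a "backlog per instance" metric: queue depth divided by running instances, compared against the number of messages one instance can process within your latency requirement. A sketch of the arithmetic, with made-up numbers:

```python
import math

# Sketch of the SQS backlog-per-instance calculation used with a target
# tracking policy. All numbers are hypothetical examples.

def desired_capacity(queue_depth, current_instances, acceptable_backlog_per_instance):
    """acceptable_backlog_per_instance = messages one instance can process
    within the acceptable latency (e.g. 100 msgs if an instance handles
    10 msg/s and the latency target is 10 s)."""
    backlog_per_instance = queue_depth / current_instances
    # Target tracking adjusts capacity roughly in proportion to metric/target.
    return math.ceil(current_instances * backlog_per_instance / acceptable_backlog_per_instance)

# 1,500 queued messages, 5 instances, each able to absorb 100 messages:
print(desired_capacity(1500, 5, 100))  # -> 15 instances
```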
- Amazon EC2 Auto Scaling marks an instance as unhealthy if the instance is in a state other than running, the system status is impaired, or Elastic Load Balancing reports that the instance failed the health checks.
- Termination of Instances
- When you configure automatic scale in, you must decide which instances should terminate first and set up a termination policy. You can also use instance protection to prevent specific instances from being terminated during automatic scale in.
- Default Termination Policy
- Other predefined termination policies
- OldestInstance – Terminate the oldest instance in the group.
- NewestInstance – Terminate the newest instance in the group.
- OldestLaunchConfiguration – Terminate instances that have the oldest launch configuration.
- ClosestToNextInstanceHour – Terminate instances that are closest to the next billing hour.
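As a sketch of how two of these predefined policies differ, OldestInstance and NewestInstance can be modeled as a simple selection on launch time (a hypothetical model only; the real service also honors Availability Zone balance and instance protection first):

```python
from datetime import datetime

# Hypothetical model of two predefined termination policies. The actual
# service evaluates AZ balance and instance scale-in protection before
# applying the policy.

instances = [
    {"id": "i-aaa", "launched": datetime(2023, 1, 10)},
    {"id": "i-bbb", "launched": datetime(2023, 3, 5)},
    {"id": "i-ccc", "launched": datetime(2023, 2, 1)},
]

def pick_to_terminate(instances, policy):
    if policy == "OldestInstance":
        return min(instances, key=lambda i: i["launched"])
    if policy == "NewestInstance":
        return max(instances, key=lambda i: i["launched"])
    raise ValueError(f"unmodeled policy: {policy}")

print(pick_to_terminate(instances, "OldestInstance")["id"])  # i-aaa
print(pick_to_terminate(instances, "NewestInstance")["id"])  # i-bbb
```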
- An instance can be temporarily removed from an Auto Scaling group by changing its state from InService into Standby.
- You can create launch templates that specify instance configuration information for launching EC2 instances; a launch template allows you to have multiple versions.
- A launch configuration is an instance configuration template that an Auto Scaling group uses to launch EC2 instances; you specify the configuration information for the instances when you create it.
- You can specify your launch configuration with multiple Auto Scaling groups.
- You can only specify one launch configuration for an Auto Scaling group at a time, and you can’t modify a launch configuration after you’ve created it.
- When you create a VPC, by default its tenancy attribute is set to default. You can launch instances with a tenancy value of dedicated so that they run as single-tenancy instances. Otherwise, they run as shared-tenancy instances by default.
- If you set the tenancy attribute of a VPC to dedicated, all instances launched in the VPC run as single-tenancy instances.
- When you create a launch configuration, the default value for the instance placement tenancy is null and the instance tenancy is controlled by the tenancy attribute of the VPC.
| Launch Configuration Tenancy | VPC Tenancy = default | VPC Tenancy = dedicated |
| --- | --- | --- |
| not specified | shared-tenancy instance | Dedicated Instance |
| default | shared-tenancy instance | Dedicated Instance |
| dedicated | Dedicated Instance | Dedicated Instance |
- If you are launching the instances in your Auto Scaling group in EC2-Classic, you can link them to a VPC using ClassicLink.
Application Auto Scaling
- Allows you to configure automatic scaling for the following resources:
- Amazon ECS services
- Spot Fleet requests
- Amazon EMR clusters
- ElastiCache for Redis clusters
- Amazon Neptune clusters
- AppStream 2.0 fleets
- Amazon Comprehend
- DynamoDB tables and global secondary indexes
- Amazon Keyspaces tables
- Aurora replicas
- Amazon SageMaker endpoint variants
- Lambda function provisioned concurrency
- Amazon Managed Streaming for Apache Kafka
- Custom resources provided by your own applications or services.
- Features
- Target tracking scaling—Scale a resource based on a target value for a specific CloudWatch metric.
- Step scaling— Scale a resource based on a set of scaling adjustments that vary based on the size of the alarm breach.
- Scheduled scaling—Scale a resource based on the date and time. The timezone can be UTC or your local timezone.
- Target tracking scaling
- You can have multiple target tracking scaling policies for a scalable target, provided that each of them uses a different metric.
- You can also optionally disable the scale-in portion of a target tracking scaling policy.
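Target tracking keeps the metric near the target by adjusting capacity roughly in proportion to (current metric / target). A minimal sketch of that proportionality, including the optional scale-in disable (real policies add cooldown, warmup, and alarm evaluation that this ignores):

```python
import math

# Sketch of target tracking's proportional capacity adjustment.
# Real policies layer cooldowns, warmup, and CloudWatch alarm
# evaluation on top of this arithmetic.

def target_tracking(current_capacity, metric, target, scale_in_enabled=True):
    new_capacity = math.ceil(current_capacity * metric / target)
    if not scale_in_enabled and new_capacity < current_capacity:
        return current_capacity  # scale-in portion disabled: never shrink
    return new_capacity

print(target_tracking(10, 75, 50))                          # 15 (scale out)
print(target_tracking(10, 25, 50))                          # 5 (scale in)
print(target_tracking(10, 25, 50, scale_in_enabled=False))  # 10 (held)
```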
- Step scaling
- Increase or decrease the current capacity of a scalable target based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.
- Scheduled scaling
- Scale your application in response to predictable load changes by creating scheduled actions, which tell Application Auto Scaling to perform scaling activities at specific times.
- The scale out cooldown period is the amount of time, in seconds, after a scale out activity completes before another scale out activity can start.
- The scale in cooldown period is the amount of time, in seconds, after a scale in activity completes before another scale in activity can start.
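The two cooldowns can be thought of as independent timers, one gating each direction of scaling. A minimal sketch under that assumption (times are plain seconds; the real service measures from activity completion):

```python
# Minimal sketch of independent scale-out / scale-in cooldown timers.
# Hypothetical model: the actual service tracks scaling-activity
# completion times, not arbitrary clock values.

class CooldownGate:
    def __init__(self, scale_out_cooldown, scale_in_cooldown):
        self.cooldowns = {"out": scale_out_cooldown, "in": scale_in_cooldown}
        self.last_activity = {"out": None, "in": None}

    def can_scale(self, direction, now):
        last = self.last_activity[direction]
        return last is None or now - last >= self.cooldowns[direction]

    def record(self, direction, now):
        self.last_activity[direction] = now

gate = CooldownGate(scale_out_cooldown=60, scale_in_cooldown=300)
gate.record("out", now=0)
print(gate.can_scale("out", now=30))  # False: still inside 60 s cooldown
print(gate.can_scale("out", now=90))  # True: cooldown elapsed
print(gate.can_scale("in", now=30))   # True: scale-in timer is separate
```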
- You can attach one or more Classic Load Balancers to your existing Auto Scaling groups. The load balancers must be in the same region.
- Auto Scaling rebalances by launching new EC2 instances in the AZs that have fewer instances first; only then does it start terminating instances in the AZs that had more instances.
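The launch-first preference for the least-populated AZ can be sketched as:

```python
from collections import Counter

# Sketch: when rebalancing, pick the Availability Zone with the fewest
# instances as the launch target before terminating anywhere else.

def launch_target_az(instance_azs):
    counts = Counter(instance_azs)
    return min(counts, key=lambda az: counts[az])

azs = ["us-east-1a", "us-east-1a", "us-east-1a", "us-east-1b"]
print(launch_target_az(azs))  # us-east-1b (fewest instances)
```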
Monitoring
- Health checks – identifies any instances that are unhealthy
- Amazon EC2 status checks (default)
- Elastic Load Balancing health checks
- Custom health checks.
- Auto Scaling does not perform health checks on instances in the Standby state. Standby can be used for performing updates, changes, or troubleshooting without health checks being performed or replacement instances being launched.
- CloudWatch metrics – enables you to retrieve statistics about Auto Scaling-published data points as an ordered set of time-series data, known as metrics. You can use these metrics to verify that your system is performing as expected.
- CloudWatch Events – Auto Scaling can submit events to CloudWatch Events when your Auto Scaling groups launch or terminate instances, or when a lifecycle action occurs.
- SNS notifications – Auto Scaling can send Amazon SNS notifications when your Auto Scaling groups launch or terminate instances.
- CloudTrail logs – enables you to keep track of the calls made to the Auto Scaling API by or on behalf of your AWS account, and stores the information in log files in an S3 bucket that you specify.
AWS Auto Scaling Security
- Use IAM to help secure your resources by controlling who can perform AWS Auto Scaling actions.
- By default, a brand new IAM user has NO permissions to do anything. To grant permissions to call Auto Scaling actions, you attach an IAM policy to the IAM users or groups that require the permissions it grants.
Validate Your Knowledge
Question 1
A large Philippine-based Business Process Outsourcing company is building a two-tier web application in their VPC to serve dynamic transaction-based content. The data tier is leveraging an Online Transactional Processing (OLTP) database but for the web tier, they are still deciding what service they will use.
What AWS services should you leverage to build an elastic and scalable web tier?
- Elastic Load Balancing, Amazon EC2, and Auto Scaling
- Elastic Load Balancing, Amazon RDS with Multi-AZ, and Amazon S3
- Amazon RDS with Multi-AZ and Auto Scaling
- Amazon EC2, Amazon DynamoDB, and Amazon S3
Question 2
A tech company has a CRM application hosted on an Auto Scaling group of On-Demand EC2 instances with different instance types and sizes. The application is extensively used during office hours from 9 in the morning to 5 in the afternoon. Their users are complaining that the performance of the application is slow during the start of the day but then works normally after a couple of hours.
Which of the following is the MOST operationally efficient solution to implement to ensure the application works properly at the beginning of the day?
- Configure a Dynamic scaling policy for the Auto Scaling group to launch new instances based on the CPU utilization.
- Configure a Dynamic scaling policy for the Auto Scaling group to launch new instances based on the Memory utilization.
- Configure a Scheduled scaling policy for the Auto Scaling group to launch new instances before the start of the day.
- Configure a Predictive scaling policy for the Auto Scaling group to automatically adjust the number of Amazon EC2 instances.
For more AWS practice exam questions with detailed explanations, visit the Tutorials Dojo Portal:
Additional Training Materials: AWS Auto Scaling Video Courses on Udemy
AWS Auto Scaling Cheat Sheet References:
https://docs.aws.amazon.com/autoscaling/plans/userguide/what-is-aws-auto-scaling.html
https://aws.amazon.com/autoscaling/features/
https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html
https://docs.aws.amazon.com/autoscaling/application/userguide/what-is-application-auto-scaling.html
https://aws.amazon.com/autoscaling/pricing/
https://aws.amazon.com/autoscaling/faqs/