Last updated on December 11, 2024
Amazon SageMaker AI Cheat Sheet
- A fully managed service that allows data scientists and developers to easily build, train, and deploy machine learning models at scale.
- Provides built-in algorithms that you can immediately use for model training.
- Also supports custom algorithms through docker containers.
- One-click model deployment.
Concepts
- Hyperparameters
- A set of variables that controls how a model is trained.
- You can think of them as “volume knobs” that you can tune to achieve your model’s objective.
- Automatic Model Tuning
- Finds the best version of a model by running multiple training jobs within the hyperparameter ranges that you specify.
- Training
- The process where you create a machine learning model.
- Inference
- The process of using the trained model to make predictions.
- Local Mode
- Allows you to create and deploy estimators to your local machine for testing.
- You must install the Amazon SageMaker Python SDK in your local environment to use local mode (see the sketch after this list).
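A minimal Local Mode sketch using the SageMaker Python SDK, assuming Docker is installed locally; the training script (train.py), role ARN, and data path are hypothetical placeholders.

```python
# Minimal Local Mode sketch (assumes the SageMaker Python SDK and Docker are
# installed locally; train.py, the role ARN, and the data path are placeholders).
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",                                 # hypothetical training script
    role="arn:aws:iam::111111111111:role/SageMakerRole",    # placeholder role
    framework_version="1.2-1",
    instance_type="local",   # "local" runs the job in a Docker container on your machine
    instance_count=1,
)

# In Local Mode, training data can be a local path passed as a file:// URI.
estimator.fit({"train": "file://./data/train.csv"})
```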
Common Training Data Formats For Built-in Algorithms
- CSV
- Protobuf RecordIO
- JSON
- Libsvm
- JPEG
- PNG
Input modes for transferring training data
- File mode
- Downloads data into the SageMaker instance volume before model training commences.
- Slower than Pipe mode
- Used for incremental training
- Pipe mode
- Directly streams data from Amazon S3 into the training algorithm container.
- There’s no need to provision large storage volumes to hold large datasets.
- Provides shorter startup and training times.
- Higher I/O throughputs
- Faster than File mode.
- You MUST use protobuf RecordIO as your training data format before you can take advantage of Pipe mode (see the sketch after this list).
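A minimal sketch of enabling Pipe mode on a built-in algorithm (Linear Learner here) with the SageMaker Python SDK, assuming the S3 prefix already holds protobuf RecordIO data; the bucket, role, and hyperparameter values are placeholders.

```python
# Minimal Pipe mode sketch (bucket, role, and hyperparameter values are
# placeholders; the S3 prefix is assumed to hold protobuf RecordIO data).
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
image = image_uris.retrieve("linear-learner", session.boto_region_name, version="1")

estimator = Estimator(
    image_uri=image,
    role="arn:aws:iam::111111111111:role/SageMakerRole",    # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    input_mode="Pipe",           # stream from S3 instead of downloading to the instance volume
    sagemaker_session=session,
)
estimator.set_hyperparameters(feature_dim=50, predictor_type="regressor", mini_batch_size=100)

train_input = TrainingInput(
    "s3://my-bucket/linear-learner/train/",                 # placeholder S3 prefix
    content_type="application/x-recordio-protobuf",
)
estimator.fit({"train": train_input})
```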
Two methods of deploying a model for inference
- Amazon SageMaker Hosting Services
- Provides a persistent HTTPS endpoint for getting predictions one at a time.
- Suited for web applications that need sub-second latency responses.
- Amazon SageMaker Batch Transform
- Doesn’t need a persistent endpoint
- Gets inferences for an entire dataset (a sketch of both deployment options follows this list).
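A minimal sketch of both deployment options with the SageMaker Python SDK, assuming `estimator` is an already-trained estimator (such as the one above); instance types and S3 paths are placeholders.

```python
# Minimal deployment sketch (assumes `estimator` has already been trained;
# instance types and S3 paths are placeholders).

# Option 1: SageMaker Hosting Services - a persistent real-time HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)
# predictor.predict(payload) now serves predictions one at a time.

# Option 2: Batch Transform - offline inference over an entire dataset in S3;
# no persistent endpoint is created.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://my-bucket/batch-output/",   # placeholder
)
transformer.transform(
    data="s3://my-bucket/batch-input/",           # placeholder
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```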
SageMaker features
- SageMaker Autopilot – automates the process of building, tuning, and deploying machine learning models based on a tabular dataset (CSV or Parquet). SageMaker Autopilot automatically explores different solutions to find the best model.
- SageMaker Ground Truth – a data labeling service that lets you use a workforce of human annotators through your own private workforce, Amazon Mechanical Turk, or third-party vendors.
- SageMaker Data Wrangler – a visual data preparation and cleaning tool that allows data scientists and engineers to easily clean and prepare data for machine learning.
- SageMaker Neo – allows you to optimize machine learning models for deployment on edge devices to run faster with no loss in accuracy.
- SageMaker Automatic Model Tuning – automates the process of hyperparameter tuning based on the algorithm and hyperparameter ranges you specify, which can save data scientists and engineers a significant amount of time (see the tuning sketch after this list).
- Amazon SageMaker Debugger – provides real-time insights into the training process of machine learning models, enabling rapid iteration. It allows you to monitor and debug training issues, optimize model performance, and improve accuracy by analyzing various model-related metrics, such as weights, gradients, and biases.
- Managed Spot Training – allows data scientists and engineers to save up to 90% on the cost of training machine learning models by using spare Amazon EC2 capacity (Spot Instances).
- Distributed Training – allows for splitting the data and distributing the workload across multiple instances, improving speed and performance. It supports various distributed training frameworks such as TensorFlow, PyTorch, and MXNet.
- SageMaker Studio – A web-based IDE for machine learning. It provides tools for the entire ML lifecycle, including data wrangling, model training, and deployment, all in one unified interface. Helps data scientists and developers quickly build and train models and streamline ML workflows.
- SageMaker Notebooks – A fully managed, scalable Jupyter notebook for quick data exploration, model building, and training. It helps you start working on ML models immediately without managing infrastructure.
- SageMaker Distributed Data Parallelism (SMDDP) – A feature that enables efficient distributed training of deep learning models by automatically parallelizing data across multiple GPUs and instances. Speeds up the training of large models on massive datasets, improving scalability and reducing training time. It supports frameworks like TensorFlow and PyTorch, making it ideal for large-scale deep-learning tasks that require intensive computational resources.
- SageMaker Pipelines – A fully managed CI/CD service for automating the end-to-end machine learning workflow, including data preprocessing, model training, and deployment. It helps automate and streamline the ML lifecycle, ensuring consistency and efficiency.
- SageMaker Model Monitor – Monitors models in production to detect issues such as data drift or model performance degradation. Ensures that models continue to perform accurately after deployment.
- SageMaker Model Registry – A centralized repository for managing ML models, including tracking versions and promoting models for deployment. Ensures proper model version control and governance across teams.
- SageMaker Edge Manager – offers model management for edge devices, enabling you to optimize, secure, monitor, and manage machine learning models on various edge device fleets, including smart cameras, robots, PCs, and mobile devices.
- SageMaker Feature Store – a fully managed repository designed to store, share, and manage features for machine learning models. It ensures high-quality, standardized features are available for both training and real-time inference, helping teams keep their feature data synchronized and consistent.
- SageMaker JumpStart – provides pre-trained foundation models and ready-to-use solutions for common machine learning tasks like text summarization, image generation, and object detection, enabling users to quickly deploy and experiment without deep ML expertise.
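A minimal Automatic Model Tuning sketch with the SageMaker Python SDK, assuming the `estimator` and the `train_input`/`validation_input` channels already exist; the objective metric, ranges, and job counts shown are illustrative placeholders.

```python
# Minimal Automatic Model Tuning sketch (assumes `estimator`, `train_input`,
# and `validation_input` already exist; the metric name, ranges, and job
# counts are placeholders).
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:objective_loss",   # metric emitted by the algorithm
    objective_type="Minimize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.0001, 0.1),
        "mini_batch_size": IntegerParameter(32, 512),
    },
    max_jobs=10,            # total training jobs to launch
    max_parallel_jobs=2,    # training jobs run concurrently
)

tuner.fit({"train": train_input, "validation": validation_input})
print(tuner.best_training_job())   # name of the best-performing training job
```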
Optimization
- Convert training data into a protobuf RecordIO format to make use of Pipe mode.
- Use Amazon FSx for Lustre to accelerate File mode training jobs (see the sketch below).
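A minimal sketch of feeding a training job from FSx for Lustre via FileSystemInput, reusing the `estimator` from the earlier sketch; the file system ID, directory path, and the VPC configuration required on the estimator are placeholders/assumptions.

```python
# Minimal FSx for Lustre input sketch (file system ID and directory path are
# placeholders; the estimator is assumed to be configured with the VPC subnets
# and security groups that can reach the file system).
from sagemaker.inputs import FileSystemInput

fsx_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",   # placeholder FSx for Lustre ID
    file_system_type="FSxLustre",
    directory_path="/fsx/train",             # placeholder mount name + path
    file_system_access_mode="ro",
)

estimator.fit({"train": fsx_input})
```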
Amazon SageMaker Monitoring
- You can publish SageMaker instance metrics to the CloudWatch dashboard to gain a unified view of CPU utilization, memory utilization, and latency.
- You can also send training metrics to the CloudWatch dashboard to monitor model performance in real time (see the sketch after this list).
- Amazon CloudTrail helps you detect unauthorized SageMaker API calls.
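A minimal sketch of sending custom training metrics to CloudWatch by declaring regex-based metric definitions on a script-mode estimator; the metric names and regexes are assumptions and must match what the training script (train.py, hypothetical) actually prints.

```python
# Minimal custom training metrics sketch (train.py, the role ARN, metric names,
# and regexes are placeholders; the regexes must match lines the script prints).
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",                                 # hypothetical training script
    role="arn:aws:iam::111111111111:role/SageMakerRole",    # placeholder role
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    metric_definitions=[
        {"Name": "train:loss", "Regex": r"train_loss=([0-9.]+)"},
        {"Name": "validation:accuracy", "Regex": r"val_acc=([0-9.]+)"},
    ],
)
# Matched values appear in CloudWatch under the /aws/sagemaker/TrainingJobs
# namespace while the training job runs.
```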
Amazon SageMaker Pricing
- Building, training, and deploying ML models is billed by the second, with no minimum fees and no upfront commitments.
Note: If you are studying for the AWS Certified Machine Learning Specialty exam, we highly recommend that you take our AWS Certified Machine Learning – Specialty Practice Exams and read our Machine Learning Specialty exam study guide.
Validate Your Knowledge
Question 1
A Machine Learning Specialist has various CSV training datasets stored in an S3 bucket. Previous models trained with similar training data sizes using the Amazon SageMaker Linear Learner algorithm have a slow training process. The Specialist wants to decrease the amount of time spent on training the model.
Which combination of steps should be taken by the Specialist? (Select TWO.)
- Convert the CSV training dataset into Apache Parquet format.
- Train the model using Amazon SageMaker Pipe mode.
- Convert the CSV training dataset into Protobuf RecordIO format.
- Train the model using Amazon SageMaker File mode.
- Stream the dataset into Amazon SageMaker using Amazon Kinesis Firehose to train the model.
Question 2
A Machine Learning Specialist is using a 100GB EBS volume as a storage disk for an Amazon SageMaker instance. After running a few training jobs, the Specialist realized that he needed a higher I/O throughput and a shorter job startup and execution time.
Which approach will give the MOST satisfactory result based on the requirements?
- Store the training dataset in Amazon S3 and use the Pipe input mode for training the model.
- Increase the size of the EBS volume to obtain higher I/O throughput.
- Upgrade the SageMaker instance to a larger size.
- Increase the EBS volume to 500GB and use the File mode for training the model.
For more AWS practice exam questions with detailed explanations, visit the Tutorials Dojo Portal:
Amazon SageMaker Cheat Sheet References:
https://aws.amazon.com/sagemaker/faqs/
https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
https://aws.amazon.com/sagemaker/pricing/