- A fully managed service that allows data scientists and developers to easily build, train, and deploy machine learning models at scale.
- Provides built-in algorithms that you can immediately use for model training.
- Also supports custom algorithms through Docker containers.
- One-click model deployment.
Concepts
- Hyperparameters
- A set of variables that control how a model is trained.
- You can think of them as “volume knobs” that you tune to reach your model’s objective.
- Automatic Model Tuning
- Finds the best version of a model by running multiple training jobs across the hyperparameter ranges that you specify.
- Training
- The process where you create a machine learning model.
- Inference
- The process of using the trained model to make predictions.
- Local Mode
- Lets you train and deploy estimators on your local machine for testing.
- You must install the Amazon SageMaker Python SDK in your local environment to use local mode.
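To make hyperparameters and automatic model tuning more concrete, the sketch below runs a plain random search over two hyperparameter ranges. It is a toy illustration only, not the SageMaker tuning service (which offers strategies such as Bayesian optimization and random search). The `train_and_score` function, the range names, and the scoring formula are hypothetical stand-ins for a real training job.

```python
import random

# Hypothetical search space: the limits within which the tuner may pick values.
HYPERPARAMETER_RANGES = {
    "learning_rate": (0.001, 0.1),
    "num_layers": (1, 5),
}


def train_and_score(learning_rate, num_layers):
    """Stand-in for a training job; returns a made-up score (higher is better)."""
    return -abs(learning_rate - 0.01) * 10 - abs(num_layers - 3) * 0.1


def random_search(num_jobs, seed=0):
    """Run num_jobs trials with random hyperparameters and keep the best one."""
    rng = random.Random(seed)
    lr_low, lr_high = HYPERPARAMETER_RANGES["learning_rate"]
    layers_low, layers_high = HYPERPARAMETER_RANGES["num_layers"]
    best_params, best_score = None, float("-inf")
    for _ in range(num_jobs):
        params = {
            "learning_rate": rng.uniform(lr_low, lr_high),
            "num_layers": rng.randint(layers_low, layers_high),
        }
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(num_jobs=20)
print(best_params, best_score)
```

Each trial plays the role of one training job; the tuner only ever picks values inside the ranges you supply.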
Common Training Data Formats For Built-in Algorithms
- CSV
- Protobuf RecordIO
- JSON
- LibSVM
- JPEG
- PNG
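As a small illustration of two of the text formats above, the sketch below writes the same labeled record as a CSV line and as a LibSVM line. It assumes the usual conventions: for CSV training data the built-in algorithms expect no header row and the label in the first column, and LibSVM uses `label index:value` pairs with zero-valued features omitted. The helper names and the 1-based indexing are choices made for this example.

```python
import csv
import io


def to_csv_line(label, features):
    """Label first, then the feature values, with no header row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow([label, *features])
    return buffer.getvalue()


def to_libsvm_line(label, features):
    """'label index:value ...' with 1-based indices; zero features are skipped."""
    pairs = [f"{index}:{value}" for index, value in enumerate(features, start=1) if value != 0]
    return " ".join([str(label), *pairs])


row_label, row_features = 1, [0.5, 0, 2.0]
print(to_csv_line(row_label, row_features))
print(to_libsvm_line(row_label, row_features))
```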
Input modes for transferring training data
- File mode
- Downloads data into the SageMaker instance volume before model training commences.
- Slower than Pipe mode.
- Used for incremental training.
- Pipe mode
- Directly stream data from Amazon S3 into the training algorithm container.
- There’s no need to procure large volumes to store large datasets.
- Provides shorter startup and training times.
- Higher I/O throughput.
- Faster than File mode.
- Most built-in algorithms need the training data in protobuf RecordIO format to take advantage of Pipe mode; check the documentation of the algorithm you use for the formats it accepts in each mode.
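The input mode is chosen when the training job is created. The sketch below builds only the input-related fragment of a `CreateTrainingJob` request (the API that boto3's `create_training_job` wraps) and sends nothing to AWS. The bucket name, the `train` channel name, and the helper function are hypothetical; a real request also needs fields such as the training image, role, output location, and resources.

```python
def build_input_config(s3_uri, input_mode="File", content_type="text/csv"):
    """Return the input-related fragment of a CreateTrainingJob request."""
    if input_mode not in ("File", "Pipe"):
        raise ValueError("this sketch only covers the File and Pipe modes")
    return {
        "AlgorithmSpecification": {"TrainingInputMode": input_mode},
        "InputDataConfig": [
            {
                "ChannelName": "train",  # hypothetical channel name
                "ContentType": content_type,
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
    }


# Pipe mode with protobuf RecordIO data streamed from a hypothetical bucket.
pipe_config = build_input_config(
    "s3://example-bucket/train/",
    input_mode="Pipe",
    content_type="application/x-recordio-protobuf",
)
print(pipe_config["AlgorithmSpecification"])
```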
Two methods of deploying a model for inference
- Amazon SageMaker Hosting Services
- Provides a persistent HTTPS endpoint for getting predictions one at a time.
- Suited for web applications that need sub-second latency response.
- Amazon SageMaker Batch Transform
- Doesn’t need a persistent endpoint.
- Gets inferences for an entire dataset.
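The two deployment options differ mainly in how you call them. The sketch below is a rough illustration: `predict_one` sends a single CSV row to a persistent endpoint through the `sagemaker-runtime` client, while `build_transform_job_request` assembles the arguments you might pass to `create_transform_job` for a whole dataset. The endpoint, model, job, and bucket names are hypothetical, the instance type is only an example, and `predict_one` needs boto3 plus AWS credentials, so it is defined but not called here.

```python
def predict_one(endpoint_name, csv_row):
    """Real-time inference: one request, one prediction (needs boto3 and AWS access)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_row,
    )
    return response["Body"].read().decode("utf-8")


def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri):
    """Batch inference: arguments for create_transform_job; no endpoint involved."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line of the input as one record
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {"InstanceType": "ml.m5.large", "InstanceCount": 1},
    }


request = build_transform_job_request(
    "example-batch-job",
    "example-model",
    "s3://example-bucket/batch-input/",
    "s3://example-bucket/batch-output/",
)
print(request["TransformJobName"])
```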
SageMaker features
- SageMaker Autopilot – automates the process of building, tuning, and deploying machine learning models based on your dataset. It automatically explores different solutions to find the best model.
- SageMaker Ground Truth – a data labeling service that lets you use a workforce of human annotators drawn from your own private team, Amazon Mechanical Turk, or third-party vendors.
- SageMaker Data Wrangler – a visual data preparation and cleaning tool that allows data scientists and engineers to easily clean and prepare data for machine learning.
- SageMaker Neo – allows you to optimize machine learning models for deployment on edge devices to run faster with no loss in accuracy.
- SageMaker Automatic Model Tuning – automates the process of hyperparameter tuning based on the algorithm and hyperparameter ranges you specify. This can result in saving a significant amount of time for data scientists and engineers.
- Amazon SageMaker Debugger – detects and diagnoses issues during the training of machine learning models.
- Spot Training – allows data scientists and engineers to save up to 90% on the cost of training machine learning models by using spare compute capacity.
- Distributed Training – allows for splitting the data and distributing the workload across multiple instances, improving speed and performance. It supports various distributed training frameworks such as TensorFlow, PyTorch, and MXNet.
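The data-splitting idea behind distributed training can be pictured as giving each instance its own subset of the input objects. The sketch below uses simple round-robin assignment over sorted object keys. That assignment rule, the helper name, and the key names are assumptions made for illustration; this is not a description of how SageMaker itself allocates objects.

```python
def shard_keys(keys, num_instances):
    """Assign each object key to exactly one instance, round-robin."""
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    shards = [[] for _ in range(num_instances)]
    for position, key in enumerate(sorted(keys)):
        shards[position % num_instances].append(key)
    return shards


# Seven hypothetical training files spread over three instances.
object_keys = [f"train/part-{i:03d}.csv" for i in range(7)]
for instance_index, shard in enumerate(shard_keys(object_keys, 3)):
    print(instance_index, shard)
```

Each instance then reads only its own shard, so the per-instance workload shrinks as instances are added.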
Optimization
- Convert training data into protobuf RecordIO format to make use of Pipe mode.
- Use Amazon FSx for Lustre to accelerate File mode training jobs.
Monitoring
- You can publish SageMaker instance metrics to the CloudWatch dashboard to gain a unified view of CPU utilization, memory utilization, and latency.
- You can also send training metrics to the CloudWatch dashboard to monitor model performance in real time.
- AWS CloudTrail helps you detect unauthorized SageMaker API calls.
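For custom training code, training metrics typically reach CloudWatch through metric definitions: a metric name paired with a regular expression that is applied to the training logs. The sketch below applies such definitions locally to show what gets extracted. The metric names, the regexes, and the log line are made up for this example.

```python
import re

# Hypothetical metric definitions in the Name/Regex shape used by training jobs.
METRIC_DEFINITIONS = [
    {"Name": "validation:accuracy", "Regex": r"val_acc=([0-9.]+)"},
    {"Name": "train:loss", "Regex": r"loss=([0-9.]+)"},
]


def extract_metrics(log_line):
    """Return {metric name: value} for every definition that matches the line."""
    metrics = {}
    for definition in METRIC_DEFINITIONS:
        match = re.search(definition["Regex"], log_line)
        if match:
            metrics[definition["Name"]] = float(match.group(1))
    return metrics


print(extract_metrics("epoch=3 loss=0.412 val_acc=0.87"))
```

Testing a regex against sample log lines this way is a cheap check before launching a job whose metrics you want to chart.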
Pricing
- The building, training, and deploying of ML models are billed by the second, with no minimum fees and no upfront commitments.
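Per-second billing makes cost estimates simple arithmetic. The sketch below computes the cost of a job from its billable seconds and an hourly instance rate, and the savings percentage that Managed Spot Training reports, taken here as one minus billed seconds over total training seconds. The rate and durations are hypothetical numbers, not actual prices.

```python
def training_cost(billable_seconds, hourly_rate_usd):
    """Cost of a job billed by the second at a given hourly instance rate."""
    return billable_seconds * hourly_rate_usd / 3600


def spot_savings_percent(training_seconds, billable_seconds):
    """Savings from spot capacity: share of training time that was not billed."""
    return (1 - billable_seconds / training_seconds) * 100


# A hypothetical 30-minute job on an instance priced at 0.20 USD per hour.
print(round(training_cost(1800, 0.20), 4))
# A hypothetical spot job that trained for 1000 s but was billed for 300 s.
print(round(spot_savings_percent(1000, 300), 1))
```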
Validate Your Knowledge
Question 1
A Machine Learning Specialist has various CSV training datasets stored in an S3 bucket. Previous models trained on similarly sized datasets with the Amazon SageMaker Linear Learner algorithm were slow to train. The Specialist wants to decrease the amount of time spent on training the model.
Which combination of steps should be taken by the Specialist? (Select TWO.)
- Convert the CSV training dataset into Apache Parquet format.
- Train the model using Amazon SageMaker Pipe mode.
- Convert the CSV training dataset into Protobuf RecordIO format.
- Train the model using Amazon SageMaker File mode.
- Stream the dataset into Amazon SageMaker using Amazon Kinesis Firehose to train the model.
Question 2
A Machine Learning Specialist is using a 100GB EBS volume as a storage disk for an Amazon SageMaker instance. After running a few training jobs, the Specialist realized that he needed a higher I/O throughput and a shorter job startup and execution time.
Which approach will give the MOST satisfactory result based on the requirements?
- Store the training dataset in Amazon S3 and use the Pipe input mode for training the model.
- Increase the size of the EBS volume to obtain higher I/O throughput.
- Upgrade the SageMaker instance to a larger size.
- Increase the EBS volume to 500GB and use the File mode for training the model.
References:
https://aws.amazon.com/sagemaker/faqs/
https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
https://aws.amazon.com/sagemaker/pricing/