Amazon SageMaker

  • A fully managed service that allows data scientists and developers to easily build, train, and deploy machine learning models at scale.
  • Provides built-in algorithms that you can immediately use for model training.
  • Also supports custom algorithms through Docker containers.
  • One-click model deployment (a minimal workflow sketch follows this list).
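
To make this concrete, below is a minimal sketch of the build, train, and deploy flow with the SageMaker Python SDK (v2). The role ARN, S3 paths, and hyperparameter values are placeholders, not recommended settings:

```python
# Minimal sketch (placeholder ARN/buckets) of the SageMaker workflow:
# resolve a built-in algorithm image, train, then deploy in one call.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.image_uris import retrieve

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder role

# Built-in algorithm: look up the XGBoost container image for this region
image_uri = retrieve("xgboost", region=session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output/",                 # placeholder bucket
    hyperparameters={"objective": "reg:squarederror", "num_round": 100},
)

# Training: runs a training job and writes model artifacts to the output path
estimator.fit({"train": "s3://my-bucket/train/"})

# "One-click" deployment: a single call creates a persistent HTTPS endpoint
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```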

Concepts

  • Hyperparameters
    • A set of variables that control how a model is trained.
    • You can think of them as “volume knobs” that you can tune to achieve your model’s objective.
  • Automatic Model Tuning
    • Finds the best version of a model by running many training jobs within the hyperparameter ranges that you specify (see the sketch after this list).
  • Training
    • The process where you create a machine learning model.
  • Inference
    • The process of using the trained model to make predictions.
  • Local Mode
    • Allows you to create and deploy estimators on your local machine for testing.
    • You must install the Amazon SageMaker Python SDK on your local environment to use local mode.
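
As a hedged illustration of Automatic Model Tuning, the sketch below reuses the estimator from the previous example; the objective metric, ranges, and job counts are illustrative only:

```python
# Sketch: Automatic Model Tuning launches multiple training jobs within the
# hyperparameter ranges you specify and keeps the best-performing model.
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

tuner = HyperparameterTuner(
    estimator=estimator,                        # from the previous sketch
    objective_metric_name="validation:rmse",    # built-in XGBoost metric
    objective_type="Minimize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),  # the "volume knobs" to tune
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=20,                                # total training jobs allowed
    max_parallel_jobs=2,                        # jobs run at the same time
)

tuner.fit({
    "train": "s3://my-bucket/train/",           # placeholder S3 channels
    "validation": "s3://my-bucket/validation/",
})
```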

Common Training Data Formats For Built-in Algorithms

  • CSV
  • Protobuf RecordIO
  • JSON
  • Libsvm
  • JPEG
  • PNG

Input modes for transferring training data

  • File mode
    • Downloads data into the SageMaker instance volume before model training commences.
    • Slower than Pipe mode.
    • Used for incremental training.
  • Pipe mode
    • Streams data directly from Amazon S3 into the training algorithm container.
    • There’s no need to provision large volumes to store large datasets.
    • Provides shorter startup and training times and higher I/O throughput than File mode.
    • You MUST use protobuf RecordIO as your training data format before you can take advantage of Pipe mode (see the sketch after this list).
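
As a sketch of how the input mode is chosen per data channel (the S3 URI is a placeholder), switching to Pipe mode is a one-line change on the input:

```python
# Sketch: File mode is the default; setting input_mode="Pipe" streams the
# channel straight from S3 instead of downloading it to the instance volume.
from sagemaker.inputs import TrainingInput

pipe_channel = TrainingInput(
    s3_data="s3://my-bucket/train-recordio/",            # placeholder prefix
    content_type="application/x-recordio-protobuf",      # protobuf RecordIO
    input_mode="Pipe",
)

estimator.fit({"train": pipe_channel})  # estimator from the earlier sketch
```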

Two methods of deploying a model for inference

  • Amazon SageMaker Hosting Services
    • Provides a persistent HTTPS endpoint for getting predictions one at a time.
    • Suited for web applications that need sub-second latency response.
  • Amazon SageMaker Batch Transform
    • Doesn’t need a persistent endpoint.
    • Gets inferences for an entire dataset (see the sketch after this list).
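
A short sketch contrasting the two options, assuming the trained estimator from the first example (S3 paths are placeholders):

```python
# Option 1 - Hosting Services: persistent HTTPS endpoint, one prediction at
# a time, suited to web apps that need sub-second latency.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")

# Option 2 - Batch Transform: no persistent endpoint; score a whole dataset.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",  # placeholder bucket
)
transformer.transform(
    data="s3://my-bucket/batch-input/",          # placeholder input prefix
    content_type="text/csv",
    split_type="Line",                           # one record per CSV line
)
transformer.wait()                               # block until the job ends
```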

Optimization

  • Convert training data into protobuf RecordIO format to make use of Pipe mode (a conversion sketch follows this list).
  • Use Amazon FSx for Lustre to accelerate File mode training jobs.
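
A minimal conversion sketch, assuming in-memory NumPy arrays and a placeholder bucket; write_numpy_to_dense_tensor is the SDK helper that serializes dense arrays to protobuf RecordIO:

```python
# Sketch: serialize NumPy features/labels to protobuf RecordIO in memory,
# then upload the result to S3 for a Pipe mode training job.
import io

import boto3
import numpy as np
from sagemaker.amazon.common import write_numpy_to_dense_tensor

features = np.random.rand(1000, 10).astype("float32")    # stand-in data
labels = np.random.randint(0, 2, size=1000).astype("float32")

buf = io.BytesIO()
write_numpy_to_dense_tensor(buf, features, labels)       # protobuf RecordIO
buf.seek(0)

boto3.client("s3").upload_fileobj(buf, "my-bucket", "train-recordio/train.rec")
```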

Monitoring

  • You can publish SageMaker instance metrics to Amazon CloudWatch to gain a unified view of CPU utilization, memory utilization, and latency.
  • You can also send training metrics to CloudWatch to monitor model performance in real time (see the sketch after this list).
  • Amazon CloudTrail helps you detect unauthorized SageMaker API calls.
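
For custom algorithms, training metrics reach CloudWatch through regex-based metric definitions on the training job. A sketch with an assumed ECR image URI and illustrative log patterns:

```python
# Sketch: SageMaker scrapes the training logs with these regexes and
# publishes the captured values to CloudWatch as training metrics.
from sagemaker.estimator import Estimator

byo_estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",  # placeholder
    role=role,                                   # role from the first sketch
    instance_count=1,
    instance_type="ml.m5.xlarge",
    metric_definitions=[
        {"Name": "train:loss", "Regex": r"train_loss=([0-9\.]+)"},
        {"Name": "validation:accuracy", "Regex": r"val_acc=([0-9\.]+)"},
    ],
)
```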

Pricing

  • Building, training, and deploying ML models are billed by the second, with no minimum fees and no upfront commitments.

Note: If you are studying for the AWS Certified Machine Learning Specialty exam, we highly recommend that you take our AWS Certified Machine Learning – Specialty Practice Exams and read our Machine Learning Specialty exam study guide.


Validate Your Knowledge

Question 1

A Machine Learning Specialist has various CSV training datasets stored in an S3 bucket. Previous models trained on similarly sized datasets using the Amazon SageMaker XGBoost algorithm took a long time to train. The Specialist wants to decrease the amount of time spent training the model.

Which combination of steps should be taken by the Specialist? (Select TWO.)

  1. Convert the CSV training dataset into Apache Parquet format.
  2. Train the model using Amazon SageMaker Pipe mode.
  3. Convert the CSV training dataset into Protobuf RecordIO format.
  4. Train the model using Amazon SageMaker File mode.
  5. Stream the dataset into Amazon SageMaker using Amazon Kinesis Firehose to train the model.

Correct Answer: 2, 3

Most Amazon SageMaker algorithms work best when you use the optimized protobuf RecordIO data format for training. Using this format allows you to take advantage of Pipe mode. In Pipe mode, your training job streams data directly from Amazon Simple Storage Service (Amazon S3).

Streaming can provide faster start times for training jobs and better throughput. This is in contrast to File mode, in which your data from Amazon S3 is stored on the training instance volumes. File mode uses disk space to store both your final model artifacts and your full training dataset. By streaming your data directly from Amazon S3 in Pipe mode, you reduce the size of Amazon Elastic Block Store volumes of your training instances.

Hence, the correct answers are:

– Convert the CSV training dataset into Protobuf RecordIO format.

– Train the model using Amazon SageMaker Pipe mode.

The option that says: Convert the CSV training dataset into Apache Parquet format is incorrect because Amazon SageMaker’s Pipe mode does not support Apache Parquet data format.

The option that says: Train the model using Amazon SageMaker File mode is incorrect because the File mode is the default input mode for Amazon SageMaker and is slower than Pipe mode.

The option that says: Stream the dataset into Amazon SageMaker using Amazon Kinesis Firehose to train the model is incorrect because you can’t use Amazon Kinesis Firehose in this way. It can’t use Amazon S3 as its data source.

References:
https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html
https://aws.amazon.com/blogs/machine-learning/using-pipe-input-mode-for-amazon-sagemaker-algorithms/

Note: This question was extracted from our AWS Certified Machine Learning – Specialty Practice Exams.

Question 2

A Machine Learning Specialist is using a 100GB EBS volume as a storage disk for an Amazon SageMaker instance. After running a few training jobs, the Specialist realized that he needed a higher I/O throughput and a shorter job startup and execution time.

Which approach will give the MOST satisfactory result based on the requirements?

  1. Store the training dataset in Amazon S3 and use the Pipe input mode for training the model.
  2. Increase the size of the EBS volume to obtain higher I/O throughput.
  3. Upgrade the SageMaker instance to a larger size.
  4. Increase the EBS volume to 500GB and use the File mode for training the model.

Correct Answer: 1

With Pipe input mode, your data is fed on-the-fly into the algorithm container without involving any disk I/O. This approach shortens the lengthy download process and dramatically reduces startup time. It also offers generally better read throughput than File input mode. This is because your data is fetched from Amazon S3 by a highly optimized multi-threaded background process. It also allows you to train on datasets that are much larger than the 16 TB Amazon Elastic Block Store (EBS) volume size limit.

Pipe mode enables the following:

– Shorter startup times because the data is being streamed instead of being downloaded to your training instances.

– Higher I/O throughput due to a high-performance streaming agent.

– Virtually limitless data processing capacity.

With Pipe mode, the startup time is reduced significantly from 11.5 minutes to 1.5 minutes in most experiments. Also, the overall I/O throughput is at least twice as fast as that of File mode. Both of these improvements made a positive impact on the total training time, which is reduced by up to 35%.

Hence, the correct answer is: Store the training dataset in Amazon S3 and use the Pipe input mode for training the model.

The option that says: Increase the size of the EBS volume to obtain higher I/O throughput is incorrect. Even if you set the EBS volume to its maximum throughput, training in Pipe mode would still have a greater impact in terms of reducing the job start-up time and execution time.

The option that says: Upgrade the SageMaker instance to a larger size is incorrect. Upgrading the instance alone won’t have as much effect as running the SageMaker instance in Pipe mode.

The option that says: Increase the EBS volume to 500GB and use the File mode for training the model is incorrect. File mode is the default mode for training a model in Amazon SageMaker. This would surely increase the throughput but it’s still not the best answer among the given choices.

References:
https://aws.amazon.com/blogs/machine-learning/using-pipe-input-mode-for-amazon-sagemaker-algorithms/
https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

Note: This question was extracted from our AWS Certified Machine Learning – Specialty Practice Exams.

For more AWS practice exam questions with detailed explanations, visit the Tutorials Dojo Portal:

Tutorials Dojo AWS Practice Tests

References:
https://aws.amazon.com/sagemaker/faqs/
https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
https://aws.amazon.com/sagemaker/pricing/
