AWS Certified Machine Learning – Specialty MLS-C01 Exam Study Path

Last updated on October 3, 2023

The AWS Certified Machine Learning – Specialty (MLS-C01) certification is intended for individuals who develop data science or applied machine learning projects on the AWS Cloud. This specialty certification is quite different from other AWS exams. If you already hold other AWS certifications, you're probably expecting to be tested heavily on AWS services and on how to architect them into solutions for different business problems. That is not the case with the ML Specialty exam: aside from Amazon SageMaker, most of the questions you'll encounter have little to do with AWS services at all.

The exam covers a wide range of general machine learning concepts. You should have at least a high-level understanding of the different stages of a machine learning project, such as data collection, feature engineering, train-test splitting, choosing the right algorithm for a specific use case, training, tuning, and deploying a model for inference. The exam also expects you to know the common issues that arise during model training (e.g., overfitting, imbalanced datasets, missing values in the dataset) and the methods to fix them (e.g., regularization/early stopping, oversampling/adding noise to data, data imputation).

Machine learning leans more on math concepts than on software engineering. Although not strictly required, a background in statistics or college math (linear algebra, differential calculus) will help you understand how an algorithm works behind the scenes. It also helps to gain hands-on experience first by building simple models; this lets you learn quickly and get used to machine learning jargon.

We recommend checking out the following materials:

STUDY MATERIALS FOR THE MLS-C01 SPECIALTY EXAM

  1. Machine Learning Terminology and Process
  2. Machine Learning Algorithms
  3. Math for Machine Learning
  4. AWS Foundations: Machine Learning Basics
  5. AWS Machine Learning Lens
  6. Machine Learning Best Practices in Financial Services
  7. Neural Networks
  8. Introduction to Artificial Intelligence
  9. Amazon SageMaker

We also recommend taking the free and highly interactive AWS Exam Readiness digital course for the AWS Certified Machine Learning – Specialty MLS-C01 exam.

Other helpful materials

  1. AWS Machine Learning and AI Services Cheat Sheets
  2. Mike Chambers' ML – Specialty Course
  3. Introduction to Machine Learning with Python
  4. Deep Learning with Python
  5. StatQuest

MLS-C01 RELATED AWS SERVICES TO FOCUS ON

Data Engineering

Concepts

  • Data ingestion techniques (batch and stream processing)
  • Data cleaning
  • ETL pipelines
  • Building a data lake on Amazon S3
  • Data storage options for training with Amazon SageMaker (Amazon S3, Amazon EFS, Amazon FSx for Lustre)
  • Amazon S3 lifecycle configuration
  • Amazon S3 data storage options
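As an illustration of the Amazon S3 lifecycle configuration topic, the sketch below builds a lifecycle rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts: objects under a prefix move to S3 Standard-IA after 30 days, to S3 Glacier after 90 days, and expire after a year. The bucket name, the `raw/` prefix, the day thresholds, and the helper function names are hypothetical choices for illustration.

```python
"""Sketch: an S3 lifecycle rule that tiers aging raw training data.

The bucket name, prefix, day thresholds, and helper names are hypothetical.
"""


def build_lifecycle_configuration(prefix: str = "raw/") -> dict:
    """Return a lifecycle configuration dict in the shape boto3 expects."""
    return {
        "Rules": [
            {
                "ID": "tier-raw-training-data",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to cheaper storage classes as they age.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Delete objects one year after creation.
                "Expiration": {"Days": 365},
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Apply the configuration; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the config can be built without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    config = build_lifecycle_configuration()
    print(config["Rules"][0]["Transitions"])
    # apply_lifecycle("example-ml-data-lake-bucket", config)  # hypothetical bucket
```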

Exploratory Data Analysis

Concepts

  • Data Cleaning
  • Data labeling (for supervised models)
  • Using RecordIO protobuf format to leverage SageMaker’s Pipe mode for training
  • Data Visualization and Analysis
    • Scatter plot
    • Box plots
    • Confusion matrix
  • Feature Engineering
    • Normalization
    • Scaling
    • Data imputation techniques for filling missing values
    • Oversampling/undersampling methods to fix imbalanced datasets
    • Regularization
    • Dimensionality Reduction
      • Principal Component Analysis (PCA)
      • t-Distributed Stochastic Neighbor Embedding (t-SNE)
    • One-hot encoding
    • Label encoding
    • Binning
    • Train-test splitting with randomization
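Several of the feature engineering items above are simple enough to sketch in a few lines. The example below is a minimal, standard-library-only illustration of mean imputation, min-max normalization, standardization (scaling), one-hot encoding, random oversampling, and a shuffled train-test split. The helper names are hypothetical; in practice you would more likely reach for pandas or scikit-learn, which offer equivalents of each step.

```python
"""Sketch of common feature-engineering steps using only the standard library.

Helper names are hypothetical; real projects typically use pandas or scikit-learn.
"""
import random
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def one_hot(categories):
    """One-hot encode category labels; columns follow sorted label order."""
    vocab = sorted(set(categories))
    return vocab, [[1 if c == v else 0 for v in vocab] for c in categories]


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate random minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    deficit = sum(1 for y in labels if y != minority) - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(max(deficit, 0))]
    return rows + extra, labels + [minority] * len(extra)


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows, then split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    ages = impute_mean([20, None, 40])
    print(ages)
    print(min_max_normalize(ages))
    print(one_hot(["red", "blue", "red"]))
```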

Modeling

AWS Services

Amazon SageMaker built-in algorithms

  • Linear regression
  • Logistic regression
  • K-means clustering
  • Principal component analysis (PCA)
  • Factorization machines
  • Neural topic modeling
  • Latent Dirichlet allocation
  • XGBoost
  • Sequence-to-sequence
  • Time-series forecasting (DeepAR)
  • BlazingText
  • Object detection
  • Image classification
  • Semantic segmentation

Concepts:

  • Automated hyperparameter tuning
  • Supervised learning, unsupervised learning, and reinforcement learning
  • Managed Spot Training
  • Deep Learning
    • Convolutional Neural Network (CNN), Recurrent Neural Networks (RNN)
    • Weights and biases
    • Activation functions
      • Softmax
      • Rectified Linear Unit (ReLU)
      • Tanh
    • Network layers (flatten layer, convolutional layer, pooling layer, output layer)
    • Dropout regularization
    • Model pruning
  • Solving overfitting and underfitting problems
  • Training SageMaker models in local mode
  • Early Stopping
  • Metrics for confusion matrix (true positives, false positives, false negatives, true negatives)
  • Model evaluation
    • ROC / AUC
    • F1 Score
    • Precision
    • Accuracy

Machine Learning Implementation and Operations

Concepts:

  • Real-time and batch inference
  • Monitoring model metrics using CloudWatch
  • Monitoring SageMaker API logs using CloudTrail
  • Using Amazon Augmented AI (A2I) to involve human reviewers in a machine learning workflow
  • Multi-model endpoints
  • AWS Exam Readiness Courses
  • Encrypting data with AWS KMS
  • SageMaker notebook instance lifecycle configuration scripts
  • Optimizing models for edge devices using SageMaker Neo
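Several of the concepts in this guide (Pipe mode with RecordIO-protobuf data, Managed Spot Training, KMS encryption, and inter-container traffic encryption) appear as fields of a single SageMaker `CreateTrainingJob` request. The sketch below builds such a request as a dictionary for boto3's `create_training_job`. The job name, role ARN, image URI, S3 paths, KMS key ARN, and instance settings are placeholders, so treat it as a template to adapt rather than a working job definition.

```python
"""Sketch: a SageMaker CreateTrainingJob request combining several MLS-C01 topics.

All ARNs, URIs, names, and instance settings below are hypothetical placeholders.
"""


def build_training_job_request(job_name: str, kms_key_arn: str) -> dict:
    return {
        "TrainingJobName": job_name,
        "RoleArn": "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
        "AlgorithmSpecification": {
            "TrainingImage": "<algorithm-image-uri>",  # placeholder
            # Pipe mode streams data from S3 instead of copying it to disk first.
            "TrainingInputMode": "Pipe",
        },
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": "s3://example-bucket/train/",
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                # RecordIO-protobuf is a common input format for Pipe mode.
                "ContentType": "application/x-recordio-protobuf",
            }
        ],
        "OutputDataConfig": {
            "S3OutputPath": "s3://example-bucket/output/",
            "KmsKeyId": kms_key_arn,  # encrypt model artifacts with KMS
        },
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 2,
            "VolumeSizeInGB": 50,
            "VolumeKmsKeyId": kms_key_arn,  # encrypt the training volumes
        },
        # Encrypt traffic between the instances of a distributed training job.
        "EnableInterContainerTrafficEncryption": True,
        # Managed Spot Training: MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds.
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": 3600,
            "MaxWaitTimeInSeconds": 7200,
        },
        # Checkpoints let an interrupted spot job resume instead of restarting.
        "CheckpointConfig": {"S3Uri": "s3://example-bucket/checkpoints/"},
    }


def start_training_job(request: dict) -> None:
    """Submit the request; requires boto3, AWS credentials, and real resources."""
    import boto3  # lazy import so the request can be built without boto3

    boto3.client("sagemaker").create_training_job(**request)
```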

MLS-C01 Common Exam Scenarios

Scenario

Solution

MLS-C01 Domain 1: Data Engineering

A company wants to automatically convert streaming JSON data into Apache Parquet before storing it in an S3 bucket.

Use Amazon Kinesis Data Firehose with record format conversion enabled

A company uses Amazon EMR for its ETL processes and is looking for an alternative with lower operational overhead.

Run the ETL jobs using AWS Glue

What should you use to deliver streaming data from Amazon MSK to a Redshift cluster with low latency?

Amazon Redshift Streaming Ingestion

A data engineer is building a pipeline that ingests streaming data from various sources.

Create an application that uses the Kinesis Producer Library (KPL) to write the streaming data into a Kinesis data stream.

A company wants to set up a data lake on Amazon S3. The data will be sourced from S3 buckets located in different AWS accounts. Which service can simplify the implementation of the data lake?

AWS Lake Formation

MLS-C01 Domain 2: Exploratory Data Analysis

An image classifier achieves high accuracy on the validation dataset, but its accuracy drops significantly when tested against real-world data. How can you improve the model's performance?

Apply data augmentation techniques (e.g., flipping, rotating, adjusting brightness) to existing images from the training data, add the augmented images to the training data, and retrain the model.

What methods can a machine learning engineer use to reduce the dimensionality of a large dataset while retaining only the most relevant information?

1. Principal Component Analysis (PCA)

2. t-Distributed Stochastic Neighbor Embedding (t-SNE)

A dataset contains a mixture of categorical and numerical features. What feature engineering method should be applied to the categorical features to prepare the data for training?

One-hot encoding

Variables X and Y have a correlation coefficient of -0.98. What does this indicate?

A very strong negative correlation

A machine learning engineer is working with a small dataset that has missing values. What should they do to ensure that no data points are lost?

Use imputation techniques to fill in missing values

MLS-C01 Domain 3: Modeling

An ML engineer wants to evaluate the performance of a binary classification model visually. What visualization technique should be used?

Confusion matrix

An ML engineer wants to discover the topics present within a large text dataset. Which algorithm should the engineer use?

Latent Dirichlet Allocation (LDA) algorithm

A SageMaker Object2Vec model performs well on the training data but poorly on the validation dataset (overfitting). How do you solve this problem?

Apply regularization; in this case, increase the value of the dropout hyperparameter.

A neural network model is being trained on a large dataset in batches. As training progresses, the loss function begins to oscillate. What could be the cause?

The learning rate is too high

What SageMaker built-in algorithm is suitable for predicting click-through rate (CTR) patterns?

Factorization machines

MLS-C01 Domain 4: Machine Learning Implementation and Operations

An ML engineer wants to auto-scale the instances behind a SageMaker endpoint according to the volume of incoming requests. Which metric should this scaling be based on?

InvocationsPerInstance

Which AWS service can you use to convert speech in audio files into text?

Amazon Transcribe

An ML engineer is running a distributed training job on a cluster of SageMaker instances. The traffic between the instances must be encrypted.

Enable inter-container traffic encryption

A company wants to use Amazon SageMaker to deploy many ML models cost-effectively.

Use a multi-model endpoint

What AWS service can help you build an AI-powered chatbot that can interact with customers?

Amazon Lex
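For the Domain 4 auto-scaling scenario, SageMaker endpoint scaling is configured through Application Auto Scaling, where the InvocationsPerInstance metric is available as the predefined metric type `SageMakerVariantInvocationsPerInstance`. The sketch below builds the `register_scalable_target` and `put_scaling_policy` requests for a target-tracking policy. The endpoint name, variant name, capacity limits, target value, and cooldowns are hypothetical; a real target value should come from load testing.

```python
"""Sketch: target-tracking auto scaling for a SageMaker endpoint variant.

Endpoint/variant names, capacity limits, target value, and cooldowns are
hypothetical examples.
"""


def build_scaling_requests(endpoint: str, variant: str, target: float = 70.0):
    resource_id = f"endpoint/{endpoint}/variant/{variant}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    register = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": 1,
        "MaxCapacity": 4,
    }
    policy = {
        "PolicyName": f"{endpoint}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Average invocations per instance (per minute) to aim for.
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,  # seconds; scale in conservatively
            "ScaleOutCooldown": 60,  # seconds; scale out quickly
        },
    }
    return register, policy


def apply_scaling(register: dict, policy: dict) -> None:
    """Apply both requests; requires boto3 and an existing endpoint."""
    import boto3  # lazy import so the requests can be built without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**register)
    client.put_scaling_policy(**policy)
```

The asymmetric cooldowns are one common design choice: adding capacity quickly protects latency during traffic spikes, while removing it slowly avoids thrashing.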

Validate Your Knowledge For Your MLS-C01 Exam

For high-quality practice exams, you can use our AWS Certified Machine Learning Specialty MLS-C01 Practice Exams. These practice tests will help you boost your preparedness for the real exam. They contain multiple sets of questions that cover almost every area you can expect on the real certification exam, along with detailed explanations and reference links that help you understand why the correct option is better than the rest. Practice exams are a great way to find your weak areas, and they also highlight important information that you might have missed during your review.

 

Sample Practice Test Questions for MLS-C01:

Question 1

A trucking company wants to improve situational awareness for its operations team. Each truck has a GPS device installed to monitor its location.

The company requires the data to be stored in Amazon Redshift for near real-time analytics, which will then be used to generate updated dashboard reports.

Which workflow offers the quickest processing time from ingestion to storage?

  1. Use Amazon Kinesis Data Streams to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion.
  2. Use Amazon Managed Streaming for Apache Kafka (MSK) to ingest the location data. Use Amazon Redshift Spectrum to deliver the data in the cluster.
  3. Use Amazon Kinesis Data Firehose to ingest the location data and set the Amazon Redshift cluster as the destination.
  4. Use Amazon Kinesis Data Firehose to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion.

Correct Answer: 1

The Amazon Redshift Streaming ingestion feature makes it easier to access and analyze data coming from real-time data sources. It simplifies the streaming architecture by providing native integration between Amazon Redshift and the streaming engines in AWS, which are Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Streaming data sources like system logs, social media feeds, and IoT streams can continue to push events to the streaming engines, and Amazon Redshift becomes just another consumer.

Previously, loading data from a stream into Amazon Redshift involved several steps. You had to connect the stream to Amazon Kinesis Data Firehose and wait for Kinesis Data Firehose to stage the data in Amazon S3, using batches of varying size at buffer intervals of varying length. After this, Kinesis Data Firehose issued a COPY command to load the data from Amazon S3 into a table in Redshift.

Amazon Redshift Streaming ingestion eliminates all of these extra steps, resulting in faster performance and lower latency.
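To make the Question 1 workflow concrete, the sketch below shows the producer side (building a Kinesis `PutRecord` request for one GPS reading) alongside the two SQL statements that set up Redshift streaming ingestion: an external schema mapped to Kinesis and an auto-refreshing materialized view over the stream. The stream name, IAM role ARN, schema and view names, and record fields are hypothetical, and the SQL follows the pattern shown in the Redshift streaming ingestion documentation, so verify it against the current docs before use.

```python
"""Sketch: GPS records into Kinesis Data Streams, read by Redshift streaming ingestion.

Stream name, role ARN, schema/view names, and record fields are hypothetical.
"""
import json

STREAM_NAME = "truck-gps-stream"

# Run once in Redshift to map a schema onto Kinesis Data Streams.
CREATE_SCHEMA_SQL = """
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::111122223333:role/ExampleRedshiftStreamingRole';
"""

# A materialized view that Redshift refreshes directly from the stream.
CREATE_VIEW_SQL = f"""
CREATE MATERIALIZED VIEW truck_locations AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       JSON_PARSE(kinesis_data) AS payload
FROM kds."{STREAM_NAME}"
WHERE CAN_JSON_PARSE(kinesis_data);
"""


def build_put_record(truck_id: str, lat: float, lon: float, ts: str) -> dict:
    """Build keyword arguments for the Kinesis PutRecord API."""
    payload = {"truck_id": truck_id, "lat": lat, "lon": lon, "ts": ts}
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(payload).encode("utf-8"),
        # Using the truck ID as the partition key keeps each truck's
        # records ordered within a single shard.
        "PartitionKey": truck_id,
    }


def send(record: dict) -> None:
    """Send one record; requires boto3 and AWS credentials."""
    import boto3  # lazy import so records can be built without boto3

    boto3.client("kinesis").put_record(**record)
```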

Hence, the correct answer is: Use Amazon Kinesis Data Streams to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion.

The option that says: Use Amazon Managed Streaming for Apache Kafka (MSK) to ingest the location data. Use Amazon Redshift Spectrum to deliver the data in the cluster is incorrect. Redshift Spectrum is a Redshift feature that lets you query data in Amazon S3 without loading it into Redshift tables. It does not read from Amazon MSK, so it is not a way to deliver streaming data into the cluster.

The option that says: Use Amazon Kinesis Data Firehose to ingest the location data and set the Amazon Redshift cluster as the destination is incorrect. While you can configure Redshift as a destination for Amazon Kinesis Data Firehose, Firehose does not load the data directly into Redshift. Under the hood, it first stages the data in Amazon S3 and then copies it into Redshift using the COPY command, which adds latency.

The option that says: Use Amazon Kinesis Data Firehose to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion is incorrect. Amazon Kinesis Data Firehose is not a valid streaming source for Amazon Redshift Streaming ingestion.

References:
https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html
https://aws.amazon.com/blogs/big-data/build-near-real-time-logistics-dashboards-using-amazon-redshift-and-amazon-managed-grafana-for-better-operational-intelligence/
https://aws.amazon.com/blogs/big-data/real-time-analytics-with-amazon-redshift-streaming-ingestion/

Check out this Amazon Redshift Cheat Sheet:
https://tutorialsdojo.com/amazon-redshift/

Question 2

A Machine Learning Specialist is training an XGBoost-based model for detecting fraudulent transactions using Amazon SageMaker. The training data contains 5,000 fraudulent and 500,000 non-fraudulent transactions. The model reaches an accuracy of 99.5% during training.

When tested on the validation dataset, the model shows an accuracy of 99.1% but has a high false-negative rate of 87.7%. The Specialist must reduce the number of false-negative predictions before the model is acceptable for production.

Which combination of actions must be taken to meet the requirement? (Select TWO.)

  1. Increase the model complexity by specifying a larger value for the max_depth hyperparameter.
  2. Increase the value of the rate_drop hyperparameter to reduce the overfitting of the model.
  3. Adjust the balance of positive and negative weights by configuring the scale_pos_weight hyperparameter.
  4. Alter the value of the eval_metric hyperparameter to MAP (Mean Average Precision).
  5. Alter the value of the eval_metric hyperparameter to Area Under The Curve (AUC).

Correct Answer: 3, 5

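Because the dataset is heavily imbalanced, accuracy is dominated by the majority (non-fraudulent) class and is a misleading metric for this model. Since the fraud detection model is a binary classifier, it should instead be evaluated using the Area Under the (ROC) Curve (AUC) metric. AUC measures how well a binary classification model separates the two classes as its discrimination threshold is varied.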

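The scale_pos_weight hyperparameter controls the balance of positive and negative weights, which is useful for imbalanced classes; the XGBoost documentation suggests sum(negative instances) / sum(positive instances) as a typical value to consider. In this scenario, the heavily imbalanced dataset (100 non-fraudulent transactions for every fraudulent one) causes the model to miss most fraudulent cases, producing a high false-negative rate (FNR). Increasing scale_pos_weight gives the positive (fraud) class more weight during training, which reduces the number of false negatives.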
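The numbers in Question 2 show why accuracy alone is misleading here. The sketch below computes the accuracy of a hypothetical model that labels every transaction as non-fraudulent (over 99%, despite catching no fraud), the suggested starting value for scale_pos_weight given the 5,000 to 500,000 class split, and a small rank-based ROC AUC function showing that such a constant predictor scores only 0.5. The hyperparameter dictionary is an assumption about how you might pass these settings to the SageMaker XGBoost algorithm, not a tuned configuration.

```python
"""Sketch: why accuracy misleads on the Question 2 class ratio, and how
scale_pos_weight and AUC address it. The all-negative baseline is hypothetical.
"""

N_FRAUD, N_LEGIT = 5_000, 500_000


def accuracy_of_all_negative_baseline(n_pos: int, n_neg: int) -> float:
    """Accuracy of a model that predicts 'not fraud' for every transaction."""
    return n_neg / (n_pos + n_neg)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half). O(P*N), fine for small illustrations."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# XGBoost docs suggest sum(negative) / sum(positive) as a value to consider.
scale_pos_weight = N_LEGIT / N_FRAUD

hyperparameters = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    # CreateTrainingJob expects hyperparameter values as strings.
    "scale_pos_weight": str(scale_pos_weight),
}

if __name__ == "__main__":
    print(accuracy_of_all_negative_baseline(N_FRAUD, N_LEGIT))  # ~0.9901
    print(roc_auc([0, 0, 1, 1], [0.0, 0.0, 0.0, 0.0]))  # 0.5
    print(hyperparameters)
```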

Hence, the correct answers are:

– Alter the value of the eval_metric hyperparameter to Area Under The Curve (AUC).

– Adjust the balance of positive and negative weights by configuring the scale_pos_weight hyperparameter.

The option that says: Increase the model complexity by specifying a larger value for the max_depth hyperparameter is incorrect. The training and validation accuracies are already high and close to each other, so the model is not underfitting, and a deeper tree does not address the class imbalance that causes the high false-negative rate.

The option that says: Increase the value of the rate_drop hyperparameter to reduce the overfitting of the model is incorrect because the training and validation accuracies (99.5% and 99.1%) are close, which indicates that the model is not overfitting.

The option that says: Alter the value of the eval_metric hyperparameter to MAP (Mean Average Precision) is incorrect because this metric is only useful for evaluating ranking algorithms.

References:
https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html
https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst#learning-task-parameters

Final Remarks

Machine learning plays a major role in almost every industry. It provides numerous business benefits, such as forecasting sales, predicting medical diagnoses, and simplifying time-consuming data entry tasks. With the proliferation of machine learning and AI applications, it's not difficult to see how they will affect job demand in the market. The need for machine learning talent to build efficient and effective models at scale is likely to keep growing for years to come, and pairing your skills with the AWS Certified Machine Learning – Specialty certification can make your resume stand out and boost your earning potential.

We hope this guide helps you achieve that goal, and we would love to hear how your exam goes. We wish you the best of luck!

Written by: Jon Bonso

Jon Bonso is the co-founder of Tutorials Dojo, an EdTech startup and an AWS Digital Training Partner that provides high-quality educational materials in the cloud computing space. He graduated from Mapúa Institute of Technology in 2007 with a bachelor's degree in Information Technology. Jon holds 10 AWS certifications and has been an active AWS Community Builder since 2020.
