AWS Certified Machine Learning – Specialty Exam Study Path

The AWS Certified Machine Learning – Specialty certification is intended for individuals who develop data science or applied machine learning projects on the AWS Cloud. This specialty certification is quite different from the other AWS exams. If you have prior experience with other AWS certifications, you’re probably expecting to be heavily tested on AWS services and how they can be architected into solutions for different business problems. That is not the case in the ML Specialty certification. Aside from Amazon SageMaker, most of the questions that you’ll encounter are not about AWS services at all.

The exam covers a wide range of general machine learning concepts. You should have at least a high-level understanding of the stages of a machine learning project: data collection, feature engineering, train-test splitting, choosing the correct algorithm for a specific use case, training, tuning, and deploying a model for inference. The exam also expects you to know the common issues that arise in model training (e.g., overfitting, unbalanced datasets, missing values in the dataset) and the methods to fix them (e.g., regularization/early stopping, oversampling/adding noise to data, data imputation).
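To make one of the fixes mentioned above concrete, here is a minimal sketch of early stopping. It assumes that a validation loss is recorded after every epoch; the function name, the `patience` parameter, and the loss values are illustrative and not taken from any particular framework.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The loss kept improving, so training ran to the last epoch.
    return len(val_losses) - 1


# Hypothetical validation losses: they improve, then rise as the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch)
```

The best loss is reached at epoch 3 (counting from zero), and training halts two epochs later, once the loss has failed to improve twice in a row. In practice you would also keep the model weights from the best epoch.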

Machine learning leans more on mathematical concepts than on software engineering. Although not strictly required, a background in statistics or college math (linear algebra, differential calculus) will help you understand how an algorithm works behind the scenes. It is also best to gain hands-on experience first by building simple models. This will help you learn quickly and get used to the jargon of machine learning.

We recommend checking out the following materials:


  1. Machine Learning Terminology and Process
  2. Machine Learning Algorithms
  3. Math for Machine Learning
  4. AWS Foundations: Machine Learning Basics
  5. AWS Machine Learning Lens
  6. Machine Learning Best Practices in Financial Services
  7. Neural Networks
  8. Introduction to Artificial Intelligence
  9. Amazon SageMaker

Other helpful materials

  1. AWS Machine Learning and AI Services Cheat Sheets
  2. Mike Chambers’ ML – Specialty Course
  3. Introduction to Machine Learning with Python
  4. Deep Learning with Python
  5. StatQuest


Data Engineering

AWS Services


  • Data ingestion techniques (Batch and Stream processing)
  • Data cleaning
  • ETL Pipeline
  • Building a data lake on Amazon S3
  • Available data storages for training with Amazon SageMaker
    • Amazon S3
    • Amazon EFS
    • Amazon FSx for Lustre
    • Amazon EBS
  • Amazon S3 lifecycle configuration
  • Amazon S3 data storage options

Exploratory Data Analysis

AWS Services


  • Data Cleaning
  • Data labeling (for supervised models)
  • Using RecordIO protobuf format to leverage SageMaker’s Pipe mode for training
  • Data Visualization and Analysis
    • Scatter plot
    • Box plots
    • Confusion matrix
  • Feature Engineering
    • Normalization
    • Scaling
    • Data imputation techniques for filling missing values
    • Oversampling/Undersampling methods to fix unbalanced dataset
    • Regularization
    • Dimensionality Reduction
      • Principal Component Analysis (PCA)
      • t-Distributed Stochastic Neighbor Embedding (t-SNE)
    • One-hot encoding
    • Label encoding
    • Binning
    • Test-train splitting with randomization


Modeling

AWS Services

Amazon SageMaker built-in algorithms

  • Linear regression
  • Logistic regression
  • K-means clustering
  • Principal component analysis (PCA)
  • Factorization machines
  • Neural topic modeling
  • Latent Dirichlet allocation
  • XGBoost
  • Sequence-to-sequence
  • Time-series forecasting
  • BlazingText
  • Object detection
  • Image classification
  • Semantic segmentation


  • Automated hyperparameter tuning
  • Supervised, unsupervised, and reinforcement learning
  • Managed Spot Training
  • Deep Learning
    • Convolutional Neural Network (CNN), Recurrent Neural Networks (RNN)
    • Weights and biases
    • Activation functions
      • Softmax
      • Rectified Linear Unit (ReLU)
      • Tanh
    • Network layers (flatten layer, convolutional layer, pooling layer, output layer)
    • Dropout regularization
    • Model pruning
  • Solving overfitting and underfitting problems
  • Training SageMaker models on local mode
  • Early Stopping
  • Metrics for confusion matrix (true positives, false positives, false negatives, true negatives)
  • Model evaluation
    • ROC / AUC
    • F1 Score
    • Precision
    • Accuracy

Machine Learning Implementation and Operations

AWS Services


  • Real-time and batch inference
  • Monitoring model metrics using CloudWatch
  • Monitoring SageMaker API logs using CloudTrail
  • Using Amazon Augmented AI (Amazon A2I) to involve human reviewers in a machine learning workflow
  • Multi-model endpoints
  • Encrypting data with AWS KMS
  • Lifecycle configuration script
  • Optimizing models for edge devices using SageMaker Neo

Validate Your Knowledge

For high-quality practice exams, you can use our AWS Certified Machine Learning Specialty Practice Exams. These practice tests will help you boost your preparedness for the real exam. They contain multiple sets of questions that cover almost every area you can expect in the real certification exam. We have also included detailed explanations and reference links to help you understand why the correct answer is better than the other options. Practice exams are a great way to determine which areas you are weak in, and they also highlight important information that you might have missed during your review.


Sample Practice Test Questions:

Question 1

A Machine Learning Specialist is training a regression model to predict house prices in different locations. The Specialist wants to assess the quality of the model on the test data by identifying whether the model is underestimating or overestimating the target price.

Which visualization technique should the Specialist use?

  1. Residual plots
  2. Confusion matrix
  3. Correlation matrix
  4. Root Mean Square Error (RMSE)

Correct Answer: 1

It is common practice to review the residuals for regression problems. A residual for an observation in the evaluation data is the difference between the true target and the predicted target. Residuals represent the portion of the target that the model is unable to predict. A positive residual indicates that the model is underestimating the target (the actual target is larger than the predicted target). A negative residual indicates an overestimation (the actual target is smaller than the predicted target).

The histogram of the residuals on the evaluation data, when distributed in a bell shape and centered at zero, indicates that the model makes mistakes in a random manner and does not systematically over- or under-predict any particular range of target values. If the residuals do not form a zero-centered bell shape, there is some structure in the model’s prediction error.

Hence, the correct answer is: Residual plots.
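The sign convention described above (residual = actual target minus predicted target) can be checked with a short sketch. The prices below are hypothetical values in thousands of dollars, and the function name is illustrative.

```python
from statistics import mean


def residuals(actual, predicted):
    """Residual for each observation: actual target minus predicted target."""
    return [a - p for a, p in zip(actual, predicted)]


# Hypothetical house prices in thousands of dollars.
actual_prices = [200, 250, 300, 350, 400]
predicted_prices = [190, 235, 290, 330, 385]

res = residuals(actual_prices, predicted_prices)
mean_residual = mean(res)
n_underestimates = sum(1 for r in res if r > 0)
n_overestimates = sum(1 for r in res if r < 0)
print(res, mean_residual, n_underestimates, n_overestimates)
```

Every residual here is positive, so this model consistently underestimates the price. Plotting `res` as a histogram or against the predicted values would show the same pattern visually.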

Confusion matrix is incorrect. This technique is mainly used to evaluate the performance of classification models by comparing predicted classes with actual classes. It won’t help you identify overestimation or underestimation in a regression model.

Correlation matrix is incorrect. This visualization technique only shows the correlation coefficients between pairs of variables, so at best it gives you an idea of how strongly the predicted values are related to the true values. It won’t give you insight into whether the target value is being underestimated or overestimated.

Root Mean Square Error (RMSE) is incorrect because it is a metric for measuring the accuracy of the ML model, not a visualization. RMSE is a distance measure between the predicted numeric target and the actual numeric answer (ground truth). The smaller the RMSE, the better the predictive accuracy of the model. Because the errors are squared, RMSE does not show whether the model is overestimating or underestimating.
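A small sketch, using made-up numbers, shows why RMSE cannot reveal the direction of the error: a model that is always too low and a model that is always too high by the same amount get the same score.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between actual and predicted targets."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


actual = [200, 250, 300]          # hypothetical true prices
under = [190, 240, 290]           # every prediction is 10 too low
over = [210, 260, 310]            # every prediction is 10 too high

rmse_under = rmse(actual, under)
rmse_over = rmse(actual, over)
print(rmse_under, rmse_over)
```

Both calls return 10.0, so the sign of the error is lost once the differences are squared.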


Question 2

A Machine Learning Specialist is preparing the dataset to be used for training a linear learner model in Amazon SageMaker. During exploratory data analysis, he has detected multiple feature columns that have missing values. The percentage of missing data across the whole training dataset is about 10%. The Specialist is worried that this might introduce bias into his model and lead to inaccurate results.

Which approach will MOST likely yield the best result in reducing the bias caused by missing values?

  1. Drop the columns that include missing values because they only account for 10% of the training data.
  2. Use supervised learning methods to estimate the missing values for each feature.
  3. Compute the mean of non-missing values in the same row and use the result to replace missing values.
  4. Compute the mean of non-missing values in the same column and use the result to replace missing values.

Correct Answer: 2

After getting to know your data through data summaries and visualizations, you might want to transform your variables further to make them more meaningful. This is known as feature processing.

One common feature processing technique is imputation, which replaces missing values with the mean or median value. It is important to understand your data before choosing a strategy for replacing missing values.

While the abovementioned strategy is possible, using a supervised learning method to approximate missing values will most likely provide better results. Supervised learning applied to the imputation of missing values is an active field of research.

Hence, the correct answer is: Use supervised learning methods to estimate the missing values for each feature.

The option that says: Drop the columns that include missing values because they only account for 10% of the training data is incorrect. Doing this will remove features that might be valuable to your machine learning task. Hence, this is not the most effective method.

The following options are both valid imputation techniques, but they are unlikely to give better estimates than a supervised learning method.

– Compute the mean of non-missing values in the same row and use the result to replace missing values.

– Compute the mean of non-missing values in the same column and use the result to replace missing values. 
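To illustrate the difference between column-mean imputation and a supervised approach, here is a minimal sketch. It uses a one-variable least-squares line as a stand-in for "a supervised learning method"; the question does not prescribe a particular model, and the function names and data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


def impute_with_column_mean(column):
    """Fill missing entries with the mean of the observed entries."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def impute_with_regression(feature, column):
    """Fill missing entries by predicting them from another feature."""
    known = [(x, y) for x, y in zip(feature, column) if y is not None]
    slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])
    return [slope * x + intercept if y is None else y
            for x, y in zip(feature, column)]


# Hypothetical data: `column` grows with `feature`, and the last value is missing.
feature = [1, 2, 3, 4, 5]
column = [10, 20, 30, 40, None]

mean_filled = impute_with_column_mean(column)
regression_filled = impute_with_regression(feature, column)
print(mean_filled[-1], regression_filled[-1])
```

The column mean fills the gap with 25, while the fitted line predicts 50, which follows the trend in the other rows. When features are correlated, a model-based estimate can use that relationship, whereas a mean ignores it.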


Final Remarks

Machine learning plays a major role in almost all industries. It provides numerous business benefits such as forecasting sales, predicting medical diagnoses, and simplifying time-consuming data entry tasks. With the proliferation of machine learning and AI applications, it’s not difficult to see how it will affect job demand in the market. The need for machine learning talent that can build efficient and effective models at scale will likely continue growing for years to come. Pairing your skills with the AWS Certified Machine Learning – Specialty certification can make your resume stand out and boost your earning potential.

We hope that our guide has helped you achieve that goal, and we would love to hear about your exam experience. We wish you the best of results.
