Last updated on August 12, 2024
The AWS Machine Learning — Specialty MLS-C01 Certification is intended for individuals who are responsible for developing data science or applied machine learning projects on the AWS Cloud. This specialty certification is quite different from any other AWS exam. If you already have prior experience with other AWS certifications, you’re probably expecting to be heavily tested on AWS services and how they can be architected to build solutions that can solve different business problems. However, this is not the case in the ML-Specialty certification. Aside from Amazon SageMaker, most of the questions that you’ll encounter have nothing to do with AWS services at all.
The exam covers a wide range of general machine learning concepts. You should have at least a high-level understanding of the different stages of a machine learning workflow, such as choosing the correct algorithm for a specific use case, data collection, feature engineering, train-test splitting, tuning, training, and deploying a model for inference. The exam also expects you to know the common issues that arise during model training (e.g., overfitting, unbalanced datasets, missing values in the dataset) and the methods to fix them (e.g., regularization/early stopping, oversampling/adding noise to data, data imputation).
Machine learning leans more on math concepts than on software engineering. Although not strictly required, a background in statistics or college math (linear algebra, differential calculus) is an advantage for understanding how an algorithm works behind the scenes. It is also best to gain hands-on experience first by building simple models. This will help you learn quickly and get used to machine learning jargon.
We recommend checking out the following materials:
STUDY MATERIALS FOR THE MLS-C01 SPECIALTY EXAM
- Machine Learning Terminology and Process
- Machine Learning Algorithms
- Math for Machine Learning
- AWS Foundations: Machine Learning Basics
- AWS Machine Learning Lens
- Machine Learning Best Practices in Financial Services
- Neural Networks
- Introduction to Artificial Intelligence
- Amazon SageMaker
We also recommend taking the free and highly interactive AWS Exam Readiness digital course for the AWS Certified Machine Learning Specialty MLS-C01 exam.
Other helpful materials
- AWS Machine Learning and AI Services Cheat Sheets
- Mike Chambers’ ML – Specialty Course
- Introduction to Machine Learning with Python
- Deep Learning with Python
- StatQuest
MLS-C01 RELATED AWS SERVICES TO FOCUS ON
Data Engineering
AWS Services
Concepts
- Data ingestion techniques (Batch and Stream processing)
- Data cleaning
- ETL Pipeline
- Building a data lake on Amazon S3
- Data storage options available for training with Amazon SageMaker
- Amazon S3 lifecycle configuration
- Amazon S3 data storage options
Exploratory Data Analysis
AWS Services
Concepts
- Data Cleaning
- Data labeling (for supervised models)
- Using RecordIO protobuf format to leverage SageMaker’s Pipe mode for training
- Data Visualization and Analysis
- Scatter plot
- Box plots
- Confusion matrix
- Feature Engineering
- Normalization
- Scaling
- Data imputation techniques for filling missing values
- Oversampling/undersampling methods to fix an unbalanced dataset
- Regularization
- Dimensionality Reduction
- Principal Component Analysis (PCA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- One-hot encoding
- Label encoding
- Binning
- Test-train splitting with randomization
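Several of the feature engineering concepts above can be tried out in a few lines of plain Python. The following is a minimal sketch, not a reference implementation: the helper names (`impute_mean`, `min_max_scale`, `standardize`, `one_hot`, `train_test_split`) and the sample values are hypothetical and chosen only for illustration. It assumes mean imputation, min-max scaling, z-score standardization, and a seeded random shuffle for the split.

```python
import random
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def one_hot(labels):
    """Map each label to a 0/1 vector with one position per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut off the test portion."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample data: one numeric feature with a gap, one categorical.
ages = [20.0, None, 40.0, 60.0]
colors = ["red", "blue", "red", "green"]

filled = impute_mean(ages)          # the missing entry becomes the mean, 40.0
scaled = min_max_scale(filled)      # smallest value maps to 0, largest to 1
z_scores = standardize(filled)      # zero mean, unit standard deviation
encoded = one_hot(colors)           # column order: blue, green, red
train, test = train_test_split(list(range(10)))
```

Shuffling before the split matters when the rows are ordered (for example, sorted by date or by label), because an unshuffled cut could leave the test set unrepresentative of the training data.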
Modeling
AWS Services
- Amazon SageMaker
- Amazon SageMaker Automatic Model Tuning
- Amazon SageMaker Python SDK
- Amazon Comprehend
- Amazon Rekognition
- Amazon Transcribe
- Amazon Polly
- Amazon Translate
- Amazon Lex
- AWS DeepLens
Amazon SageMaker built-in algorithms
- Linear regression
- Logistic regression
- K-means clustering
- Principal component analysis (PCA)
- Factorization machines
- Neural topic modeling
- Latent Dirichlet allocation
- XGBoost
- Sequence-to-sequence
- Time-series forecasting
- BlazingText
- Object detection
- Image classification
- Semantic segmentation
Concepts:
- Automated hyperparameter tuning
- Supervised, unsupervised, and reinforcement learning
- Managed Spot Training
- Deep Learning
- Convolutional Neural Network (CNN), Recurrent Neural Networks (RNN)
- Weights and biases
- Activation functions
- Softmax
- Rectified Linear Unit (ReLU)
- Tanh
- Network layers (flatten layer, convolutional layer, pooling layer, output layer)
- Dropout regularization
- Model pruning
- Solving overfitting and underfitting problems
- Training SageMaker models in local mode
- Early Stopping
- Metrics for confusion matrix (true positives, false positives, false negatives, true negatives)
- Model evaluation
- ROC / AUC
- F1 Score
- Precision
- Accuracy
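The evaluation metrics listed above all derive from the four confusion matrix counts. As a rough sketch, the function below computes them for a binary classifier. The function name and the sample counts are hypothetical, and recall is included only because the F1 score is defined in terms of it. The code assumes none of the denominators are zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 so that no division by zero occurs.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
    precision = tp / (tp + fp)                  # share of predicted positives that are real
    recall = tp / (tp + fn)                     # share of real positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy comes out higher than precision and recall because the many true negatives dominate the total, which is why accuracy alone can be misleading on unbalanced data.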
Machine Learning Implementation and Operations
AWS Services
- Amazon Elastic Inference
- Amazon SageMaker Inference Pipeline
- Amazon SageMaker Neo
- Amazon Augmented AI (A2I)
- Amazon CloudWatch
- AWS CloudTrail
Concepts:
- Real-time and batch inference
- Monitoring model metrics using CloudWatch
- Monitoring SageMaker API logs using CloudTrail
- Using Amazon Augmented AI (A2I) to involve human reviewers in a machine learning workflow
- Multi-model endpoints
- Encrypting data with AWS KMS
- Lifecycle configuration script
- Optimizing models for edge devices using SageMaker Neo
MLS-C01 Common Exam Scenarios
| Scenario | Solution |
| --- | --- |
| MLS-C01 Domain 1: Data Engineering | |
| A company wants to automatically convert streaming JSON data into Apache Parquet before storing it in an S3 bucket. | Use Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) |
| A company uses Amazon EMR for its ETL processes. The company is looking for an alternative with a lower operational overhead. | Run the ETL jobs using AWS Glue |
| Which service should you use to deliver streaming data from Amazon MSK to a Redshift cluster with low latency? | Redshift Streaming Ingestion |
| A data engineer is building a pipeline for streaming data. The data will be fetched from various sources. | Create an application that uses the Kinesis Producer Library (KPL) to load streaming data from various sources into a Kinesis data stream. |
| A company wants to set up a data lake on Amazon S3. The data will be sourced from S3 buckets located in different AWS accounts. Which service can simplify the implementation of the data lake? | AWS Lake Formation |
| MLS-C01 Domain 2: Exploratory Data Analysis | |
| An image classifier is getting high accuracy on the validation dataset. However, the accuracy dropped significantly when tested against real data. How can you improve the model’s performance? | Take existing images from the training data. Apply data augmentation techniques (e.g., flipping, rotating, adjusting brightness) to the images and add them to the training data. Retrain the model. |
| What methods can a machine learning engineer use to reduce the size of a large dataset while retaining only relevant features? | 1. Principal Component Analysis (PCA) 2. t-Distributed Stochastic Neighbor Embedding (t-SNE) |
| A dataset contains a mixture of categorical and numerical features. What feature engineering method should be done to prepare the data for training? | One-hot encoding |
| X and Y variables have a correlation coefficient of -0.98. What does it indicate? | Very strong negative correlation |
| A machine learning engineer handles a small dataset with missing values. What should they do to ensure no data points are lost? | Use imputation techniques to fill in missing values |
| MLS-C01 Domain 3: Modeling | |
| An ML engineer wants to evaluate the performance of a binary classification model visually. What visualization technique should be used? | Confusion matrix |
| An ML engineer wants to discover topics available within a large text dataset. Which algorithm should the engineer train the model on? | Latent Dirichlet Allocation (LDA) algorithm |
| A SageMaker Object2Vec model is overfitting on a validation dataset. How do you solve this problem? | Use regularization, in this case by adjusting the value of the dropout parameter. |
| A neural network model is being trained using a large dataset in batches. As the training progresses, the loss function begins to oscillate. Which could be the cause? | The learning rate is too high |
| What SageMaker built-in algorithm is suitable for predicting click-through rate (CTR) patterns? | Factorization machines |
| MLS-C01 Domain 4: Machine Learning Implementation and Operations | |
| An ML engineer wants to auto-scale the instances behind a SageMaker endpoint according to the volume of incoming requests. Which metric should this scaling be based on? | |
| Which AWS service can you use to convert audio into text? | Amazon Transcribe |
| An ML engineer is training a model on a cluster of SageMaker instances. The traffic between the instances must be encrypted. | Enable inter-container traffic encryption |
| A company wants to use Amazon SageMaker to deploy various ML models in a cost-effective way. | Use a multi-model endpoint |
| What AWS service can help you build an AI-powered chatbot that can interact with customers? | Amazon Lex |
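The learning rate scenario in Domain 3 can be illustrated with a toy example. The sketch below runs plain gradient descent on the one-dimensional function f(w) = w², which is a deliberate simplification and not how a neural network is trained on batches. The function name `gradient_descent` and the two learning rates are hypothetical choices for illustration. With a small step the weight settles toward the minimum at 0. With a step that is too large, each update overshoots the minimum, so the weight flips sign and the loss grows instead of settling.

```python
def gradient_descent(learning_rate, steps=5, w=1.0):
    """Minimize f(w) = w**2 and record (w, loss) after each update."""
    history = []
    for _ in range(steps):
        grad = 2 * w                  # derivative of w**2
        w = w - learning_rate * grad  # standard gradient descent update
        history.append((w, w ** 2))
    return history


small = gradient_descent(learning_rate=0.1)  # w shrinks steadily toward 0
large = gradient_descent(learning_rate=1.1)  # w jumps past 0 on every step

for label, history in (("lr=0.1", small), ("lr=1.1", large)):
    weights = ", ".join(f"{w:+.3f}" for w, _ in history)
    print(f"{label}: {weights}")
```

On a real loss surface with mini-batches the behavior is noisier, but the mechanism is the same: lowering the learning rate shortens each step so the updates stop jumping across the minimum.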
Validate Your Knowledge For Your MLS-C01 Exam
For high-quality practice exams, you can use our AWS Certified Machine Learning Specialty MLS-C01 Practice Exams. These practice tests will help you boost your preparedness for the real exam. They contain multiple sets of questions that cover almost every area that you can expect on the real certification exam. We have also included detailed explanations and reference links to help you understand why the correct option is better than the rest. This is the value that you will get from our course. Practice exams are a great way to determine which areas you are weak in, and they also highlight important information that you might have missed during your review.
Sample Practice Test Questions for MLS-C01:
Question 1
A trucking company wants to improve situational awareness for its operations team. Each truck has a GPS device installed to monitor its location.
The company requires the data to be stored in Amazon Redshift to conduct near real-time analytics, which will then be used to generate updated dashboard reports.
Which workflow offers the quickest processing time from ingestion to storage?
- Use Amazon Kinesis Data Streams to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion.
- Use Amazon Managed Streaming for Apache Kafka (MSK) to ingest the location data. Use Amazon Redshift Spectrum to deliver the data into the cluster.
- Use Amazon Data Firehose to ingest the location data and set the Amazon Redshift cluster as the destination.
- Use Amazon Data Firehose to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion.
Question 2
A Machine Learning Specialist is training an XGBoost-based model for detecting fraudulent transactions using Amazon SageMaker. The training data contains 5,000 fraudulent behaviors and 500,000 non-fraudulent behaviors. The model reaches an accuracy of 99.5% during training.
When tested on the validation dataset, the model shows an accuracy of 99.1% but delivers a high false-negative rate of 87.7%. The Specialist needs to bring down the number of false-negative predictions for the model to be acceptable in production.
Which combination of actions must be taken to meet the requirement? (Select TWO.)
- Increase the model complexity by specifying a larger value for the max_depth hyperparameter.
- Increase the value of the rate_drop hyperparameter to reduce the overfitting of the model.
- Adjust the balance of positive and negative weights by configuring the scale_pos_weight hyperparameter.
- Alter the value of the eval_metric hyperparameter to MAP (Mean Average Precision).
- Alter the value of the eval_metric hyperparameter to Area Under The Curve (AUC).
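To see why accuracy says little about this dataset, it helps to work through the class counts given in the question. The short sketch below is only arithmetic on those counts, and the variable names are hypothetical. It considers a trivial model that labels every transaction as non-fraudulent, and it computes the ratio of negative to positive examples, which the XGBoost documentation mentions as a typical value to consider for scale_pos_weight on unbalanced classes.

```python
# Class counts stated in the question.
fraud = 5_000
legit = 500_000

# A trivial model that predicts "not fraud" for every transaction
# is right on all legitimate cases and wrong on all fraud cases.
baseline_accuracy = legit / (fraud + legit)
baseline_false_negative_rate = fraud / fraud  # every fraud case is missed

# Ratio of negative to positive examples in the training data.
neg_to_pos_ratio = legit / fraud

print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"baseline false-negative rate: {baseline_false_negative_rate:.1f}")
print(f"negative-to-positive ratio: {neg_to_pos_ratio:.0f}")
```

The trivial model reaches roughly 99% accuracy while missing every fraudulent transaction, so a high accuracy figure on its own does not show that the rare class is being detected.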
Machine learning plays a major role in almost all industries. It provides numerous business benefits, such as forecasting sales, predicting medical diagnoses, and simplifying time-consuming data entry tasks. With the proliferation of machine learning and AI applications, it’s not difficult to see how it will affect job demand in the market. The need for machine learning talent to build efficient and effective models at scale will continue growing for years to come. Pairing your skills with the AWS Machine Learning – Specialty certification can make your resume stand out and boost your earning potential.
We hope that our guide has helped you achieve that goal, and we would love to hear how your exam went. We wish you the best of results.