AWS Certified Machine Learning Specialty MLS-C01 Sample Exam Questions

Last updated on January 11, 2025

Here are 10 AWS Certified Machine Learning Specialty MLS-C01 practice exam questions to help you gauge your readiness for the actual exam.

Question 1

A trucking company wants to improve situational awareness for its operations team. Each truck has a GPS device installed to monitor its location.

The company requires the data to be stored in Amazon Redshift for near real-time analytics, which will then be used to generate updated dashboard reports.

Which workflow offers the quickest processing time from ingestion to storage?

  1. Use Amazon Kinesis Data Streams to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion.
  2. Use Amazon Managed Streaming for Apache Kafka (MSK) to ingest the location data. Use Amazon Redshift Spectrum to deliver the data in the cluster.
  3. Use Amazon Data Firehose to ingest the location data and set the Amazon Redshift cluster as the destination.
  4. Use Amazon Data Firehose to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion.

Correct Answer: 1

The Amazon Redshift Streaming ingestion feature makes it easier to access and analyze data coming from real-time data sources. It simplifies the streaming architecture by providing native integration between Amazon Redshift and the streaming engines in AWS, which are Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Streaming data sources like system logs, social media feeds, and IoT streams can continue to push events to the streaming engines, and Amazon Redshift simply becomes just another consumer.

Before, loading data from a stream into Amazon Redshift included several steps. These included connecting the stream to Amazon Data Firehose and waiting for Data Firehose to stage the data in Amazon S3, using various-sized batches at varying-length buffer intervals. After this, Data Firehose initiated a COPY command to load the data from Amazon S3 to a table in Redshift.

Amazon Redshift Streaming ingestion eliminates all of these extra steps, resulting in faster performance and improved latency.
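As a rough illustration, setting up streaming ingestion comes down to two SQL statements run in Redshift: an external schema that maps to Kinesis Data Streams, and a materialized view defined over the stream. The Python sketch below only renders those statements as strings. The schema name, view name, stream name, and IAM role ARN are hypothetical placeholders, not values from the scenario.

```python
def streaming_ingestion_sql(schema, view, stream, iam_role_arn):
    """Render the two statements used to set up Redshift streaming ingestion.

    All identifiers passed in are placeholders chosen for this sketch.
    """
    # External schema that points Redshift at Kinesis Data Streams.
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema} "
        f"FROM KINESIS IAM_ROLE '{iam_role_arn}';"
    )
    # Materialized view that Redshift refreshes directly from the stream.
    create_view = (
        f"CREATE MATERIALIZED VIEW {view} AUTO REFRESH YES AS "
        f"SELECT approximate_arrival_timestamp, "
        f"JSON_PARSE(kinesis_data) AS payload "
        f'FROM {schema}."{stream}";'
    )
    return [create_schema, create_view]


statements = streaming_ingestion_sql(
    schema="kds",
    view="truck_locations_mv",
    stream="truck-gps-stream",
    iam_role_arn="arn:aws:iam::123456789012:role/redshift-streaming-role",
)
for statement in statements:
    print(statement)
```

Note that no Amazon S3 staging or COPY step appears anywhere in this flow, which is where the latency improvement comes from.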

Hence, the correct answer is: Use Amazon Kinesis Data Streams to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion.

The option that says: Use Amazon Managed Streaming for Apache Kafka (MSK) to ingest the location data. Use Amazon Redshift Spectrum to deliver the data in the cluster is incorrect. Redshift Spectrum is a Redshift feature that allows you to query data in Amazon S3 without loading them into Redshift tables. Redshift Spectrum is not capable of moving data from S3 to Redshift.

The option that says: Use Amazon Data Firehose to ingest the location data and set the Amazon Redshift cluster as the destination is incorrect. While you can configure Redshift as a destination for an Amazon Data Firehose stream, Data Firehose does not actually load the data directly into Redshift. Under the hood, Data Firehose stages the data first in Amazon S3 and copies it into Redshift using the COPY command.

The option that says: Use Amazon Data Firehose to ingest the location data. Load the streaming data into the cluster using Amazon Redshift Streaming ingestion is incorrect. Amazon Data Firehose is not a valid streaming source for Amazon Redshift Streaming ingestion.

References:
https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html
https://aws.amazon.com/blogs/big-data/build-near-real-time-logistics-dashboards-using-amazon-redshift-and-amazon-managed-grafana-for-better-operational-intelligence/
https://aws.amazon.com/blogs/big-data/real-time-analytics-with-amazon-redshift-streaming-ingestion/

Check out this Amazon Redshift Cheat Sheet:
https://tutorialsdojo.com/amazon-redshift/

Question 2

A Machine Learning Specialist is training an XGBoost-based model for detecting fraudulent transactions using Amazon SageMaker AI. The training data contains 5,000 fraudulent behaviors and 500,000 non-fraudulent behaviors. The model reaches an accuracy of 99.5% during training.

When tested on the validation dataset, the model shows an accuracy of 99.1% but delivers a high false-negative rate of 87.7%. The Specialist needs to bring down the number of false-negative predictions for the model to be acceptable in production.

Which combination of actions must be taken to meet the requirement? (Select TWO.)

  1. Increase the model complexity by specifying a larger value for the max_depth hyperparameter.
  2. Increase the value of the rate_drop hyperparameter to reduce the overfitting of the model.
  3. Adjust the balance of positive and negative weights by configuring the scale_pos_weight hyperparameter.
  4. Alter the value of the eval_metric hyperparameter to MAP (Mean Average Precision).
  5. Alter the value of the eval_metric hyperparameter to Area Under The Curve (AUC).

Correct Answer: 3,5

Since the fraud detection model is a binary classifier, we should evaluate it using the Area Under the Curve metric. The AUC metric examines the ability of a binary classification model as its discrimination threshold is varied.
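To make the threshold-independence of AUC concrete, here is a minimal sketch that computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The labels and scores are made-up toy values, and the function name is chosen for this illustration only.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(labels, scores)
print(auc)  # 0.75
```

No classification threshold appears in the calculation: only the ordering of the scores matters, which is why AUC is less misleading than accuracy on imbalanced data.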

The scale_pos_weight hyperparameter controls the balance of positive and negative weights, which is useful for imbalanced classes. In the scenario, the model produces a high false-negative rate (FNR) because the dataset is heavily imbalanced: fraudulent transactions make up only about 1% of the training data. Increasing scale_pos_weight gives more weight to the positive (fraudulent) class during training, which reduces the number of false negatives.
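The arithmetic behind this can be sketched with the numbers from the scenario. A commonly suggested starting value for scale_pos_weight is the ratio of negative to positive examples. The hyperparameter dictionary below is only an illustrative assumption of how the two chosen settings might sit together, not a tuned configuration.

```python
fraud = 5_000
non_fraud = 500_000

# A model that labels every transaction as non-fraudulent still looks "accurate",
# even though it misses every fraudulent transaction.
trivial_accuracy = non_fraud / (fraud + non_fraud)

# Common heuristic: sum(negative instances) / sum(positive instances).
scale_pos_weight = non_fraud / fraud

# Illustrative XGBoost hyperparameters (assumed values for this sketch).
hyperparameters = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "scale_pos_weight": scale_pos_weight,
}

print(round(trivial_accuracy, 4))  # 0.9901
print(scale_pos_weight)            # 100.0
```

The roughly 99% accuracy of the do-nothing model shows why the 99.1% validation accuracy in the scenario says little about fraud detection quality.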

Hence, the correct answers are:

– Alter the value of the eval_metric hyperparameter to Area Under The Curve (AUC).

– Adjust the balance of positive and negative weights by configuring the scale_pos_weight hyperparameter.

The option that says: Increase the model complexity by specifying a larger value for the max_depth hyperparameter is incorrect. There’s simply no need to increase the model complexity because it already generalizes well on both the training and validation dataset.

The option that says: Increase the value of the rate_drop hyperparameter to reduce the overfitting of the model is incorrect because the training accuracy (99.5%) and validation accuracy (99.1%) are too close for the model to be considered overfitting.

The option that says: Alter the value of the eval_metric hyperparameter to MAP (Mean Average Precision) is incorrect because this metric is only useful for evaluating ranking algorithms.

References:

https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html
https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst#learning-task-parameters

Check out this Amazon SageMaker AI Cheat Sheet:
https://tutorialsdojo.com/amazon-sagemaker/

Question 3

A manufacturing company wants to aggregate data in Amazon S3 and analyze it using Amazon Athena. The company needs a solution that can both ingest and transform streaming data into Apache Parquet format.

Which AWS Service meets the requirements?

  1. Amazon Kinesis Data Streams
  2. AWS Batch
  3. Amazon Data Firehose
  4. AWS Database Migration Service

Correct Answer: 3

Amazon Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon OpenSearch Service, Splunk, and any custom HTTP endpoint or HTTP endpoints owned by supported third-party service providers, including Datadog, MongoDB, and New Relic.

Data Firehose can invoke your Lambda function to transform incoming source data and deliver the transformed data to destinations. You can enable Data Firehose data transformation when you create your delivery stream. When you enable Data Firehose data transformation, Data Firehose buffers incoming data up to 3 MB by default. (To adjust the buffering size, use the ProcessingConfiguration API with the ProcessorParameter called BufferSizeInMBs.)

Data Firehose then invokes the specified Lambda function synchronously with each buffered batch using the AWS Lambda synchronous invocation model. The transformed data is sent from Lambda to Data Firehose. Data Firehose then sends it to the destination when the specified destination buffering size or buffering interval is reached, whichever happens first. For the Parquet requirement, Data Firehose can also convert the format of incoming JSON records to Apache Parquet or Apache ORC before writing them to Amazon S3.
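The transformation flow described above can be sketched as a small Lambda handler. Each incoming record carries a recordId and base64-encoded data, and the function must return every record with the same recordId, a result of Ok, Dropped, or ProcessingFailed, and the transformed data re-encoded as base64. The enrichment applied here (adding a "processed" field) and the sample payload are hypothetical.

```python
import base64
import json


def lambda_handler(event, context):
    """Add a field to each JSON record and hand it back to Data Firehose."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # hypothetical enrichment for this sketch
        data = (json.dumps(payload) + "\n").encode("utf-8")
        output.append(
            {
                "recordId": record["recordId"],  # must match the incoming ID
                "result": "Ok",  # or "Dropped" / "ProcessingFailed"
                "data": base64.b64encode(data).decode("utf-8"),
            }
        )
    return {"records": output}


# Local dry run with a made-up record shaped like a Firehose batch.
sample_event = {
    "records": [
        {
            "recordId": "1",
            "data": base64.b64encode(
                b'{"machine": "press-7", "temp": 71.5}'
            ).decode("utf-8"),
        }
    ]
}
response = lambda_handler(sample_event, None)
print(response["records"][0]["result"])
```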

Hence, the correct answer is: Amazon Data Firehose.

Amazon Kinesis Data Streams is incorrect. Although you can ingest streaming data with Kinesis Data Streams, the question is specifically asking for an AWS service that can do both data ingestion and data transformation. Kinesis Data Streams can’t be used to transform data on the fly and store the output data to Amazon S3.

AWS Batch is incorrect because this service can’t be used for ingesting streaming data. AWS Batch is simply a service used to efficiently manage the necessary compute resources for batch processing jobs.

AWS Database Migration Service is incorrect because this service is just used to simplify the migration of different database platforms.

References:
https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html#data-transformation-flow
https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
https://aws.amazon.com/blogs/big-data/stream-data-to-an-http-endpoint-with-amazon-kinesis-data-firehose/

Check out this Amazon Kinesis Cheat Sheet:
https://tutorialsdojo.com/amazon-kinesis/

Question 4

A Data Scientist uses an Amazon Data Firehose stream to ingest data records produced from an on-premises application. These records are compressed using GZIP compression. The Scientist wants to perform SQL queries against the data stream to gain real-time insights.

Which configuration will enable querying with the LEAST latency?

  1. Transform the data with Amazon Kinesis Client Library and deliver the results to an Amazon OpenSearch cluster.
  2. Use a Kinesis Data Analytics application configured with AWS Lambda to transform the data.
  3. Use a streaming ETL job in AWS Glue to transform the data coming from the Firehose stream.
  4. Store the data records in an Amazon S3 bucket and use Amazon Athena to run queries.

Correct Answer: 2

You can configure your Amazon Kinesis Analytics applications to transform data before it is processed by your SQL code. This new feature allows you to use AWS Lambda to convert formats, enrich data, filter data, and more. Once the data is transformed by your function, Kinesis Analytics sends the data to your application’s SQL code for real-time analytics.

Kinesis Analytics provides Lambda blueprints for common use cases like converting GZIP and Kinesis Producer Library formats to JSON. You can use these blueprints without any change or write your own custom functions.

Kinesis Analytics continuously reads data from your Kinesis stream or Firehose stream. For each batch of records that it retrieves, the Lambda processor subsystem manages how each batch gets passed to your Lambda function. Your function receives a list of records as input. Within your function, you iterate through the list and apply your business logic to accomplish your preprocessing requirements (such as data transformation).
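A preprocessing function for the GZIP case might look roughly like the sketch below. It is not the AWS-provided blueprint, only a minimal stand-in under the same record contract: each record has a recordId and base64-encoded data, and each returned record reports a result of Ok, Dropped, or ProcessingFailed. The sample payloads are made up.

```python
import base64
import gzip
import zlib


def lambda_handler(event, context):
    """Decompress GZIP records so the application's SQL code can read them."""
    output = []
    for record in event["records"]:
        try:
            compressed = base64.b64decode(record["data"])
            plain = gzip.decompress(compressed)
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "Ok",
                    "data": base64.b64encode(plain).decode("utf-8"),
                }
            )
        except (OSError, EOFError, zlib.error):
            # Not valid GZIP data: report the failure and return the input as-is.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}


# Local dry run: one valid GZIP record and one record that is not compressed.
good = base64.b64encode(gzip.compress(b'{"sensor": "a1", "value": 3}')).decode("utf-8")
bad = base64.b64encode(b"not gzip").decode("utf-8")
result = lambda_handler(
    {"records": [{"recordId": "1", "data": good}, {"recordId": "2", "data": bad}]},
    None,
)
print([r["result"] for r in result["records"]])
```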

Hence, the correct answer is: Use a Kinesis Data Analytics application configured with AWS Lambda to transform the data.

The option that says: Transform the data with Amazon Kinesis Client Library and deliver the results to an Amazon OpenSearch cluster is incorrect. The Amazon Kinesis Client Library (KCL) is used to build applications that process data from Amazon Kinesis Data Streams. It is not designed for querying data streams directly or delivering data to an Amazon OpenSearch cluster. Additionally, Amazon OpenSearch is a search and analytics engine, not a real-time querying solution for data streams.

The option that says: Use a streaming ETL job in AWS Glue to transform the data coming from the Firehose stream is incorrect. AWS Glue is a serverless data integration service primarily used for batch ETL (Extract, Transform, Load) jobs. While it supports streaming ETL jobs, these jobs are designed for data transformation and loading into data stores like Amazon S3 or Amazon Redshift, not for real-time querying of data streams.

The option that says: Store the data records in an Amazon S3 bucket and use Amazon Athena to run queries is incorrect. Although this is technically a valid solution, it does not provide real-time insights, unlike Kinesis Data Analytics. Amazon Athena is simply an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is also not capable of consuming data directly from the Firehose stream in real-time.

 

References:

https://aws.amazon.com/about-aws/whats-new/2017/10/amazon-kinesis-analytics-can-now-pre-process-data-prior-to-running-sql-queries/

https://aws.amazon.com/blogs/big-data/preprocessing-data-in-amazon-kinesis-analytics-with-aws-lambda/

 

Check out this Amazon Kinesis Cheat Sheet:

https://tutorialsdojo.com/amazon-kinesis/

Question 5

A financial company is receiving hundreds of credit card applications daily and is looking for ways to streamline its manual review process. The company’s machine learning (ML) specialist has been given a CSV dataset with a highly imbalanced class.

The specialist must train a prototype classifier that predicts whether to approve or reject an application. The company wants the model to be delivered as soon as possible.

How can the ML specialist meet the requirement with the LEAST operational overhead?

  1. Upload the dataset to an Amazon S3 bucket. Create an Amazon SageMaker AutoPilot job and specify the bucket location as the source for the job. Choose the best version of the model.
  2. Upload the dataset to an Amazon S3 bucket. Use the built-in XGBoost algorithm in Amazon SageMaker to train the model. Run an automatic model tuning job with early stopping enabled. Select the best version of the model.
  3. Upload the dataset to an Amazon S3 bucket. Perform feature engineering on the data using Amazon SageMaker Data Wrangler. Train the model using the built-in XGBoost algorithm in Amazon SageMaker.
  4. Upload the dataset to an Amazon S3 bucket. Create an Amazon SageMaker Ground Truth labeling job. Select Text Classification (Single Label) as the task selection. Add the company’s credit officers as workers.

Correct Answer: 1

Amazon SageMaker Autopilot is a feature of Amazon SageMaker that allows you to automatically train and tune machine learning models with minimal setup and no machine learning expertise required.

Autopilot automatically generates pipelines, trains, and tunes the best ML models for classification or regression tasks on tabular data while allowing you to maintain full control and visibility. Autopilot automatically analyzes the dataset, processes the data into features, and trains multiple optimized ML models.
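To give a sense of how little setup this involves, the sketch below builds the request for an Autopilot job as a plain dictionary. The job name, bucket paths, target column, and role ARN are hypothetical. The choice of F1 as the objective metric is an assumption made here because the classes are imbalanced. In practice the dictionary could be passed to the SageMaker CreateAutoMLJob API, for example via boto3's create_auto_ml_job.

```python
def build_autopilot_request(job_name, input_s3_uri, output_s3_uri,
                            target_column, role_arn):
    """Assemble the parameters for an Autopilot job on a CSV dataset in S3."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": input_s3_uri,
                    }
                },
                # Column Autopilot should learn to predict.
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ProblemType": "BinaryClassification",
        # Assumed objective for this sketch; F1 is less skewed by imbalance
        # than plain accuracy.
        "AutoMLJobObjective": {"MetricName": "F1"},
        "RoleArn": role_arn,
    }


request = build_autopilot_request(
    job_name="credit-approval-prototype",
    input_s3_uri="s3://example-bucket/applications/train.csv",
    output_s3_uri="s3://example-bucket/autopilot-output/",
    target_column="approved",
    role_arn="arn:aws:iam::123456789012:role/example-sagemaker-role",
)
# With AWS credentials configured, this could be submitted as:
#   boto3.client("sagemaker").create_auto_ml_job(**request)
print(request["AutoMLJobName"])
```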

Hence, the correct answer is: Upload the dataset to an Amazon S3 bucket. Create an Amazon SageMaker AutoPilot job and specify the bucket location as the source for the job. Choose the best version of the model.

The option that says: Upload the dataset to an Amazon S3 bucket. Use the built-in XGBoost algorithm in Amazon SageMaker to train the model. Run an automatic model tuning job with early stopping enabled. Select the best version of the model is incorrect. This can be a possible solution, but Amazon SageMaker AutoPilot is still a better option for this scenario. AutoPilot not only automates the feature engineering, algorithm selection, and model tuning process but also has the ability to test multiple algorithms and multiple variations of each algorithm, then it will select the one that best fits the data.

The option that says: Upload the dataset to an Amazon S3 bucket. Perform feature engineering on the data using Amazon SageMaker Data Wrangler. Train the model using the built-in XGBoost algorithm in Amazon SageMaker is incorrect. Although this solution could work, it’s not the best solution for the scenario. With AutoPilot, you only need to provide the dataset and the target variable, and it will take care of the rest. It will also give you the best model with the least operational overhead compared to using Amazon SageMaker Data Wrangler and training the model with SageMaker’s XGBoost separately.

The option that says: Upload the dataset to an Amazon S3 bucket. Create an Amazon SageMaker Ground Truth labeling job. Select Text Classification (Single Label) as the task selection. Add the company’s credit officers as workers is incorrect. Ground Truth isn’t needed since the dataset provided by the financial company already has labeled data.

References: 
https://aws.amazon.com/blogs/machine-learning/creating-high-quality-machine-learning-models-for-financial-services-using-amazon-sagemaker-autopilot/
https://aws.amazon.com/blogs/aws/amazon-sagemaker-autopilot-fully-managed-automatic-machine-learning/

Check out this Amazon SageMaker Cheat Sheet:
https://tutorialsdojo.com/amazon-sagemaker/

Question 6

A Data Scientist launches an Amazon SageMaker notebook instance to develop a model for forecasting sales revenue. The scientist wants to load test the model to figure out the right instance size to deploy in production.

How can the scientist assess and visualize CPU utilization, GPU utilization, memory utilization, and latency as the load test runs?

  1. Create a CloudWatch dashboard to build a unified operational view of the metrics generated by the notebook instance.
  2. Create a custom CloudWatch Logs and stream the data into an Amazon OpenSearch cluster. Visualize the logs with Kibana.
  3. Create a log stream in CloudWatch Logs and subscribe to it an Amazon Data Firehose stream to send the data into an Amazon OpenSearch cluster. Visualize the logs with Kibana.
  4. Export the generated log data to an Amazon S3 bucket. Use Amazon Athena and Amazon QuickSight to visualize the SageMaker logs.

Correct Answer: 1

You can monitor Amazon SageMaker using Amazon CloudWatch, which collects raw data and processes it into readable, near-real-time metrics. These statistics are kept for 15 months so that you can access historical information and gain a better perspective on how your web application or service is performing.

You can then build customized dashboards for your CloudWatch metrics. Each dashboard can display multiple metrics and can be accessorized with text and images. You can build multiple dashboards if you’d like, each one focusing on providing a distinct view of your environment. You can even pull data from multiple regions into a single dashboard in order to create a global view.
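As an illustration, a dashboard is defined by a JSON body listing widgets, each of which plots one or more metrics. The sketch below assembles such a body for the four metrics in the question. It assumes the model under load test is hosted on a SageMaker endpoint, and the endpoint name, variant name, and region are hypothetical. The resulting JSON string could be supplied as the DashboardBody of the CloudWatch PutDashboard API.

```python
import json


def build_dashboard_body(endpoint_name, variant_name, region):
    """Build a CloudWatch dashboard body with one widget per metric."""
    dimensions = ["EndpointName", endpoint_name, "VariantName", variant_name]

    def widget(title, namespace, metric, x, y):
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "title": title,
                "metrics": [[namespace, metric] + dimensions],
                "stat": "Average",
                "period": 60,
                "region": region,
            },
        }

    instance_ns = "/aws/sagemaker/Endpoints"  # instance-level utilization metrics
    invocation_ns = "AWS/SageMaker"           # invocation metrics such as latency
    return {
        "widgets": [
            widget("CPU utilization", instance_ns, "CPUUtilization", 0, 0),
            widget("Memory utilization", instance_ns, "MemoryUtilization", 12, 0),
            widget("GPU utilization", instance_ns, "GPUUtilization", 0, 6),
            widget("Model latency", invocation_ns, "ModelLatency", 12, 6),
        ]
    }


body = build_dashboard_body("sales-forecast-endpoint", "AllTraffic", "us-east-1")
dashboard_json = json.dumps(body)
print(len(body["widgets"]))  # 4
```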

Hence, the correct answer is: Create a CloudWatch dashboard to build a unified operational view of the metrics generated by the notebook instance.

While logs and metrics provide insights into how a service is operating, they are not the same. A metric represents a time-ordered set of data points that are published to CloudWatch. Think of a metric as a variable to monitor, and the data points represent the values of that variable over time. For example, the CPU usage of a particular EC2 instance is one metric provided by Amazon EC2. On the other hand, a log is generated data tied to a specific event that happens over time. The question is specifically asking to monitor the given metrics for the instance, hence, the following options are all incorrect:

– Create a custom CloudWatch Logs and stream the data into an Amazon OpenSearch cluster. Visualize the logs with Kibana.

– Export the generated log data to an Amazon S3 bucket. Use Amazon Athena and Amazon QuickSight to visualize the SageMaker logs.

– Create a log stream in CloudWatch Logs and subscribe to it an Amazon Data Firehose stream to send the data into an Amazon OpenSearch cluster. Visualize the logs with Kibana.

References:
https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html

Check out this Amazon CloudWatch Cheat Sheet:
https://tutorialsdojo.com/amazon-cloudwatch/

Question 7

A Machine Learning Specialist has graphed the results of a K-means model fitted through a range of k-values. The Specialist needs to select the optimal k parameter.

Based on the graph, which k-value is the best choice?

 
  1. 4
  2. 9
  3. 3
  4. 6

Correct Answer: 1

The elbow method runs k-means clustering on the dataset for a range of values for k (e.g., 1-10), and then for each value of k computes an average score for all clusters. In this scenario, the distortion score is computed – the sum of square distances from each point to its assigned center.

When the resulting metrics for each model are plotted, it is possible to determine the best value for k visually. If the line chart looks like an “arm”, then the “elbow” (the point of inflection on the curve) is the best value of k. The “arm” can be either up or down, but if there is a strong inflection point, it is a good indication that the underlying model fits best at that point.
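One simple way to locate the elbow numerically, rather than by eye, is to pick the k where the curve bends the most, measured by the second difference of the distortion scores. This is only one heuristic among several, and the distortion values below are invented for illustration; they are not read from the graph in the question.

```python
def find_elbow(distortions):
    """Return the k with the largest second difference in distortion."""
    ks = sorted(distortions)
    best_k, best_bend = None, float("-inf")
    # Slide over consecutive triples (k-1, k, k+1) and measure the bend at k.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = distortions[prev_k] - 2 * distortions[k] + distortions[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical distortion scores for k = 1..10.
distortions = {
    1: 1000, 2: 700, 3: 450, 4: 250, 5: 220,
    6: 200, 7: 185, 8: 172, 9: 162, 10: 155,
}
elbow = find_elbow(distortions)
print(elbow)  # 4
```

In this toy data the distortion drops steeply up to k = 4 and flattens afterwards, so the largest bend falls at k = 4.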

Hence, the correct answer is 4.

References:
https://www.scikit-yb.org/en/latest/api/cluster/elbow.html
https://aws.amazon.com/blogs/machine-learning/k-means-clustering-with-amazon-sagemaker/
https://docs.aws.amazon.com/sagemaker/latest/dg/k-means.html

Question 8

A Machine Learning Specialist is migrating hundreds of thousands of records in CSV files into an Amazon S3 bucket. Each file has 150 columns and is about 1 MB in size. Most of the queries will span a minimum of 5 columns. The data must be transformed to minimize the query runtime.

Which transformation method will optimize query performance?

  1. Transform the files to XML data format.
  2. Transform the files to Apache Parquet data format.
  3. Transform the files to gzip-compressed CSV data format.
  4. Transform the files to JSON data format.

Correct Answer: 2

Amazon Athena supports a wide variety of data formats like CSV, TSV, JSON, or Textfiles and also supports open-source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats. By compressing, partitioning, and using columnar formats, you can improve performance and reduce your costs.

Parquet and ORC file formats both support predicate pushdown (also called predicate filtering). Parquet and ORC both have blocks of data that represent column values. Each block holds statistics for the block, such as max/min values. When a query is being executed, these statistics determine whether the block should be read or skipped.
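The block-skipping idea can be illustrated with a toy model in plain Python. This is not how Parquet or ORC are implemented internally. It only mimics the principle that per-block min/max statistics allow a reader to skip blocks that cannot contain matching values. The block contents are invented.

```python
# Each "block" stores column values plus min/max statistics, as a stand-in
# for the statistics kept by columnar formats.
blocks = [
    {"min": 1, "max": 100, "values": [1, 40, 100]},
    {"min": 101, "max": 200, "values": [101, 150, 200]},
    {"min": 201, "max": 300, "values": [201, 250, 300]},
]


def scan_greater_than(blocks, threshold):
    """Return values > threshold and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block["max"] <= threshold:
            continue  # statistics prove no value can match, so skip the block
        blocks_read += 1
        matches.extend(v for v in block["values"] if v > threshold)
    return matches, blocks_read


matches, blocks_read = scan_greater_than(blocks, 220)
print(matches, blocks_read)  # [250, 300] 1
```

Only one of the three blocks is read for this filter, which is the effect that reduces both the data scanned and the query runtime.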

Athena charges you by the amount of data scanned per query. You can save on costs and get better performance if you partition the data, compress data, or convert it to columnar formats such as Apache Parquet. 

Apache Parquet is an open-source columnar storage format that is 2x faster to unload and takes up 6x less storage in Amazon S3 as compared to other text formats. One can COPY Apache Parquet and Apache ORC file formats from Amazon S3 to your Amazon Redshift cluster. Using AWS Glue, one can configure and run a job to transform CSV data to Parquet. Parquet is a columnar format that is well suited for AWS analytics services like Amazon Athena and Amazon Redshift Spectrum.

Hence, the correct answer is: Transform the files to Apache Parquet data format.

The option that says: Transform the files to gzip-compressed CSV data format is incorrect because Athena queries performed against row-based files are slower than columnar file formats like Apache Parquet.

The option that says: Transform the files to JSON data format is incorrect. Amazon Athena works best with columnar file formats like Apache Parquet.

The option that says: Transform the files to XML data format is incorrect because Athena can’t process XML data directly.

References:
https://aws.amazon.com/athena/faqs/
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
https://dzone.com/articles/how-to-be-a-hero-with-powerful-parquet-google-and

Check out this Amazon Athena Cheat Sheet:
https://tutorialsdojo.com/amazon-athena/

Question 9

A healthcare organization has a large repository of medical documents. The organization wants to categorize and manage these documents efficiently. The specific topics are yet to be determined, but the company aims to utilize the terms within each document to assign it to a relevant medical category. To solve this problem, a Machine Learning specialist uses Amazon SageMaker AI to develop a model.

Which built-in algorithm in Amazon SageMaker AI would be the most suitable choice?

  1. BlazingText algorithm in Text Classification mode
  2. Latent Dirichlet Allocation (LDA) Algorithm
  3. Semantic Segmentation Algorithm
  4. CatBoost Algorithm

Correct Answer: 2

The Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories. LDA is most commonly used to discover a user-specified number of topics shared by documents within a text corpus. Here each observation is a document, the features are the presence (or occurrence count) of each word, and the categories are the topics. Since the method is unsupervised, the topics are not specified up front and are not guaranteed to align with how a human may naturally categorize documents. The topics are learned as a probability distribution over the words that occur in each document. Each document, in turn, is described as a mixture of topics.
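The input representation described above, where each document becomes a vector of word occurrence counts, can be sketched as follows. The sample documents are invented, and the snippet covers only the feature-building step, not the LDA training itself.

```python
from collections import Counter

# Made-up medical notes standing in for the document repository.
documents = [
    "patient reports chest pain and shortness of breath",
    "mri scan shows no fracture in the left knee",
    "chest x-ray and ecg ordered for chest pain",
]

tokenized = [doc.split() for doc in documents]

# Vocabulary: every distinct word across the corpus, in a fixed order.
vocabulary = sorted({word for doc in tokenized for word in doc})

# Document-term matrix: one row per document, one count per vocabulary word.
matrix = []
for doc in tokenized:
    counts = Counter(doc)
    matrix.append([counts[word] for word in vocabulary])

chest_index = vocabulary.index("chest")
print([row[chest_index] for row in matrix])  # [1, 0, 2]
```

A topic model would then look for groups of words that tend to co-occur across rows of this matrix, without being told any category labels.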

In the scenario, LDA would be appropriate as the exact topics for the medical documents are unknown, but the company aims to determine the topics based on the terms used in each document. With LDA, the algorithm can discover latent topics in the documents and assign each document to the most relevant topic, which can then be used for categorization and management.

Hence, the correct answer is: Latent Dirichlet Allocation (LDA) Algorithm

The option that says: BlazingText algorithm in Text Classification mode is incorrect. BlazingText algorithm in Text Classification mode is a supervised learning approach, which means it requires labeled training data to classify the text. In this scenario, the specific topics of the medical documents are unknown, making supervised learning methods unsuitable.

The option that says: Semantic Segmentation Algorithm is incorrect because this is a computer vision algorithm typically used for image and video processing. It may not be appropriate for this problem since the input is not in the form of images or videos but rather text documents.

The option that says: CatBoost Algorithm is incorrect. This is a gradient-boosting algorithm designed to handle categorical data. It is mainly used for prediction problems and may not be suitable for this problem, where the main objective is to categorize the documents based on the terms used within them.

References:

https://docs.aws.amazon.com/sagemaker/latest/dg/lda-how-it-works.htm
https://docs.aws.amazon.com/sagemaker/latest/dg/lda.html

Check out this Amazon SageMaker Cheat Sheet:
https://tutorialsdojo.com/amazon-sagemaker/

Question 10

A Business Process Outsourcing (BPO) company uses Amazon Polly to convert plaintext documents to speech for its voice response system. After testing, some acronyms and business-specific terms are being pronounced incorrectly.

Which approach will fix this issue?

  1. Use a viseme Speech Mark.
  2. Use pronunciation lexicons.
  3. Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag.
  4. Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation.

Correct Answer: 2

With Amazon Polly’s custom lexicons or vocabularies, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms (e.g., “ROTFL”, “C’est la vie” when spoken in a non-French voice). To customize these pronunciations, you upload an XML file with lexical entries. For example, you can customize the pronunciation of the Filipino word: “Pilipinas” by using the phoneme element in your input XML.

Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides API operations that you can use to store lexicons in an AWS region. Those lexicons are then specific to that particular region. 

The following are examples of ways to use lexicons with speech synthesis engines:

– Common words are sometimes stylized with numbers taking the place of letters, as with “g3t sm4rt” (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS) engine reads the text literally, pronouncing the name exactly as it is spelled. This is where you can leverage lexicons to customize the synthesized speech by using Amazon Polly. In this example, you can specify an alias (get smart) for the word “g3t sm4rt” in the lexicon.

– Your text might include an acronym, such as W3C. You can use a lexicon to define an alias for the word W3C so that it is read in the full, expanded form (World Wide Web Consortium).

Lexicons give you additional control over how Amazon Polly pronounces words uncommon to the selected language. For example, you can specify the pronunciation using a phonetic alphabet. 
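A lexicon is an XML document in the W3C Pronunciation Lexicon Specification (PLS) format. The sketch below generates a small alias-based lexicon for the W3C example above and parses it back to confirm it is well-formed. The helper name is made up for this illustration. The resulting XML string could be stored with the Amazon Polly PutLexicon API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases, lang="en-US"):
    """Render a PLS lexicon that maps each grapheme to a spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>\n"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" '
        f'alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


lexicon_xml = build_alias_lexicon({"W3C": "World Wide Web Consortium"})

# Parse the result to check that it is well-formed XML.
root = ET.fromstring(lexicon_xml.encode("utf-8"))
parsed_aliases = {
    lexeme.find(f"{{{PLS_NS}}}grapheme").text: lexeme.find(f"{{{PLS_NS}}}alias").text
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme")
}
print(parsed_aliases)
```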

Hence, the correct answer is: Use pronunciation lexicons.

The option that says: Use a viseme Speech Mark is incorrect as this feature is just used to synchronize speech with facial animation (lip-syncing) or to highlight written words as they’re spoken.

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag is incorrect because Amazon Polly does not support this SSML tag.

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation is incorrect as this type of tag is simply used to emphasize words by changing the speaking rate and volume of the speech.

References:
https://docs.aws.amazon.com/polly/latest/dg/managing-lexicons.html
https://aws.amazon.com/blogs/machine-learning/create-accessible-training-with-initiafy-and-amazon-polly/

Check out this Amazon Polly Cheat Sheet:
https://tutorialsdojo.com/amazon-polly/

For more practice questions like these and to further prepare you for the actual AWS Certified Machine Learning Specialty MLS-C01 exam, we recommend that you take our top-notch AWS Certified Machine Learning Specialty Practice Exams, which have been regarded as the best in the market. 

Also, check out our AWS Certified Machine Learning Specialty MLS-C01 exam study guide here.

Written by: Jon Bonso

Jon Bonso is the co-founder of Tutorials Dojo, an EdTech startup and an AWS Digital Training Partner that provides high-quality educational materials in the cloud computing space. He graduated from Mapúa Institute of Technology in 2007 with a bachelor's degree in Information Technology. Jon holds 10 AWS Certifications and has been an active AWS Community Builder since 2020.
