Amazon Polly Cheat Sheet

Last updated on November 14, 2024

A text-to-speech (TTS) service
Uses advanced deep learning technologies to convert text into natural, lifelike speech
Outputs the synthesized speech in MP3, Ogg Vorbis, or PCM format
Offers Standard and Neural TTS (NTTS)

Common Use Cases

Increase customer engagement
Language learning applications
Making digital content accessible to visually impaired users
Testing in-game dialogs
Voice response

Concepts

Speech Synthesis Markup Language (SSML)
- Uses XML-based tags to modify different aspects of the text-to-speech output.
- Can control pitch, speaking style, speech rate, and volume.
Standard TTS
- Concatenates short speech snippets together.
- Limited in terms of producing different speaking styles.
Neural TTS
- Produces higher quality speech output than Standard TTS.
- Neural TTS supports two speaking styles:
  - Conversational
  - Newscaster
Speech Mark
- Metadata that describes the synthesized speech, such as where a sentence or word starts and ends in the audio stream.
- There are four speech mark types:
  - Sentence
  - Word
  - Viseme
  - SSML
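
The SSML controls listed above can be sketched with a small helper that wraps plain text in `<speak>` and `<prosody>` tags. This is a minimal illustration, not an official client: the function name `build_ssml` is hypothetical, and it covers only the rate and volume attributes.

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, rate: str = "medium", volume: str = "medium") -> str:
    """Wrap plain text in <speak> and <prosody> tags (hypothetical helper).

    Characters such as &, < and > are escaped so that the result stays
    well-formed XML.
    """
    return (
        f'<speak><prosody rate="{rate}" volume="{volume}">'
        f"{escape(text)}"
        "</prosody></speak>"
    )


ssml = build_ssml("Terms & conditions apply.", rate="slow", volume="loud")
print(ssml)
# <speak><prosody rate="slow" volume="loud">Terms &amp; conditions apply.</prosody></speak>
```

The resulting string would be sent to Amazon Polly as SSML input rather than plain text.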

Features

Accepts plain text or SSML as input, encoded in UTF-8.
Expands and pronounces abbreviations and acronyms
Interprets dates, times, and units of measurement
Homograph disambiguation
- For example, “St.” can be read as “saint” or “street.” Amazon Polly chooses the correct reading based on the surrounding context.
Custom lexicon
- Supports customizing the pronunciation of words uncommon to the selected language.
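
The input types above, together with the output formats and engines listed earlier, map onto parameters of the SynthesizeSpeech operation. The sketch below only assembles those parameters as a dictionary, so it runs without AWS credentials. The helper name `build_synthesis_request`, the default voice "Joanna", and the `<speak` prefix check for detecting SSML are illustrative assumptions, and only the two engines covered in this sheet are accepted.

```python
def build_synthesis_request(text, voice_id="Joanna", engine="neural",
                            output_format="mp3"):
    """Return keyword arguments for Polly's SynthesizeSpeech operation.

    Hypothetical helper: it validates against the engines and audio
    formats described in this cheat sheet only.
    """
    if engine not in ("standard", "neural"):
        raise ValueError(f"unsupported engine: {engine}")
    if output_format not in ("mp3", "ogg_vorbis", "pcm"):
        raise ValueError(f"unsupported output format: {output_format}")
    # Simplifying assumption: input that starts with <speak is SSML,
    # anything else is treated as plain text.
    text_type = "ssml" if text.lstrip().startswith("<speak") else "text"
    return {
        "Text": text,
        "TextType": text_type,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": output_format,
    }


params = build_synthesis_request("Hello from Amazon Polly.")
print(params)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   response = boto3.client("polly").synthesize_speech(**params)
#   audio_bytes = response["AudioStream"].read()
```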

Amazon Polly Pricing

Standard TTS
- $4.00 per 1 million characters
Neural TTS
- $16.00 per 1 million characters
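
As a rough worked example of the rates above, the following sketch estimates the cost of a synthesis job from its character count. It assumes the two per-million-character prices listed here and ignores any free tier or other pricing adjustments; the function name `estimate_cost` is hypothetical.

```python
# Prices per 1 million characters, taken from the list above (USD).
PRICE_PER_MILLION_CHARS = {"standard": 4.00, "neural": 16.00}


def estimate_cost(characters: int, engine: str) -> float:
    """Estimate the synthesis cost in USD for a given character count."""
    return characters / 1_000_000 * PRICE_PER_MILLION_CHARS[engine]


print(estimate_cost(250_000, "standard"))  # 1.0
print(estimate_cost(250_000, "neural"))    # 4.0
```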

Validate Your Knowledge

Question 1

A Business Process Outsourcing (BPO) company uses Amazon Polly to convert plain-text documents to speech for its voice response system. During testing, some acronyms and business-specific terms are pronounced incorrectly.

Which approach will fix this issue?

Use a viseme Speech Mark.
Use pronunciation lexicons.
Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag.
Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation.

Correct Answer: 2

With Amazon Polly’s custom lexicons or vocabularies, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms (e.g., “ROTFL”, or “C’est la vie” when spoken in a non-French voice). To customize these pronunciations, you upload an XML file with lexical entries. For example, you can customize the pronunciation of the Filipino word “Pilipinas” by using the phoneme element in your input XML.

Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides API operations that you can use to store lexicons in an AWS Region. A stored lexicon is available only in that Region.

The following are examples of ways to use lexicons with speech synthesis engines:

– Common words are sometimes stylized with numbers taking the place of letters, as with “g3t sm4rt” (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS) engine reads the text literally, pronouncing the word exactly as it is spelled. With Amazon Polly, you can use a lexicon to customize the synthesized speech. In this example, you can specify an alias (get smart) for the word “g3t sm4rt” in the lexicon.

– Your text might include an acronym, such as W3C. You can use a lexicon to define an alias for the word W3C so that it is read in the full, expanded form (World Wide Web Consortium).

Lexicons give you additional control over how Amazon Polly pronounces words uncommon to the selected language. For example, you can specify the pronunciation using a phonetic alphabet.
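
To make the W3C alias case concrete, here is a minimal sketch that generates a lexicon document in the W3C Pronunciation Lexicon Specification (PLS) format, which is the XML format Amazon Polly lexicons use. The helper name `build_alias_lexicon` is hypothetical, and the sketch handles alias entries only, not phoneme entries.

```python
from xml.sax.saxutils import escape

# XML namespace defined by the W3C Pronunciation Lexicon Specification.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases: dict, lang: str = "en-US") -> str:
    """Build a PLS lexicon that maps written words to spoken aliases."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(word)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for word, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" '
        f'alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


lexicon = build_alias_lexicon({"W3C": "World Wide Web Consortium"})
print(lexicon)
```

With boto3, a document like this could be stored through `put_lexicon(Name="acronyms", Content=lexicon)` and then referenced at synthesis time with `LexiconNames=["acronyms"]`. The lexicon name "acronyms" is only an example.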

Hence, the correct answer is: Use pronunciation lexicons.

The option that says: Use a viseme Speech Mark is incorrect because speech marks are metadata used to synchronize speech with facial animation (lip-syncing) or to highlight written words as they’re spoken. They do not change how words are pronounced.
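
To show why speech marks are metadata rather than a pronunciation control, the sketch below parses speech mark output, which Amazon Polly returns as line-delimited JSON when the output format is `json`. The sample lines are hand-written for illustration in the shape of that output and are not real service output. The helper name `parse_speech_marks` is hypothetical.

```python
import json

# Hand-written illustrative sample, NOT real Amazon Polly output.
# Each line is one JSON object; "time" is an offset in milliseconds.
SAMPLE = """\
{"time":0,"type":"sentence","start":0,"end":9,"value":"Hi there."}
{"time":6,"type":"word","start":0,"end":2,"value":"Hi"}
{"time":6,"type":"viseme","value":"k"}
{"time":180,"type":"word","start":3,"end":8,"value":"there"}
{"time":180,"type":"viseme","value":"T"}
"""


def parse_speech_marks(raw: str) -> dict:
    """Group line-delimited JSON speech marks by their "type" field."""
    grouped = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        mark = json.loads(line)
        grouped.setdefault(mark["type"], []).append(mark)
    return grouped


marks = parse_speech_marks(SAMPLE)
for mark in marks["viseme"]:
    # A lip-sync animation would switch mouth shape at each timestamp.
    print(mark["time"], mark["value"])
```

Every record only reports when something is spoken, which is why speech marks cannot correct a mispronounced acronym.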

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag is incorrect because Amazon Polly does not support this SSML tag.

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation is incorrect because this tag only emphasizes words by changing the speaking rate and volume of the speech.

References:
https://docs.aws.amazon.com/polly/latest/dg/managing-lexicons.html
https://aws.amazon.com/blogs/machine-learning/create-accessible-training-with-initiafy-and-amazon-polly/

Note: This question was extracted from our AWS Certified Machine Learning – Specialty Practice Exams.

Amazon Polly Cheat Sheet References:

https://aws.amazon.com/polly/faqs/
https://aws.amazon.com/polly/features/
https://docs.aws.amazon.com/polly/latest/dg/how-text-to-speech-works.html
https://aws.amazon.com/polly/pricing/

Written by: Carlo Acebedo

Carlo is a cloud engineer and a content creator at Tutorials Dojo. He is also a member of the AWS Community Builders program and holds 5 AWS certifications. Carlo specializes in building and automating solutions in the Amazon Web Services Cloud.
