Amazon Polly

Last updated on June 23, 2023

Amazon Polly Cheat Sheet

  • A text-to-speech (TTS) service
  • Uses advanced deep learning technologies to convert text into natural, lifelike speech
  • Supports saving the synthesized speech in MP3, OGG, and PCM file formats
  • Offers Standard and Neural TTS (NTTS)
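
The format and engine choices above map onto parameters of the SynthesizeSpeech API. The following is a minimal sketch, not a definitive implementation: the helper names `build_request` and `synthesize` are hypothetical, the voice "Joanna" is only an assumed default, and `client` is assumed to be a Polly client such as the one returned by `boto3.client("polly")`.

```python
from typing import Any, Dict

# Audio values accepted by the OutputFormat parameter (MP3, OGG, PCM).
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}
# Values accepted by the Engine parameter for the two TTS types above.
ENGINES = {"standard", "neural"}


def build_request(text: str, voice_id: str = "Joanna",
                  output_format: str = "mp3",
                  engine: str = "neural") -> Dict[str, Any]:
    """Return keyword arguments for a SynthesizeSpeech call (hypothetical helper)."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {output_format}")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    return {"Text": text, "VoiceId": voice_id,
            "OutputFormat": output_format, "Engine": engine}


def synthesize(client: Any, text: str, **options: Any) -> bytes:
    """Call Polly through `client` and return the raw audio bytes."""
    response = client.synthesize_speech(**build_request(text, **options))
    # The response carries the audio as a streaming body under "AudioStream".
    return response["AudioStream"].read()
```

With boto3 installed and AWS credentials configured, the returned bytes can be written straight to a file, for example `open("hello.mp3", "wb").write(synthesize(client, "Hello"))`.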

Common Use Cases

  • Increase customer engagement
  • Language learning applications
  • Making digital content accessible to visually impaired individuals
  • Testing in-game dialogs
  • Voice response

Concepts

  • Speech Synthesis Markup Language (SSML)
    • Uses XML-based tags to modify different aspects of the text-to-speech output.
    • Can control pitch, speaking style, speech rate, and volume.
  • Standard TTS
    • Concatenates short speech snippets together.
    • Limited in terms of producing different speaking styles.
  • Neural TTS
    • Produces higher quality speech output than Standard TTS.
    • Neural TTS supports two speaking styles:
      • Conversational
      • Newscaster
  • Speech Mark
    • Refers to the metadata that describes the synthesized speech
    • Speech Mark has four types:
      • Sentence
      • Word
      • Viseme
      • SSML
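
To make the SSML concept concrete, the sketch below builds two small SSML documents as strings: one that adjusts speech rate and volume with the prosody tag, and one that selects a Neural TTS speaking style with the amazon:domain tag. The function names are hypothetical, and tag support differs between engines and voices, so treat this as an illustration under those assumptions rather than a reference.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed mapping from the speaking styles listed above to the
# name attribute of the amazon:domain tag.
STYLE_DOMAINS = {"newscaster": "news", "conversational": "conversational"}


def prosody_ssml(text: str, rate: str = "medium", volume: str = "medium") -> str:
    """Wrap text in a prosody tag that sets speech rate and volume."""
    body = escape(text)  # &, < and > must be escaped inside SSML
    return (f"<speak><prosody rate={quoteattr(rate)} "
            f"volume={quoteattr(volume)}>{body}</prosody></speak>")


def style_ssml(text: str, style: str) -> str:
    """Wrap text in an amazon:domain tag for a Neural TTS speaking style."""
    domain = STYLE_DOMAINS[style]
    return (f'<speak><amazon:domain name="{domain}">'
            f"{escape(text)}</amazon:domain></speak>")


if __name__ == "__main__":
    print(prosody_ssml("Tom & Jerry", rate="slow", volume="loud"))
    print(style_ssml("Here are today's headlines.", "newscaster"))
```

When such a string is sent to Polly, the request also needs `TextType="ssml"` so that the tags are interpreted instead of being read aloud.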

Features

  • Amazon Polly accepts UTF-8 encoded plain text or SSML as input.
  • Expands and pronounces abbreviations and acronyms
  • Interprets dates, times, and units of measurement
  • Homograph disambiguation
    • For example, “St.” can be read as “saint” or “street.” Amazon Polly identifies the intended reading from the surrounding context.
  • Custom lexicon
    • Supports customizing the pronunciation of words uncommon to the selected language.

Amazon Polly Pricing

  • Standard TTS
    • $4.00 per 1 million characters
  • Neural TTS
    • $16.00 per 1 million characters
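
As a worked example of the arithmetic, the sketch below estimates a bill from a character count using only the two rates listed above. It is a rough illustration: the function name is hypothetical, and the rates are hard-coded from this cheat sheet, so check the pricing page for current figures and any free tier.

```python
from decimal import Decimal

# Rates per 1 million characters, copied from the list above (USD).
PRICE_PER_MILLION = {
    "standard": Decimal("4.00"),
    "neural": Decimal("16.00"),
}


def estimate_cost(characters: int, engine: str) -> Decimal:
    """Return the estimated cost in USD, rounded to cents."""
    rate = PRICE_PER_MILLION[engine]
    cost = rate * characters / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 2.5 million characters: 2.5 x $4.00 and 2.5 x $16.00.
    print(estimate_cost(2_500_000, "standard"))
    print(estimate_cost(2_500_000, "neural"))
```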


Validate Your Knowledge

Question 1

A Business Process Outsourcing (BPO) company uses Amazon Polly to convert plain-text documents to speech for its voice response system. During testing, some acronyms and business-specific terms are pronounced incorrectly.

Which approach will fix this issue?

  1. Use a viseme Speech Mark.
  2. Use pronunciation lexicons.
  3. Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag.
  4. Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation.

Correct Answer: 2

With Amazon Polly’s custom lexicons or vocabularies, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms (e.g., “ROTFL”, or “C’est la vie” when spoken in a non-French voice). To customize these pronunciations, you upload an XML file with lexical entries. For example, you can customize the pronunciation of the Filipino word “Pilipinas” by using the phoneme element in your input XML.

Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides API operations that you can use to store lexicons in an AWS Region. Each lexicon is then available only in the Region where it was stored.
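
A minimal sketch of those API operations follows, assuming `client` is a Polly client created for one specific Region (for example with `boto3.client("polly", region_name="us-east-1")`). The helper names and the default voice "Joanna" are hypothetical; `put_lexicon` and the `LexiconNames` parameter are the calls being illustrated.

```python
import re
from typing import Any

# Lexicon names are assumed to be limited to 1-20 alphanumeric characters.
_LEXICON_NAME = re.compile(r"[0-9A-Za-z]{1,20}")


def store_lexicon(client: Any, name: str, pls_xml: str) -> None:
    """Upload a lexicon document to the Region the client is bound to."""
    if not _LEXICON_NAME.fullmatch(name):
        raise ValueError(f"invalid lexicon name: {name!r}")
    client.put_lexicon(Name=name, Content=pls_xml)


def speak_with_lexicon(client: Any, text: str, lexicon_name: str,
                       voice_id: str = "Joanna") -> bytes:
    """Synthesize text while applying a lexicon stored in the same Region."""
    response = client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat="mp3",
        LexiconNames=[lexicon_name],
    )
    return response["AudioStream"].read()
```

Because lexicons are Region-specific, a client created for a different Region would not find a lexicon stored this way.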

The following are examples of ways to use lexicons with speech synthesis engines:

– Common words are sometimes stylized with numbers taking the place of letters, as with “g3t sm4rt” (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS) engine reads the text literally, pronouncing the name exactly as it is spelled. This is where you can leverage lexicons to customize the synthesized speech by using Amazon Polly. In this example, you can specify an alias (get smart) for the word “g3t sm4rt” in the lexicon.

– Your text might include an acronym, such as W3C. You can use a lexicon to define an alias for the word W3C so that it is read in the full, expanded form (World Wide Web Consortium).

Lexicons give you additional control over how Amazon Polly pronounces words uncommon to the selected language. For example, you can specify the pronunciation using a phonetic alphabet. 
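
The alias approach described above can be sketched as a small generator for a Pronunciation Lexicon Specification (PLS) document. This is an illustration under stated assumptions: the function name is hypothetical, only alias entries are produced (phoneme entries would need a real phonetic transcription), and the W3C entry is the example from the text.

```python
import xml.etree.ElementTree as ET
from typing import Dict
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def alias_lexicon(aliases: Dict[str, str], lang: str = "en-US") -> str:
    """Build a PLS document that maps each written form to a spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" '
        f'alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


if __name__ == "__main__":
    document = alias_lexicon({"W3C": "World Wide Web Consortium"})
    ET.fromstring(document.encode("utf-8"))  # raises if not well-formed
    print(document)
```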

Hence, the correct answer is: Use pronunciation lexicons.

The option that says: Use a viseme Speech Mark is incorrect as this feature is just used to synchronize speech with facial animation (lip-syncing) or to highlight written words as they’re spoken.
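
To show what speech marks look like in practice, the sketch below parses a speech mark stream, which is assumed to arrive as one JSON object per line when a request uses `OutputFormat="json"` together with `SpeechMarkTypes`. The helper names are hypothetical and the sample payload is illustrative data, not real Polly output.

```python
import json
from typing import Any, Dict, List, Tuple

# Illustrative payload only: one JSON object per line, times in milliseconds.
SAMPLE = (
    b'{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}\n'
    b'{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}\n'
    b'{"time":6,"type":"viseme","value":"p"}\n'
)


def parse_speech_marks(payload: bytes) -> List[Dict[str, Any]]:
    """Decode a line-delimited JSON speech mark stream into dictionaries."""
    lines = payload.decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def viseme_timeline(marks: List[Dict[str, Any]]) -> List[Tuple[int, str]]:
    """Return (time, viseme) pairs that an animation layer could follow."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "viseme"]


if __name__ == "__main__":
    marks = parse_speech_marks(SAMPLE)
    print(viseme_timeline(marks))
```

The timeline only describes mouth shapes over time, which is why viseme speech marks do nothing to change how a term is pronounced.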

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag is incorrect because Amazon Polly does not support this SSML tag.

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation is incorrect as this type of tag is simply used to emphasize words by changing the speaking rate and volume of the speech.

References:
https://docs.aws.amazon.com/polly/latest/dg/managing-lexicons.html
https://aws.amazon.com/blogs/machine-learning/create-accessible-training-with-initiafy-and-amazon-polly/

Note: This question was extracted from our AWS Certified Machine Learning – Specialty Practice Exams.

Amazon Polly Cheat Sheet References:

https://aws.amazon.com/polly/faqs/
https://aws.amazon.com/polly/features/
https://docs.aws.amazon.com/polly/latest/dg/how-text-to-speech-works.html
https://aws.amazon.com/polly/pricing/

Written by: Carlo Acebedo

Carlo is a cloud engineer and a content creator at Tutorials Dojo. He is also an AWS Community Builder and holds 5 AWS Certifications. Carlo specializes in building and automating solutions in the Amazon Web Services Cloud.
