Amazon Polly

Last updated on November 25, 2025

Amazon Polly Cheat Sheet

  • A text-to-speech (TTS) service
  • Uses advanced deep learning technologies to convert text into natural, lifelike speech
  • Supports saving synthesized speech in MP3, Ogg Vorbis, and PCM audio formats
  • Offers Standard and Neural TTS (NTTS)
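
As a rough illustration of the points above, the sketch below assembles the parameters for a speech synthesis request and, separately, sends them with boto3. The helper names `build_synthesis_request` and `synthesize_to_file` and the default voice `Joanna` are assumptions made for this example; the parameter names are the ones the boto3 Polly client's `synthesize_speech` call is understood to accept.

```python
from typing import Any, Dict

# Audio values for the OutputFormat parameter (MP3, Ogg Vorbis, PCM).
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}
# Engine names covered in this cheat sheet.
ENGINES = {"standard", "neural", "long-form", "generative"}


def build_synthesis_request(text: str, voice_id: str = "Joanna",
                            engine: str = "neural",
                            output_format: str = "mp3") -> Dict[str, Any]:
    """Return keyword arguments for a SynthesizeSpeech call (hypothetical helper)."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {output_format}")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    return {
        "Text": text,
        "TextType": "text",
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": output_format,
    }


def synthesize_to_file(request: Dict[str, Any], path: str) -> None:
    """Send the request to Polly and save the audio (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


request = build_synthesis_request("Hello from Amazon Polly.")
```

Calling `synthesize_to_file(request, "hello.mp3")` would then write the audio to disk, provided the chosen voice supports the chosen engine in your Region.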

Common Use Cases

  • Increase customer engagement
  • Language learning applications
  • Making digital content accessible to visually impaired individuals
  • Testing in-game dialogs
  • Voice response

Concepts

  • Speech Synthesis Markup Language (SSML)
    • Uses XML-based tags to modify different aspects of the text-to-speech output.
    • Can control pitch, speaking style, speech rate, and volume.
  • Standard TTS
    • Concatenates short speech snippets together.
    • Limited in terms of producing different speaking styles.
  • Neural TTS
    • Produces higher quality speech output than Standard TTS.
    • Neural TTS supports two speaking styles:
      • Conversational
      • Newscaster
  • Speech Mark
    • Refers to the metadata that describes the synthesized speech
    • Speech Mark has four types:
      • Sentence
      • Word
      • Viseme
      • SSML
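
To make the SSML concept above more concrete, here is a minimal sketch that builds an SSML document as a string. The helper functions (`prosody`, `pause`, `say_as`, `speak`) are hypothetical names for this example; the tags and attributes are standard SSML, but support for individual tags can differ between engines, so it is worth checking the Polly SSML reference for the voice you use.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text: str, rate: str = "medium", volume: str = "medium") -> str:
    """Wrap text in a prosody tag to adjust speech rate and volume."""
    return (f"<prosody rate={quoteattr(rate)} volume={quoteattr(volume)}>"
            f"{escape(text)}</prosody>")


def pause(milliseconds: int) -> str:
    """Insert a pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def say_as(text: str, interpret_as: str) -> str:
    """Tell the engine how to interpret the text (digits, date, and so on)."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def speak(*parts: str) -> str:
    """Join already-built fragments into one SSML document.

    Plain strings passed here are inserted as-is, so escape them first
    if they may contain characters such as & or <.
    """
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    prosody("Welcome back.", rate="slow", volume="loud"),
    pause(500),
    "Your code is ",
    say_as("1234", "digits"),
    ".",
)
print(ssml)
```

A string like this would be sent to Polly with the text type set to SSML instead of plain text.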

Features

  • Accepts plain text or SSML as input; the text must be UTF-8 encoded.
  • Expands and pronounces abbreviations and acronyms
  • Interprets dates, times, and units of measurement.
  • Homograph disambiguation
    • For example, “St.” can be read as “saint” or “street.” Amazon Polly picks the correct reading based on the surrounding context.
  • Custom lexicon
    • Supports customizing the pronunciation of words uncommon to the selected language.
  • Commonly used Speech Synthesis Markup Language (SSML) tags:
    • emphasis to stress words.
    • break to insert pauses of custom lengths.
    • prosody for adjusting pitch, rate, and volume dynamically.
    • say-as to specify interpretation (dates, times, numbers, phone numbers).
  • Speech Marks (metadata) uses:
    • Can be used for lip-syncing in animations (especially viseme).
    • Useful for highlighting text as spoken in apps.
  • Generative (Gen) Voice Engine
    • Beyond the Standard, Neural (NTTS), and Long-Form engines, Amazon Polly also offers a Generative voice engine.
    • The Generative engine is more expressive, with context-dependent prosody (intonation, pausing, etc.).
    • In August 2025, seven expressive generative voices were added, including US English Salli, Canadian French Liam, and Polish Ola and Ewa.
    • In November 2025, five more generative voices launched (Austrian German Hannah, Irish English Niamh, Brazilian Portuguese Camila, Belgian Dutch Lisa, and Korean Seoyeon), along with availability in additional AWS Regions (Seoul, Singapore, and Tokyo).
  • Voice Persona Across Languages (“Polyglot / Multilingual Identity”)
    • Polly supports polyglot voices, meaning a single voice persona (same “speaker identity”) can speak in multiple languages.
    • Example: Matthew (US English) voice identity is used in other locales (Pedro in US Spanish, Daniel in German, Liam in Canadian French, Andrés in Mexican Spanish, Sergio in European Spanish, Rémi in French).
    • This is very useful for brand consistency across regions.
  • Additional Regional / Voice Support
    • New AWS Region for Neural voices: Europe (Zurich).
    • Polly now supports Asia Pacific (Malaysia) region for both Neural and Standard voices.
    • New neural voice: Korean Jihye.
    • New neural English (Singapore) voice: Jasmine.
  • Brand Voice
    • Brand Voice (custom neural voice) is still supported: organizations can work with AWS to build a unique voice persona exclusive to them.
    • This can help distinguish your brand with a vocal identity in IVR, contact centers, or other applications.
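
For the Speech Marks feature listed above (text highlighting and lip-syncing), the sketch below parses speech mark output and looks up the word being spoken at a given playback time. It assumes the marks arrive as line-delimited JSON, one object per line, ordered by time. The sample payload is illustrative data written for this example, not captured Polly output, and `parse_speech_marks` and `word_at` are hypothetical helper names.

```python
import json
from typing import Dict, List, Optional

# Illustrative sample only: "time" is milliseconds from the start of the audio,
# "start" and "end" are offsets of the word in the input text.
SAMPLE_MARKS = """\
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":604,"type":"word","start":9,"end":10,"value":"a"}
"""


def parse_speech_marks(payload: str) -> List[Dict]:
    """Parse line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def word_at(marks: List[Dict], millis: int) -> Optional[Dict]:
    """Return the last word mark that started at or before millis, if any."""
    current = None
    for mark in marks:
        if mark["type"] != "word":
            continue
        if mark["time"] > millis:
            break  # marks are assumed to be sorted by time
        current = mark
    return current


marks = parse_speech_marks(SAMPLE_MARKS)
spoken = word_at(marks, 400)
print(spoken["value"] if spoken else None)
```

An app could call `word_at` on each playback tick and use the `start` and `end` offsets to highlight the matching span of text; viseme marks could drive mouth shapes in the same way.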

Standard vs Neural TTS

  • Neural TTS:
    • Supports more expressive emotions, not just Conversational or Newscaster.
    • Can produce multiple voices per language, including brand voices (custom neural voices).
  • Standard TTS limitations:
    • Cannot produce multiple speaking styles simultaneously.
    • Less natural prosody (intonation and rhythm).

Amazon Polly Pricing

  • Standard TTS
    • $4.00 per 1 million characters
  • Neural TTS
    • $16.00 per 1 million characters
  • Free tier: 5 million characters per month for the first 12 months (Standard TTS).
  • Additional cost considerations:
    • Charges are per character, not per request.
    • Neural voices cost four times as much as Standard voices.
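
The per-character pricing above can be turned into a quick estimate. This is a minimal sketch using only the rates and free tier quoted in this section; the function name `monthly_cost` is made up for the example, and actual bills depend on current AWS pricing.

```python
# Rates quoted in this cheat sheet, in USD per 1 million characters.
PRICE_PER_MILLION_USD = {"standard": 4.00, "neural": 16.00}
# Free tier quoted above: Standard TTS only, first 12 months.
STANDARD_FREE_TIER_CHARS = 5_000_000


def monthly_cost(characters: int, engine: str, free_tier: bool = False) -> float:
    """Estimate the monthly charge for a number of synthesized characters."""
    billable = characters
    if free_tier and engine == "standard":
        billable = max(0, characters - STANDARD_FREE_TIER_CHARS)
    return billable / 1_000_000 * PRICE_PER_MILLION_USD[engine]


print(monthly_cost(2_500_000, "standard"))                  # 10.0
print(monthly_cost(2_500_000, "neural"))                    # 40.0
print(monthly_cost(6_000_000, "standard", free_tier=True))  # 4.0
```

The same 2.5 million characters cost $10.00 with a Standard voice and $40.00 with a Neural voice, which is the fourfold difference noted above.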

Note: If you are studying for the AWS Certified Machine Learning Specialty exam, we highly recommend that you take our AWS Certified Machine Learning – Specialty Practice Exams and read our Machine Learning Specialty exam study guide.

Validate Your Knowledge

Question 1

A Business Process Outsourcing (BPO) company uses Amazon Polly to convert plaintext documents to speech for its voice response system. After testing, the company finds that some acronyms and business-specific terms are pronounced incorrectly.

Which approach will fix this issue?

  1. Use a viseme Speech Mark.
  2. Use pronunciation lexicons.
  3. Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag.
  4. Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation.

Correct Answer: 2

With Amazon Polly’s custom lexicons or vocabularies, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms (e.g., “ROTFL”, “C’est la vie” when spoken in a non-French voice). To customize these pronunciations, you upload an XML file with lexical entries. For example, you can customize the pronunciation of the Filipino word “Pilipinas” by using the phoneme element in your input XML.

Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides API operations that you can use to store lexicons in an AWS region. Those lexicons are then specific to that particular region. 

The following are examples of ways to use lexicons with speech synthesis engines:

– Common words are sometimes stylized with numbers taking the place of letters, as with “g3t sm4rt” (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS) engine reads the text literally, pronouncing the name exactly as it is spelled. This is where you can leverage lexicons to customize the synthesized speech by using Amazon Polly. In this example, you can specify an alias (get smart) for the word “g3t sm4rt” in the lexicon.

– Your text might include an acronym, such as W3C. You can use a lexicon to define an alias for the word W3C so that it is read in the full, expanded form (World Wide Web Consortium).

Lexicons give you additional control over how Amazon Polly pronounces words uncommon to the selected language. For example, you can specify the pronunciation using a phonetic alphabet. 
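
The W3C alias example above could look roughly like the sketch below, which builds a small lexicon document in the Pronunciation Lexicon Specification (PLS) format. The helper names `build_alias_lexicon` and `upload_lexicon` and the lexicon name `acronyms` are assumptions for this example; the upload step uses what is understood to be the boto3 Polly client's `put_lexicon` call and needs AWS credentials.

```python
from typing import Dict
from xml.sax.saxutils import escape, quoteattr

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases: Dict[str, str], language: str = "en-US") -> str:
    """Build a PLS lexicon that maps written words to spoken aliases."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(word)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for word, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}" '
        f'alphabet="ipa" xml:lang={quoteattr(language)}>\n'
        f"{lexemes}</lexicon>\n"
    )


def upload_lexicon(name: str, content: str) -> None:
    """Store the lexicon in the current Region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without it

    boto3.client("polly").put_lexicon(Name=name, Content=content)


lexicon = build_alias_lexicon({"W3C": "World Wide Web Consortium"})
print(lexicon)
```

After `upload_lexicon("acronyms", lexicon)`, a synthesis request would reference the lexicon by name. Because lexicons are stored per Region, the upload has to be repeated in each Region where the voice response system runs.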

Hence, the correct answer is: Use pronunciation lexicons.

The option that says: Use a viseme Speech Mark is incorrect as this feature is just used to synchronize speech with facial animation (lip-syncing) or to highlight written words as they’re spoken.

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag is incorrect because Amazon Polly does not support this SSML tag.

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation is incorrect as this type of tag is simply used to emphasize words by changing the speaking rate and volume of the speech.

References:
https://docs.aws.amazon.com/polly/latest/dg/managing-lexicons.html
https://aws.amazon.com/blogs/machine-learning/create-accessible-training-with-initiafy-and-amazon-polly/

Note: This question was extracted from our AWS Certified Machine Learning – Specialty Practice Exams.

Amazon Polly Cheat Sheet References:

https://aws.amazon.com/polly/faqs/
https://aws.amazon.com/polly/features/
https://docs.aws.amazon.com/polly/latest/dg/how-text-to-speech-works.html
https://aws.amazon.com/polly/pricing/

Written by: Carlo Acebedo

Carlo is a cloud engineer and a content creator at Tutorials Dojo. He's also a member of the AWS Community Builders program and holds 5 AWS Certifications. Carlo specializes in building and automating solutions in the Amazon Web Services Cloud.
