Amazon Polly Cheat Sheet

Last updated on November 25, 2025


A text-to-speech (TTS) service
Uses advanced deep learning technologies to convert text into natural, lifelike speech
Saves the synthesized speech in MP3, Ogg Vorbis, or PCM format.
Offers Standard and Neural TTS (NTTS)
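
As a rough sketch of how the points above map onto a request, the Python below builds the parameters for one synthesis call. It assumes the AWS SDK for Python (boto3) and its Polly `synthesize_speech` operation; the helper names `build_request` and `synthesize_to_file`, the voice `Joanna`, and the default values are illustrative choices, not something this cheat sheet prescribes.

```python
# Audio formats named in this cheat sheet, written as Polly OutputFormat values.
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}
# Engine values accepted by the Polly API.
ENGINES = {"standard", "neural", "long-form", "generative"}


def build_request(text, voice_id="Joanna", output_format="mp3", engine="neural"):
    """Return keyword arguments for Polly's SynthesizeSpeech operation."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Engine": engine,
    }


def synthesize_to_file(text, path, **options):
    """Send the request and save the audio (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Keeping the parameter building separate from the network call makes the request easy to inspect before any characters are billed. Not every voice is available on every engine, so the voice and engine pair should be checked against the Polly voice list.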

Common Use Cases

Increase customer engagement
Language learning applications
Helps visually impaired individuals consume digital content
Testing in-game dialogs
Voice response

Concepts

Speech Synthesis Markup Language (SSML)
- Uses XML-based tags to modify different aspects of the text-to-speech output.
- Can control pitch, speaking style, speech rate, and volume.
Standard TTS
- Concatenates short speech snippets together.
- Limited in terms of producing different speaking styles.
Neural TTS
- Produces higher quality speech output than Standard TTS.
- Neural TTS supports two speaking styles:
  - Conversational
  - Newscaster
Speech Mark
- Refers to the metadata that describes the synthesized speech
- Speech Mark has four types:
  - Sentence
  - Word
  - Viseme
  - SSML
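
To make the speech mark types above more concrete, here is a small Python sketch that reads speech mark output and pulls out the word timings. Polly returns speech marks as line-delimited JSON objects with fields such as `time` (milliseconds from the start of the audio), `type`, and `value`; the sample payload and the helper names `parse_speech_marks` and `word_timings` are made up for illustration.

```python
import json

# Illustrative payload only: one JSON object per line, as Polly emits them.
SAMPLE_MARKS = """\
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
"""


def parse_speech_marks(payload):
    """Turn line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def word_timings(marks):
    """Return (time_ms, word) pairs, e.g. for highlighting text as it is spoken."""
    return [(mark["time"], mark["value"]) for mark in marks if mark["type"] == "word"]


if __name__ == "__main__":
    for time_ms, word in word_timings(parse_speech_marks(SAMPLE_MARKS)):
        print(f"{time_ms:>5} ms  {word}")
```

With boto3, the same kind of payload would come from a `synthesize_speech` call that sets `OutputFormat="json"` and lists the wanted types in `SpeechMarkTypes`.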

Features

Amazon Polly accepts plain text or SSML as input, encoded in UTF-8.
Expands and pronounces abbreviations and acronyms
Interprets dates, times, and units of measurement.
Homograph disambiguation
- For example, “St.” can be read as “saint” or “street”; Amazon Polly tells them apart based on the surrounding context.
Custom lexicon
- Supports customizing the pronunciation of words uncommon to the selected language.
Speech Synthesis Markup Language (SSML) tags:
- emphasis to stress words.
- break to insert pauses of custom lengths.
- prosody to adjust pitch, rate, and volume.
- say-as to specify how text is interpreted (dates, times, numbers, phone numbers).
Speech Marks (metadata) uses:
- Lip-syncing in animations (viseme marks in particular).
- Highlighting text in apps as it is spoken.
Generative (Gen) Voice Engine
- In addition to the Standard, Neural (NTTS), and Long-Form engines, Amazon Polly supports a Generative voice engine.
- The Generative engine is more expressive, with context-dependent prosody (intonation, pausing, etc.).
- As of August 2025, seven new expressive generative voices had been added (e.g., US English Salli, Canadian French Liam, Polish Ola/Ewa).
- In November 2025, five more generative voices were launched (Austrian German Hannah, Irish English Niamh, Brazilian Portuguese Camila, Belgian Dutch Lisa, Korean Seoyeon), along with expansion into new AWS Regions (Seoul, Singapore, Tokyo).
Voice Persona Across Languages (“Polyglot / Multilingual Identity”)
- Polly supports polyglot voices, meaning a single voice persona (same “speaker identity”) can speak in multiple languages.
- Example: Matthew (US English) voice identity is used in other locales (Pedro in US Spanish, Daniel in German, Liam in Canadian French, Andrés in Mexican Spanish, Sergio in European Spanish, Rémi in French).
- This is very useful for brand consistency across regions.
Additional Regional / Voice Support
- New AWS Region for Neural voices: Europe (Zurich).
- Polly now supports Asia Pacific (Malaysia) region for both Neural and Standard voices.
- New neural voice: Korean Jihye.
- New neural English (Singapore) voice: Jasmine.
Brand Voice
- With Brand Voice (a custom neural voice), organizations can work with AWS to build a unique voice persona exclusive to them.
- This can help distinguish your brand with a vocal identity in IVR, contact centers, or other applications.
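
The SSML tags listed in this section (emphasis, break, prosody, say-as) can be combined in a single document. The Python sketch below assembles such a document as a string; the helper names are hypothetical, and the attribute values (`moderate`, `slow`, `loud`, `telephone`) are examples rather than a complete list.

```python
from xml.sax.saxutils import escape


def ssml_break(milliseconds):
    """Pause of a custom length."""
    return f'<break time="{milliseconds}ms"/>'


def ssml_emphasis(text, level="moderate"):
    """Stress a word or phrase."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def ssml_prosody(text, rate="medium", volume="medium"):
    """Adjust speaking rate and volume for a span of text."""
    return f'<prosody rate="{rate}" volume="{volume}">{escape(text)}</prosody>'


def ssml_say_as(text, interpret_as):
    """Tell the engine how to interpret the text (e.g. as a phone number)."""
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


def speak(*fragments):
    """Wrap already-built SSML fragments in the required <speak> root element."""
    return "<speak>" + "".join(fragments) + "</speak>"


document = speak(
    escape("Call us at "),  # plain text must be XML-escaped by the caller
    ssml_say_as("2025550100", "telephone"),
    ssml_break(300),
    ssml_prosody("Thank you", rate="slow", volume="loud"),
)
```

To synthesize `document` with boto3, the request would pass it as `Text` and set `TextType="ssml"`. Tag support differs between engines, so the Polly SSML reference is the place to confirm which tags a given voice accepts.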

Standard vs Neural TTS

Neural TTS:
- Supports more expressive emotions, not just Conversational or Newscaster.
- Can produce multiple voices per language, including brand voices (custom neural voices).
Standard TTS limits:
- Cannot produce multiple speaking styles simultaneously.
- Less natural prosody (intonation and rhythm).

Amazon Polly Pricing

Standard TTS
- $4.00 per 1 million characters
Neural TTS
- $16.00 per 1 million characters
Free tier: 5 million characters per month for the first 12 months (Standard TTS).
Additional cost considerations:
- Charges are per character, not per request.
- Neural voices cost 4x as much as Standard voices.
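
The per-character arithmetic can be sketched in a few lines of Python. The rates and free-tier allowance are the figures listed above; the function name `monthly_cost` and the `free_tier` flag are illustrative, and actual bills should be checked against the current pricing page.

```python
# USD per 1 million characters, as listed in this cheat sheet.
RATES = {"standard": 4.00, "neural": 16.00}
# Free-tier allowance per month for Standard TTS (first 12 months).
STANDARD_FREE_CHARACTERS = 5_000_000


def monthly_cost(characters, engine="standard", free_tier=False):
    """Estimate the monthly charge in USD for a number of synthesized characters."""
    billable = characters
    if free_tier and engine == "standard":
        billable = max(0, characters - STANDARD_FREE_CHARACTERS)
    return billable / 1_000_000 * RATES[engine]


if __name__ == "__main__":
    print(monthly_cost(2_500_000, "neural"))                    # 2.5M neural characters
    print(monthly_cost(6_000_000, "standard", free_tier=True))  # 1M billable after free tier
```

Because billing is per character, splitting the same text across more requests does not change the estimate.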

Note: If you are studying for the AWS Certified Machine Learning Specialty exam, we highly recommend that you take our AWS Certified Machine Learning – Specialty Practice Exams and read our Machine Learning Specialty exam study guide.

Validate Your Knowledge

Question 1

A Business Process Outsourcing (BPO) company uses Amazon Polly to convert plaintext documents to speech for its voice response system. After testing, the company finds that some acronyms and business-specific terms are pronounced incorrectly.

Which approach will fix this issue?

1. Use a viseme Speech Mark.
2. Use pronunciation lexicons.
3. Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag.
4. Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation.


Correct Answer: 2

With Amazon Polly’s custom lexicons or vocabularies, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms (e.g., “ROTFL”, or “C’est la vie” when spoken in a non-French voice). To customize these pronunciations, you upload an XML file with lexical entries. For example, you can customize the pronunciation of the Filipino word “Pilipinas” by using the phoneme element in your input XML.

Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides API operations that you can use to store lexicons in an AWS region. Those lexicons are then specific to that particular region.

The following are examples of ways to use lexicons with speech synthesis engines:

– Common words are sometimes stylized with numbers taking the place of letters, as with “g3t sm4rt” (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS) engine reads the text literally, pronouncing the name exactly as it is spelled. This is where you can leverage lexicons to customize the synthesized speech by using Amazon Polly. In this example, you can specify an alias (get smart) for the word “g3t sm4rt” in the lexicon.

– Your text might include an acronym, such as W3C. You can use a lexicon to define an alias for the word W3C so that it is read in the full, expanded form (World Wide Web Consortium).

Lexicons give you additional control over how Amazon Polly pronounces words uncommon to the selected language. For example, you can specify the pronunciation using a phonetic alphabet.
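
As an illustration of the alias examples above, the Python sketch below generates a lexicon document in the W3C Pronunciation Lexicon Specification (PLS) format, which is the XML format Polly lexicons use, and reads it back to check it. The helper names and the lexicon name `acronyms` are hypothetical; `upload_lexicon` assumes boto3 and its Polly `put_lexicon` operation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases, lang="en-US"):
    """Build a PLS document that maps each written form to a spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


def lexicon_aliases(content):
    """Parse a PLS document back into a {grapheme: alias} dict."""
    root = ET.fromstring(content.encode("utf-8"))
    return {
        lexeme.find(f"{{{PLS_NS}}}grapheme").text: lexeme.find(f"{{{PLS_NS}}}alias").text
        for lexeme in root.findall(f"{{{PLS_NS}}}lexeme")
    }


def upload_lexicon(name, content):
    """Store the lexicon in the current AWS Region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without the SDK

    boto3.client("polly").put_lexicon(Name=name, Content=content)


content = build_alias_lexicon(
    {"W3C": "World Wide Web Consortium", "g3t sm4rt": "get smart"}
)
```

After `upload_lexicon("acronyms", content)`, a synthesis request would apply the lexicon by passing `LexiconNames=["acronyms"]`. Since lexicons are stored per Region, the upload has to be repeated in each Region where the voice response system runs.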

Hence, the correct answer is: Use pronunciation lexicons.

The option that says: Use a viseme Speech Mark is incorrect as this feature is just used to synchronize speech with facial animation (lip-syncing) or to highlight written words as they’re spoken.

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag is incorrect because Amazon Polly does not support this SSML tag.

The option that says: Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation is incorrect because this tag only emphasizes words by changing the speaking rate and volume of the speech; it does not change how they are pronounced.

References:
https://docs.aws.amazon.com/polly/latest/dg/managing-lexicons.html
https://aws.amazon.com/blogs/machine-learning/create-accessible-training-with-initiafy-and-amazon-polly/

Note: This question was extracted from our AWS Certified Machine Learning – Specialty Practice Exams.


Amazon Polly Cheat Sheet References:

https://aws.amazon.com/polly/faqs/
https://aws.amazon.com/polly/features/
https://docs.aws.amazon.com/polly/latest/dg/how-text-to-speech-works.html
https://aws.amazon.com/polly/pricing/

Written by: Carlo Acebedo

Carlo is a cloud engineer and a content creator at Tutorials Dojo. He is also a member of the AWS Community Builders program and holds 5 AWS Certifications. Carlo specializes in building and automating solutions in the Amazon Web Services Cloud.
