Last updated on November 14, 2024
Amazon Polly Cheat Sheet
- A text-to-speech (TTS) service
- Uses advanced deep learning technologies to convert text into natural, lifelike speech
- Supports MP3, Ogg Vorbis, and PCM audio output formats.
- Offers Standard and Neural TTS (NTTS)
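As a rough illustration of the output formats and engines listed above, the following sketch builds the parameters for a `synthesize_speech` call through boto3. The helper names `build_request` and `synthesize_to_file` and the default voice `Joanna` are assumptions made for this example; the AWS call itself needs boto3 installed and valid credentials, so it is kept in a separate function.

```python
# Minimal sketch: validate and assemble parameters for Polly's synthesize_speech.
# build_request and synthesize_to_file are hypothetical helper names.

ALLOWED_FORMATS = {"mp3", "ogg_vorbis", "pcm"}   # audio formats covered in this cheat sheet
ALLOWED_ENGINES = {"standard", "neural"}         # engines covered in this cheat sheet


def build_request(text, voice_id="Joanna", output_format="mp3", engine="neural"):
    """Return keyword arguments for synthesize_speech, rejecting unknown values."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Engine": engine,
    }


def synthesize_to_file(path, text, **options):
    """Call Amazon Polly and write the audio stream to path (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

A call such as `synthesize_to_file("hello.mp3", "Hello from Polly")` would then request neural MP3 output, assuming the chosen voice supports the chosen engine.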
Common Use Cases
- Increase customer engagement
- Language learning applications
- Making digital content accessible to visually impaired individuals
- Testing in-game dialogs
- Voice response
Concepts
- Speech Synthesis Markup Language (SSML)
  - Uses XML-based tags to modify different aspects of the text-to-speech output.
  - Can control pitch, speaking style, speech rate, and volume.
- Standard TTS
  - Uses concatenative synthesis, which strings together short snippets (phonemes) of recorded speech.
  - Limited in terms of producing different speaking styles.
- Neural TTS
  - Produces higher quality speech output than Standard TTS.
  - Neural TTS supports two speaking styles:
    - Conversational
    - Newscaster
- Speech Mark
  - Metadata that describes the synthesized speech, such as where a sentence or word starts and ends in the audio stream.
  - Speech Mark has four types:
    - Sentence
    - Word
    - Viseme
    - SSML
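The SSML and Speech Mark concepts above can be sketched in a few lines. The first helper wraps text in a `<prosody>` tag to set rate, pitch, and volume; the second parses speech-mark output, which Polly returns as one JSON object per line when `OutputFormat` is `json`. The function names and the sample payload are assumptions for illustration (the payload only mimics the documented shape), and not every voice or engine accepts every prosody attribute.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap plain text in <speak><prosody> tags, escaping XML special characters."""
    attrs = (
        f"rate={quoteattr(rate)} pitch={quoteattr(pitch)} volume={quoteattr(volume)}"
    )
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"


def parse_speech_marks(payload):
    """Parse a speech-mark stream: one JSON object per non-empty line."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def word_timings(marks):
    """Return (time in ms, word) pairs for the word-type speech marks."""
    return [(mark["time"], mark["value"]) for mark in marks if mark["type"] == "word"]


# Illustrative sample in the line-delimited JSON shape used for speech marks.
SAMPLE_MARKS = "\n".join([
    '{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}',
    '{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}',
    '{"time":73,"type":"viseme","value":"p"}',
    '{"time":373,"type":"word","start":5,"end":8,"value":"had"}',
])

if __name__ == "__main__":
    print(build_prosody_ssml("Tom & Jerry", rate="slow"))
    print(word_timings(parse_speech_marks(SAMPLE_MARKS)))
```

To send the generated markup to Polly, the request would set `TextType` to `ssml`; to receive speech marks instead of audio, it would set `OutputFormat` to `json` and list the wanted types in `SpeechMarkTypes`.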
Features
- Accepts plain text or SSML as input, encoded in UTF-8.
- Expands abbreviations and acronyms into their spoken form.
- Interprets dates, times, and units of measurement.
- Homograph disambiguation
  - For example, “St.” can be read as “saint” or “street.” Amazon Polly chooses the correct reading based on the surrounding context.
- Custom lexicon
  - Supports customizing the pronunciation of words uncommon to the selected language.
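The custom lexicon feature above relies on lexicon files written in the W3C Pronunciation Lexicon Specification (PLS) format. The sketch below generates a small PLS document that maps written forms to spoken aliases. The helper names and the sample entry are assumptions for illustration; uploading uses boto3's `put_lexicon` and needs AWS credentials, so it sits in its own function.

```python
from xml.sax.saxutils import escape, quoteattr

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, lang="en-US"):
    """Build a PLS document from a {written form: spoken alias} mapping."""
    lexemes = "".join(
        "  <lexeme>"
        f"<grapheme>{escape(grapheme)}</grapheme>"
        f"<alias>{escape(alias)}</alias>"
        "</lexeme>\n"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}" '
        f'alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


def upload_lexicon(name, content):
    """Store the lexicon in Amazon Polly (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_lexicon works without boto3

    boto3.client("polly").put_lexicon(Name=name, Content=content)


if __name__ == "__main__":
    # Hypothetical entry: read the acronym "W3C" as its full name.
    print(build_lexicon({"W3C": "World Wide Web Consortium"}))
```

Once a lexicon is stored, a synthesis request can apply it by passing its name in the `LexiconNames` parameter.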
Amazon Polly Pricing
- Standard TTS
  - $4.00 per 1 million characters
- Neural TTS
  - $16.00 per 1 million characters
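As a quick worked example of the rates above, this sketch estimates the charge for a given character count. It assumes only the two per-million prices listed in this section and ignores free-tier allowances or any other pricing tiers; `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Prices in USD per 1 million characters, taken from the list above.
PRICE_PER_MILLION = {
    "standard": Decimal("4.00"),
    "neural": Decimal("16.00"),
}


def estimate_cost(characters, engine):
    """Return the estimated USD cost of synthesizing the given number of characters."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    return PRICE_PER_MILLION[engine] * characters / 1_000_000


if __name__ == "__main__":
    for engine in ("standard", "neural"):
        cost = estimate_cost(250_000, engine)
        print(f"{engine}: ${cost:.2f}")
    # prints "standard: $1.00" then "neural: $4.00"
```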
Note: If you are studying for the AWS Certified Machine Learning Specialty exam, we highly recommend that you take our AWS Certified Machine Learning – Specialty Practice Exams and read our Machine Learning Specialty exam study guide.
Validate Your Knowledge
Question 1
A Business Process Outsourcing (BPO) company uses Amazon Polly to convert plain-text documents to speech for its voice response system. During testing, some acronyms and business-specific terms are pronounced incorrectly.
Which approach will fix this issue?
- Use a viseme Speech Mark.
- Use pronunciation lexicons.
- Convert the scripts into Speech Synthesis Markup Language (SSML) and use the pronunciation tag.
- Convert the scripts into Speech Synthesis Markup Language (SSML) and use the emphasis tag to guide the pronunciation.
Amazon Polly Cheat Sheet References:
https://aws.amazon.com/polly/faqs/
https://aws.amazon.com/polly/features/
https://docs.aws.amazon.com/polly/latest/dg/how-text-to-speech-works.html
https://aws.amazon.com/polly/pricing/