Amazon Comprehend

What is Amazon Comprehend?

  • A managed natural language processing (NLP) service that extracts meaningful information from unstructured text so you can analyze it in context, much as a human reader would.
  • It is an off-the-shelf solution that does not require deep machine learning expertise to get started.
  • Works with social media feeds, web pages, comments, product reviews, articles, or emails.
  • Can analyze text in real time using built-in and custom models.
  • Also offers medical insights and protected health information (PHI) detection via Amazon Comprehend Medical.

Amazon Comprehend Common Use Cases

  • Sentiment analysis for social media posts
  • Organize documents by topics
  • Knowledge management and discovery
  • Classify support tickets for better issue handling
  • Medical cohort analysis
  • Identify personally identifiable information (PII) in documents
  • Identify protected health information (PHI) in documents

Amazon Comprehend generates insights in six (6) categories:

  • Entities
    • Detects and categorizes references to real-world objects such as dates, organizations, people, quantities, branded products, or titles of songs and movies.
    • Custom Entity Recognition
      • Allows you to identify new entity types that are not covered by the preset entity types.
      • This is useful if you want to extract entities that are specific to your business, such as product codes.
  • Sentiment
    • Classifies the overall sentiment of a text as positive, negative, neutral, or mixed.
  • Language
    • Detects the dominant language of a text and reports it using identifiers from RFC 5646.
    • Useful for multilingual companies or applications.
  • Key Phrases
    • A key phrase refers to a noun or a noun phrase that describes a particular thing.
  • Personally Identifiable Information (PII)
    • Detects sensitive information that could be used to identify a person, such as full name, birth date, bank account number, phone number, or email address.
  • Syntax
    • Identifies the part of speech of each word in the document, such as noun, pronoun, verb, adjective, or adverb.
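
As a rough illustration of how the six insight categories map onto API calls, the sketch below uses the boto3 `comprehend` client operations `detect_entities`, `detect_sentiment`, `detect_dominant_language`, `detect_key_phrases`, `detect_pii_entities`, and `detect_syntax`. The helper name `summarize_insights` is an assumption made for illustration and is not part of the AWS SDK. The client is passed in as an argument so the function can be exercised with a stub when no AWS credentials are available.

```python
def summarize_insights(client, text, language_code="en"):
    """Collect one compact result per insight category for a single text.

    `client` is expected to behave like boto3.client("comprehend").
    This helper is a hypothetical convenience wrapper, not an AWS API.
    """
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    languages = client.detect_dominant_language(Text=text)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    pii = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    syntax = client.detect_syntax(Text=text, LanguageCode=language_code)

    return {
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        "sentiment": sentiment["Sentiment"],
        "languages": [lang["LanguageCode"] for lang in languages["Languages"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
        # PII results carry character offsets, so slice the matched text out.
        "pii": [
            (p["Type"], text[p["BeginOffset"]:p["EndOffset"]])
            for p in pii["Entities"]
        ],
        "syntax": [
            (t["Text"], t["PartOfSpeech"]["Tag"]) for t in syntax["SyntaxTokens"]
        ],
    }
```

With credentials configured, a call might look like `summarize_insights(boto3.client("comprehend"), "Jane loves her new phone.")`. Each of the six requests is billed separately, so in practice you would call only the operations you need.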

Amazon Comprehend Concepts

  • Each insight is associated with a confidence score.
  • A confidence score is a value between 0 and 1 that indicates how likely it is that a given prediction is correct.
  • For example, a product review with a positive sentiment and a confidence score of 0.99 strongly suggests positive feedback from a customer.
  • Topic Modeling
    • Groups a collection of documents according to their common subjects.
    • For example, you can use Topic Modeling to categorize news articles into politics, sports, business, entertainment, and so on.
  • Comprehend Custom
    • It helps non-experts in machine learning build and train their own NLP models suited to their specific needs.
    • Amazon Comprehend uses a machine learning method called transfer learning to train custom models.
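
One common way to act on confidence scores is to discard predictions that fall below a threshold. The following is a minimal sketch, assuming scores on a 0 to 1 scale (as in the 0.99 example) and result items shaped like the `Entities` list returned by `detect_entities`. The function name `filter_by_confidence`, the 0.9 threshold, and the sample data are illustrative assumptions.

```python
def filter_by_confidence(items, threshold=0.9):
    """Keep only results whose "Score" is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [item for item in items if item["Score"] >= threshold]


# Hand-written sample shaped like the "Entities" list in a response.
sample_entities = [
    {"Text": "Amazon", "Type": "ORGANIZATION", "Score": 0.99},
    {"Text": "Monday", "Type": "DATE", "Score": 0.62},
]

confident = filter_by_confidence(sample_entities, threshold=0.9)
print([entity["Text"] for entity in confident])  # ['Amazon']
```

The right threshold depends on the use case: a higher value trades recall for precision.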



Amazon Comprehend Medical

  • A variant of Amazon Comprehend that focuses on medical use cases.
  • A natural language processing service that extracts relevant medical information from unstructured text.
  • Can quickly and accurately extract medical conditions, medications, dosage, strength, and frequency from text.
  • Can recognize and analyze Protected Health Information (PHI) in text documents.
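
A short sketch of how medication details and PHI might be pulled out of a clinical note, using the boto3 `comprehendmedical` client operations `detect_entities_v2` and `detect_phi`. The helper names `extract_medications` and `find_phi` are assumptions made for illustration. The client is passed in so the functions can be tested with a stub.

```python
def extract_medications(client, text):
    """Return each detected medication with its related attributes.

    `client` is expected to behave like boto3.client("comprehendmedical").
    Attributes such as DOSAGE, STRENGTH, or FREQUENCY are keyed by type;
    if a type repeats for one medication, the last occurrence wins.
    """
    response = client.detect_entities_v2(Text=text)
    medications = []
    for entity in response["Entities"]:
        if entity["Category"] != "MEDICATION":
            continue
        attributes = {a["Type"]: a["Text"] for a in entity.get("Attributes", [])}
        medications.append({"name": entity["Text"], **attributes})
    return medications


def find_phi(client, text):
    """Return (type, text) pairs for detected protected health information."""
    response = client.detect_phi(Text=text)
    return [(entity["Type"], entity["Text"]) for entity in response["Entities"]]
```

With credentials configured, both helpers could be called with `boto3.client("comprehendmedical")` and the text of a note.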

Amazon Comprehend Pricing

  • Charges are based on units, where one unit equals 100 characters.
  • Each request has a minimum charge of 3 units (300 characters).
  • All insights except Syntax analysis are charged at $0.0001 per unit for the first 10M units. Syntax analysis is charged at $0.00005 per unit for the first 10M units.
  • Topic Modeling has a flat rate of $1.00 per job.
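
The unit arithmetic can be sketched as follows. This is an estimate under stated assumptions, not an official calculator: partial units are assumed to round up, the rates are assumed to be USD per unit in the first pricing tier, and the function names are made up for illustration. Check the current AWS pricing page before relying on the numbers.

```python
from decimal import Decimal

UNIT_SIZE = 100   # characters per unit
MIN_UNITS = 3     # minimum units billed per request

# Assumed first-tier rates in USD per unit.
RATE_STANDARD = Decimal("0.0001")
RATE_SYNTAX = Decimal("0.00005")


def billable_units(char_count):
    """Units billed for one request: round up, then apply the minimum."""
    units = (char_count + UNIT_SIZE - 1) // UNIT_SIZE  # integer ceiling
    return max(MIN_UNITS, units)


def estimate_cost(request_lengths, rate=RATE_STANDARD):
    """Estimated charge for requests with the given character counts."""
    total_units = sum(billable_units(n) for n in request_lengths)
    return total_units * rate


print(billable_units(120))            # 3 (minimum charge applies)
print(billable_units(550))            # 6
print(estimate_cost([550] * 10_000))  # 6.0000
```

`Decimal` is used instead of floats so that small per-unit rates do not accumulate rounding error.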


Validate Your Knowledge

Question 1

A Machine Learning Specialist working for an e-commerce company is creating an application using Amazon Comprehend. The application will analyze the sentiment of reviews about various electronic products. During development, the Specialist notices that all device model names are labeled with the generic COMMERCIAL_ITEM entity type. The Specialist wants to identify the model names under a more specific category.

Which approach will produce the MOST appropriate result?

  1. Use regular expressions to determine the entities.
  2. Use Topic Modelling to determine entities.
  3. Create a Custom Entity Recognition model.
  4. Create a list for each product and use string matching to determine their entities.

Correct Answer: 3

Custom entity recognition extends the capability of Amazon Comprehend by enabling you to identify new entity types not supported as one of the preset generic entity types. This means that in addition to identifying entity types such as LOCATION, DATE, PERSON, and so on, you can analyze documents and extract entities like product codes or business-specific entities that fit your particular needs.

Creating a custom entity recognition model is a more effective approach than using string matching or regular expressions to identify entities. For example, to extract product codes, it would be difficult to enumerate every possible pattern for string matching. A custom entity recognition model, by contrast, can learn the context in which product codes are most likely to appear and then make inferences about codes it has never seen before. Typos in product codes and newly added product codes can still be caught by Amazon Comprehend's custom entity recognition model, but would be missed by string matching against a static list.
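
The limitation of a static list can be seen in a small sketch. The catalog, the product codes, and the function name `match_product_codes` below are all hypothetical. The example only demonstrates exact string matching and does not call Amazon Comprehend.

```python
KNOWN_PRODUCT_CODES = {"XR-1000", "XR-2000", "QZ-450"}  # hypothetical catalog


def match_product_codes(text, known_codes=KNOWN_PRODUCT_CODES):
    """Return the tokens in `text` that exactly match a known product code."""
    tokens = (token.strip(".,!?;:()") for token in text.split())
    return [token for token in tokens if token in known_codes]


print(match_product_codes("The XR-1000 works well."))       # ['XR-1000']
print(match_product_codes("My XR-100O keeps rebooting."))   # [] (typo: letter O, not zero)
print(match_product_codes("Just bought the new XR-3000!"))  # [] (code not in the list yet)
```

The second and third calls return nothing because exact matching has no notion of context. A trained custom entity recognizer is meant to cover those cases.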

Hence, the correct answer is: Create a Custom Entity Recognition model.

The option that says: Use regular expressions to determine the entities is incorrect. Although this is possible, it isn’t as effective as creating a Custom Entity Recognition model.

The option that says: Use Topic Modelling to determine entities is incorrect because Topic Modeling is used to determine themes or topics across a collection of documents. The task here is to label specific entities within the text, not to group documents by theme.

The option that says: Create a list for each product and use string matching to determine their entities is incorrect. Like regular expressions, it would be difficult to match all possible patterns with string matching. This would produce less accurate results than when using a Custom Entity Recognition model.


Note: This question was extracted from our AWS Certified Machine Learning – Specialty Practice Exams.

Written by: Carlo Acebedo

Carlo is a cloud engineer and a content creator at Tutorials Dojo. He is also a member of the AWS Community Builders program and holds 5 AWS Certifications. Carlo specializes in building and automating solutions in the Amazon Web Services Cloud.
