Amazon Comprehend Cheat Sheet

Last updated on October 5, 2023

What is Amazon Comprehend?

A managed Natural Language Processing (NLP) service that uses machine learning to extract meaningful insights, such as entities, key phrases, and sentiment, from unstructured text.
It is an off-the-shelf solution that does not require deep machine learning expertise to get started.
Works with social media feeds, web pages, comments, product reviews, articles, and emails.
Can analyze text in real time by using built-in and custom models.
Also offers medical insights and protected health information (PHI) detection via Amazon Comprehend Medical.

Amazon Comprehend Common Use Cases

Sentiment analysis for social media posts
Organizing documents by topic
Knowledge management and discovery
Classifying support tickets for better issue handling
Medical cohort analysis
Identifying personally identifiable information (PII) in documents
Identifying protected health information (PHI) in documents

Amazon Comprehend generates insights in six (6) categories:

Entities
- Detects and categorizes real-world objects such as dates, organizations, people, quantities, brands, or even the title of a song or movie.
- Custom Entity Recognition
  - Allows you to identify new entity types that are not covered by the preset entity types.
  - This is useful if you want to extract entities that are specific to your business, such as product codes.
Sentiment
- Determines the overall sentiment of a text: positive, negative, neutral, or mixed.
Language
- Detects the language used in a text by using identifiers from RFC 5646.
- Useful for multilingual companies or applications.
Key Phrases
- A key phrase refers to a noun or a noun phrase that describes a particular thing.
Personally Identifiable Information (PII)
- Detects sensitive information that could be used to identify a person, such as a full name, birth date, bank account number, phone number, or email address.
Syntax
- Determines the part of speech of each word in the document, such as noun, pronoun, verb, adjective, or adverb.
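Each of the six categories corresponds to a real-time API operation in the AWS SDK. The following is a minimal Python sketch, assuming the boto3 SDK and English-language input; the helper name `analyze_text` is hypothetical, and the client is passed in as a parameter so the function can be tried without AWS credentials.

```python
from typing import Any, Dict


def analyze_text(client: Any, text: str, language_code: str = "en") -> Dict[str, Any]:
    """Run the six built-in Comprehend analyses on one piece of text.

    `client` is expected to be a boto3 Comprehend client, for example
    boto3.client("comprehend"); any object exposing the same methods works.
    """
    return {
        # Entities: PERSON, ORGANIZATION, DATE, COMMERCIAL_ITEM, TITLE, ...
        "entities": client.detect_entities(Text=text, LanguageCode=language_code),
        # Sentiment: POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        "sentiment": client.detect_sentiment(Text=text, LanguageCode=language_code),
        # Language: returns RFC 5646 codes such as "en"; takes no language code input
        "language": client.detect_dominant_language(Text=text),
        # Key phrases: noun phrases that describe a particular thing
        "key_phrases": client.detect_key_phrases(Text=text, LanguageCode=language_code),
        # PII: NAME, EMAIL, PHONE, BANK_ACCOUNT_NUMBER, ...
        "pii": client.detect_pii_entities(Text=text, LanguageCode=language_code),
        # Syntax: a part-of-speech tag for every token
        "syntax": client.detect_syntax(Text=text, LanguageCode=language_code),
    }
```

In practice you would pass `boto3.client("comprehend")` as the client. Language detection is the only call without a `LanguageCode` argument, since identifying the language is its job.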

Amazon Comprehend Concepts

Each insight is associated with a confidence score.
A confidence score is a value between 0 and 1 that indicates how confident the model is that a given prediction is correct.
A product review with a positive sentiment and a 0.99 confidence score strongly suggests positive feedback from a customer.
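One way to act on a confidence score is to accept a DetectSentiment prediction only when the score of the predicted label clears a threshold. In the sketch below, the 0.9 threshold and the sample review response are illustrative assumptions rather than values from this cheat sheet, and `confident_sentiment` is a hypothetical helper; the response shape follows the DetectSentiment output, where each score lies between 0 and 1.

```python
from typing import Optional


def confident_sentiment(response: dict, threshold: float = 0.9) -> Optional[str]:
    """Return the predicted sentiment label if its score meets `threshold`.

    `response` is shaped like a DetectSentiment result, e.g.
    {"Sentiment": "POSITIVE",
     "SentimentScore": {"Positive": 0.99, "Negative": 0.0, "Neutral": 0.01, "Mixed": 0.0}}
    """
    label = response["Sentiment"]             # e.g. "POSITIVE"
    key = label.capitalize()                   # SentimentScore keys look like "Positive"
    score = response["SentimentScore"][key]    # confidence between 0 and 1
    return label if score >= threshold else None


if __name__ == "__main__":
    # Hand-written response in the DetectSentiment shape (illustrative values)
    review = {
        "Sentiment": "POSITIVE",
        "SentimentScore": {"Positive": 0.99, "Negative": 0.002, "Neutral": 0.007, "Mixed": 0.001},
    }
    print(confident_sentiment(review))  # prints POSITIVE
```

Low-confidence predictions come back as `None`, which an application might route to a human reviewer instead of acting on them automatically.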
Topic Modeling
- Organizes a collection of documents according to the common subjects (topics) they share.
- For example, you can use Topic Modeling to categorize news articles into politics, sports, business, entertainment, etc.
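Topic Modeling runs as an asynchronous job over documents stored in Amazon S3. Below is a minimal sketch using boto3's `start_topics_detection_job`; the helper name `start_topic_job`, the S3 URIs, and the IAM role ARN are hypothetical placeholders. The job output lists the top terms for each discovered topic, so deciding that a topic corresponds to, say, "sports" is a step you perform on the results.

```python
from typing import Any


def start_topic_job(client: Any, input_s3_uri: str, output_s3_uri: str,
                    role_arn: str, num_topics: int = 10) -> str:
    """Start an asynchronous topic modeling job and return its JobId.

    `client` is expected to be boto3.client("comprehend").
    """
    response = client.start_topics_detection_job(
        InputDataConfig={
            "S3Uri": input_s3_uri,
            # Treat every line of every input file as a separate document
            "InputFormat": "ONE_DOC_PER_LINE",
        },
        OutputDataConfig={"S3Uri": output_s3_uri},
        # IAM role that Comprehend assumes to read the input and write results
        DataAccessRoleArn=role_arn,
        # Number of topics the job should discover
        NumberOfTopics=num_topics,
    )
    return response["JobId"]
```

Because the job is asynchronous, you would typically poll its status afterwards with `describe_topics_detection_job(JobId=...)` before reading the output from S3.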
Comprehend Custom
- Helps users without machine learning expertise build and train their own NLP models suited to their specific needs.
- Amazon Comprehend uses a machine learning method called transfer learning to train custom models.
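With Comprehend Custom, you supply labeled examples and the service trains the model. A minimal sketch of starting custom entity recognizer training with boto3's `create_entity_recognizer`, assuming plain-text training documents plus an entity list stored in S3; the entity type `PRODUCT_CODE`, the recognizer name, the S3 URIs, and the helper name are hypothetical.

```python
from typing import Any


def train_product_code_recognizer(client: Any, documents_s3_uri: str,
                                  entity_list_s3_uri: str, role_arn: str) -> str:
    """Start training a custom entity recognizer and return its ARN.

    `client` is expected to be boto3.client("comprehend").
    """
    response = client.create_entity_recognizer(
        RecognizerName="product-code-recognizer",  # hypothetical name
        LanguageCode="en",
        DataAccessRoleArn=role_arn,  # role allowed to read the training data
        InputDataConfig={
            # The new entity type that the preset types do not cover
            "EntityTypes": [{"Type": "PRODUCT_CODE"}],
            # Plain-text training documents
            "Documents": {"S3Uri": documents_s3_uri},
            # CSV of known examples of the entity type
            "EntityList": {"S3Uri": entity_list_s3_uri},
        },
    )
    return response["EntityRecognizerArn"]
```

Training runs asynchronously; `describe_entity_recognizer(EntityRecognizerArn=...)` reports when the model is ready to use.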

Amazon Comprehend Medical

A variant of Amazon Comprehend that is focused on medical use cases.
A natural language processing service that extracts relevant medical information from unstructured text.
Extracts medical conditions, medications, dosages, strengths, and frequencies from text.
Recognizes and analyzes Protected Health Information (PHI) in text documents.
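Comprehend Medical has its own boto3 client (`comprehendmedical`). A minimal sketch, assuming its `detect_entities_v2` operation, that groups detected entities by category and also collects linked attributes such as dosage or frequency; the helper name `medical_entities_by_category` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def medical_entities_by_category(client: Any, text: str) -> Dict[str, List[str]]:
    """Group the entity and attribute texts returned by DetectEntitiesV2.

    `client` is expected to be boto3.client("comprehendmedical").
    """
    response = client.detect_entities_v2(Text=text)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in response["Entities"]:
        # Categories include MEDICATION, MEDICAL_CONDITION, PROTECTED_HEALTH_INFORMATION
        grouped[entity["Category"]].append(entity["Text"])
        # Attributes such as DOSAGE, STRENGTH, or FREQUENCY are nested under an entity
        for attribute in entity.get("Attributes", []):
            grouped[attribute["Type"]].append(attribute["Text"])
    return dict(grouped)
```

Flattening attributes into the same dictionary keeps the sketch short; a real application would usually keep each dosage or frequency attached to the medication it describes.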

Amazon Comprehend Pricing

Charges are based on units, where a single unit is equal to 100 characters.
A minimum charge of 3 units (300 characters) applies per request.
All insights except Syntax analysis are charged $0.0001 per unit for the first 10M units. Syntax analysis is charged $0.00005 per unit for the first 10M units.
Topic Modeling has a flat rate of $1.00 per job.
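The billing arithmetic can be sketched in a few lines. This assumes that partial units are rounded up and that the rates quoted above apply per unit at the first pricing tier; the constant and function names are hypothetical.

```python
UNIT_CHARS = 100          # one unit = 100 characters
MIN_UNITS = 3             # 3-unit (300-character) minimum per request
STANDARD_RATE = 0.0001    # USD per unit, assumed first-tier rate for most insights
SYNTAX_RATE = 0.00005     # USD per unit for Syntax analysis, assumed first tier


def billable_units(num_chars: int) -> int:
    """Units billed for one request, rounding partial units up."""
    units = (num_chars + UNIT_CHARS - 1) // UNIT_CHARS  # integer ceiling division
    return max(MIN_UNITS, units)


def request_cost(num_chars: int, syntax: bool = False) -> float:
    """Estimated USD cost of one request under the assumed per-unit rates."""
    rate = SYNTAX_RATE if syntax else STANDARD_RATE
    return billable_units(num_chars) * rate


if __name__ == "__main__":
    print(billable_units(50))    # 3 (the minimum charge applies)
    print(billable_units(1001))  # 11 (10.01 units rounded up)
```

The minimum makes very short requests relatively expensive, so batching short texts into fewer, longer requests can reduce cost where the use case allows it.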

Note: If you are studying for the AWS Certified Machine Learning Specialty exam, we highly recommend that you take our AWS Certified Machine Learning – Specialty Practice Exams and read our Machine Learning Specialty exam study guide.

Validate Your Knowledge

Question 1

A Machine Learning Specialist working for an e-commerce company is creating an application using Amazon Comprehend. The application will analyze the sentiment of reviews about various electronic products. During development, he noticed that all device model names are labeled with the Commercial item entity type. The Specialist wants to identify the model names under a more specific category.

Which approach will produce the MOST appropriate result?

1. Use regular expressions to determine the entities.
2. Use Topic Modelling to determine entities.
3. Create a Custom Entity Recognition model.
4. Create a list for each product and use string matching to determine their entities.

Correct Answer: 3

Custom entity recognition extends the capability of Amazon Comprehend by enabling you to identify new entity types not supported as one of the preset generic entity types. This means that in addition to identifying entity types such as LOCATION, DATE, PERSON, and so on, you can analyze documents and extract entities like product codes or business-specific entities that fit your particular needs.

Creating a custom entity recognition model is a more effective approach than using string matching or regular expressions to identify entities. For example, to extract product codes, it would be difficult to enumerate all possible patterns for string matching. A custom entity recognition model, however, can learn the context in which those product codes are most likely to appear and then make such inferences even for product codes it has never seen before. In addition, a custom entity recognition model can still be expected to catch typos in product codes and newly added product codes, whereas string matching against a static list would miss them.
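One way a custom entity recognizer receives its training signal is an entity list: a CSV of known examples and their entity type, from which the model learns the surrounding context. A minimal sketch that builds such a file, assuming the two-column `Text,Type` layout described in the custom entity recognition documentation; the product codes and the `build_entity_list` helper are hypothetical.

```python
import csv
import io
from typing import Iterable


def build_entity_list(examples: Iterable[str], entity_type: str = "PRODUCT_CODE") -> str:
    """Return entity-list CSV text with a Text,Type header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Text", "Type"])  # header row
    for text in examples:
        # csv.writer quotes any value that contains a comma
        writer.writerow([text, entity_type])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_entity_list(["PC-1001", "PC-2002"]), end="")
    # Text,Type
    # PC-1001,PRODUCT_CODE
    # PC-2002,PRODUCT_CODE
```

Unlike a static lookup list, this file is only a starting point: the trained model is expected to recognize codes that are absent from it when they appear in a similar context.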

Hence, the correct answer is: Create a Custom Entity Recognition model.

The option that says: Use regular expressions to determine the entities is incorrect. Although this is possible, it isn’t as effective as creating a Custom Entity Recognition model.

The option that says: Use Topic Modelling to determine entities is incorrect because Topic Modeling is specifically used for determining themes or topics across a collection of documents. In this scenario, we only need to identify entities within the review text.

The option that says: Create a list for each product and use string matching to determine their entities is incorrect. Like regular expressions, it would be difficult to match all possible patterns with string matching. This would produce less accurate results than when using a Custom Entity Recognition model.

References:
https://docs.aws.amazon.com/comprehend/latest/dg/custom-entity-recognition.html
https://aws.amazon.com/blogs/machine-learning/build-a-custom-entity-recognizer-using-amazon-comprehend/

Note: This question was extracted from our AWS Certified Machine Learning – Specialty Practice Exams.
