Last updated on October 5, 2023
What is Amazon Comprehend?
- A managed Natural Language Processing (NLP) service that extracts meaningful information, such as entities, sentiment, and key phrases, from unstructured text.
- It is an off-the-shelf solution that does not require deep machine learning expertise to get started.
- Works with social media feeds, web pages, comments, product reviews, articles, or emails.
- Can analyze text in real time using built-in or custom models.
- Also offers medical insights and protected health information (PHI) detection via Amazon Comprehend Medical.
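As a minimal sketch of real-time analysis, the snippet below calls the DetectSentiment API through boto3 and formats the result. It assumes boto3 is installed and AWS credentials and a region are configured. The helper names `analyze_sentiment` and `summarize_sentiment` are hypothetical and are not part of any AWS SDK.

```python
from typing import Any, Dict, Optional


def analyze_sentiment(text: str, language_code: str = "en",
                      client: Optional[Any] = None) -> Dict[str, Any]:
    """Send one document to Comprehend's real-time DetectSentiment API."""
    if client is None:
        # Imported lazily so summarize_sentiment() works without boto3 installed.
        import boto3
        client = boto3.client("comprehend")
    return client.detect_sentiment(Text=text, LanguageCode=language_code)


def summarize_sentiment(response: Dict[str, Any]) -> str:
    """Return the predicted label with its matching confidence score."""
    label = response["Sentiment"]  # "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED"
    # SentimentScore keys are capitalized: "Positive", "Negative", ...
    score = response["SentimentScore"][label.capitalize()]
    return f"{label} ({score:.2f})"
```

With credentials in place, `summarize_sentiment(analyze_sentiment("I love this phone"))` would return a string such as a label followed by its score; the exact values depend on the model.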
Amazon Comprehend Common Use Cases
- Sentiment analysis for social media posts
- Organize documents by topics
- Knowledge management and discovery
- Classify support tickets for better issue handling
- Medical cohort analysis
- Identify personally identifiable information (PII) in documents
- Identify protected health information (PHI) in documents
Amazon Comprehend generates insights in seven (7) categories:
- Entities
  - Detects and categorizes real-world objects such as dates, organizations, people, quantities, brands, or titles of songs and movies.
- Custom Entity Recognition
  - Allows you to identify new entity types that are not covered by the preset entities.
  - This is useful if you want to extract entities that are specific to your business, such as product codes.
- Sentiment
  - Classifies the overall sentiment of a text as neutral, positive, negative, or mixed.
- Language
  - Detects the dominant language of a text and reports it using identifiers from RFC 5646.
  - Useful for multilingual companies or applications.
- Key Phrases
  - A key phrase is a noun or a noun phrase that describes a particular thing.
- Personally Identifiable Information (PII)
  - Detects sensitive information that could be used to identify a person, such as full name, birth date, bank account number, phone number, or email.
- Syntax
  - Identifies the parts of speech used in the document, such as noun, pronoun, verb, adjective, and adverb.
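To illustrate how a PII insight can be used, here is a small sketch that masks detected spans in the original text. It assumes the entity list has the shape returned by boto3's `detect_pii_entities` (each entity carries `Type`, `BeginOffset`, and `EndOffset`, but not the matched text itself). The function name `redact_pii` is hypothetical, and the sample entities are hand-written for illustration rather than real service output.

```python
from typing import Any, Dict, List


def redact_pii(text: str, entities: List[Dict[str, Any]]) -> str:
    """Replace each detected PII span with a placeholder like [NAME]."""
    redacted = text
    # Work from the end of the text so earlier offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = ent["BeginOffset"], ent["EndOffset"]
        redacted = redacted[:start] + f"[{ent['Type']}]" + redacted[end:]
    return redacted


# Hand-written sample in the assumed response shape (not real API output).
sample_text = "Contact Jane Doe at jane@example.com."
sample_entities = [
    {"Type": "NAME", "BeginOffset": 8, "EndOffset": 16, "Score": 0.99},
    {"Type": "EMAIL", "BeginOffset": 20, "EndOffset": 36, "Score": 0.99},
]
print(redact_pii(sample_text, sample_entities))
# prints: Contact [NAME] at [EMAIL].
```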
Amazon Comprehend Concepts
- Each insight is associated with a confidence score.
  - A confidence score is a value between 0 and 1 that indicates the probability that a given prediction is correct.
  - For example, a product review with a positive sentiment and a 0.99 confidence score strongly suggests positive feedback from a customer.
- Topic Modeling
  - Groups a collection of documents according to their common subjects.
  - For example, you can use Topic Modeling to organize news articles into politics, sports, business, entertainment, etc.
- Comprehend Custom
  - It helps non-experts in machine learning build and train their own NLP models suited to their specific needs.
  - Amazon Comprehend uses a machine learning method called transfer learning to train custom models.
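One common way to use confidence scores is to discard low-confidence predictions before acting on them. The sketch below assumes a response shaped like the output of boto3's `detect_entities` (an `Entities` list whose items carry `Text`, `Type`, and a `Score` between 0 and 1). The function name `confident_entities` and the 0.9 threshold are illustrative choices, not AWS recommendations.

```python
from typing import Any, Dict, List, Tuple


def confident_entities(response: Dict[str, Any],
                       threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Keep only (text, type) pairs whose confidence meets the threshold."""
    return [
        (ent["Text"], ent["Type"])
        for ent in response["Entities"]
        if ent["Score"] >= threshold
    ]


# Hand-written sample in the assumed response shape (not real API output).
sample_response = {
    "Entities": [
        {"Text": "Seattle", "Type": "LOCATION", "Score": 0.98},
        {"Text": "Prime", "Type": "COMMERCIAL_ITEM", "Score": 0.55},
    ]
}
print(confident_entities(sample_response))
# prints: [('Seattle', 'LOCATION')]
```

Choosing the threshold is a trade-off: a higher value gives fewer false positives but may drop correct predictions.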
Amazon Comprehend Medical
- A separate Amazon Comprehend offering that is focused on medical use cases.
- A natural language processing service that extracts relevant medical information from unstructured text.
- Can quickly and accurately extract medical conditions, medications, dosage, strength, and frequency from text.
- Can recognize and analyze Protected Health Information (PHI) in text documents.
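The following sketch shows one way to organize Comprehend Medical output. It assumes the boto3 `comprehendmedical` client and its `detect_entities_v2` call, whose response contains an `Entities` list with `Text` and `Category` fields (for example `MEDICATION` or `MEDICAL_CONDITION`). The helper names are hypothetical, and the sample response is hand-written for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def detect_medical_entities(text: str, client: Optional[Any] = None) -> Dict[str, Any]:
    """Call Comprehend Medical on a piece of clinical text."""
    if client is None:
        # Imported lazily so group_by_category() works without boto3 installed.
        import boto3
        client = boto3.client("comprehendmedical")
    return client.detect_entities_v2(Text=text)


def group_by_category(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group detected entity texts under their category label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in response["Entities"]:
        grouped[ent["Category"]].append(ent["Text"])
    return dict(grouped)


# Hand-written sample in the assumed response shape (not real API output).
sample_medical = {
    "Entities": [
        {"Text": "ibuprofen", "Category": "MEDICATION", "Score": 0.99},
        {"Text": "headache", "Category": "MEDICAL_CONDITION", "Score": 0.97},
        {"Text": "aspirin", "Category": "MEDICATION", "Score": 0.98},
    ]
}
print(group_by_category(sample_medical))
# prints: {'MEDICATION': ['ibuprofen', 'aspirin'], 'MEDICAL_CONDITION': ['headache']}
```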
Amazon Comprehend Pricing
- Charges are based on units, where a single unit is equal to 100 characters.
- A 3-unit (300-character) minimum charge applies per request.
- All insights except Syntax analysis are charged at $0.0001 per unit for up to 10M units per month. Syntax analysis is charged at $0.00005 per unit for up to 10M units per month.
- Topic Modeling has a flat rate of $1.00 per job.
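The unit arithmetic can be sketched as follows. This is an estimate under stated assumptions: partial units are rounded up to the next whole unit, and the per-unit rate is supplied by the caller. The rate used in the example is illustrative only, so check the pricing page for current values. The function names are hypothetical.

```python
from decimal import Decimal

UNIT_CHARS = 100  # one unit = 100 characters
MIN_UNITS = 3     # minimum charge of 3 units per request


def billable_units(char_count: int) -> int:
    """Units billed for one request, assuming partial units round up."""
    units = -(-char_count // UNIT_CHARS)  # integer ceiling division
    return max(MIN_UNITS, units)


def request_cost(char_count: int, price_per_unit: Decimal) -> Decimal:
    """Estimated cost of one request at the given per-unit rate."""
    return billable_units(char_count) * price_per_unit


rate = Decimal("0.0001")  # assumed example rate in USD per unit
print(billable_units(120))      # prints: 3  (below the minimum, so 3 units)
print(billable_units(550))      # prints: 6  (5.5 units rounded up)
print(request_cost(550, rate))  # prints: 0.0006
```

`Decimal` is used instead of `float` so that small per-unit prices multiply without binary rounding noise.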
Note: If you are studying for the AWS Certified Machine Learning Specialty exam, we highly recommend that you take our AWS Certified Machine Learning – Specialty Practice Exams and read our Machine Learning Specialty exam study guide.
Validate Your Knowledge
Question 1
A Machine Learning Specialist working for an e-commerce company is creating an application using Amazon Comprehend. The application will analyze sentiments for reviews about various electronic products. During development, he noticed that all device model names are labeled as Commercial item. The Specialist wants to identify the model names under a more specific category.
Which approach will produce the MOST appropriate result?
- Use regular expressions to determine the entities.
- Use Topic Modeling to determine entities.
- Create a Custom Entity Recognition model.
- Create a list for each product and use string matching to determine their entities.
Amazon Comprehend Cheat Sheet References:
https://aws.amazon.com/comprehend/
https://docs.aws.amazon.com/comprehend/latest/dg/how-it-works.html
https://aws.amazon.com/comprehend/pricing/