Last updated on November 14, 2024
Amazon Textract Cheat Sheet
- A fully managed document analysis service for detecting and extracting information from scanned documents.
- Returns extracted data as key-value pairs (e.g., Name: John Doe).
- Supports virtually any type of document.
- Can detect text written in the standard English alphabet and ASCII symbols.
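To make the key-value output concrete, here is a minimal sketch of how form fields could be rebuilt from the `Blocks` list of an Analyze Document response. It assumes the response shape in which `KEY_VALUE_SET` blocks link to their values and to `WORD` children through `Relationships`. The helper names (`block_text`, `extract_key_values`) and the sample data are hypothetical and only for illustration; no AWS call is made.

```python
from typing import Dict, List


def block_text(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the WORD children of a block into a single string."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(blocks: List[dict]) -> Dict[str, str]:
    """Pair each KEY block with the text of its linked VALUE block(s)."""
    by_id = {b["Id"]: b for b in blocks}
    pairs: Dict[str, str] = {}
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        key = block_text(block, by_id)
        value_parts: List[str] = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(block_text(by_id[value_id], by_id))
        pairs[key] = " ".join(value_parts)
    return pairs


# Hand-written sample that mimics the assumed response shape (not real output).
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "John"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
]

print(extract_key_values(SAMPLE_BLOCKS))  # {'Name:': 'John Doe'}
```

In a real workflow the `blocks` argument would come from the `Blocks` field of a response requested with the forms feature enabled. The parsing logic itself stays the same.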
Common Use Cases:
- Building search indexes
- Importing documents into a business application
- Building automated document processing solutions
- Text extraction for natural language processing (NLP) applications
- Maintaining document compliance
Concepts
- Amazon Textract returns a confidence score for each identified element, which indicates the probability that a given prediction is correct.
- Predictions with a low confidence score can be routed to Amazon Augmented AI (A2I) for human review.
- Asynchronous operations, such as StartDocumentTextDetection and StartDocumentAnalysis, let you process multipage PDF documents.
- Detect Document Text API
  - Uses optical character recognition (OCR) technology to extract printed text and handwriting from a document.
- Analyze Document API
  - Extracts printed text and handwriting, as well as structured data such as tables and key-value pairs from forms.
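The confidence-score idea above can be sketched as a simple triage step: lines at or above a cutoff are accepted, and the rest are set aside for human review (for example through A2I). This is a minimal sketch, not a prescribed workflow. The 90.0 threshold and the function names are assumptions chosen for illustration. `detect_lines` uses the boto3 `detect_document_text` call and needs AWS credentials, so it is defined but not invoked here.

```python
from typing import List, Tuple

# Hypothetical cutoff; Textract reports Confidence on a 0-100 scale.
REVIEW_THRESHOLD = 90.0


def split_by_confidence(
    blocks: List[dict], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[dict], List[dict]]:
    """Split LINE blocks into (accepted, needs_review) by confidence."""
    accepted: List[dict] = []
    needs_review: List[dict] = []
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks in this sketch
        if block.get("Confidence", 0.0) >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)
    return accepted, needs_review


def detect_lines(image_bytes: bytes) -> List[dict]:
    """Call the synchronous Detect Document Text API (requires AWS credentials)."""
    import boto3  # imported lazily so the triage logic runs without AWS set up

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return response["Blocks"]


# Hand-written sample blocks (not real service output).
SAMPLE_BLOCKS = [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "Invoice 1001", "Confidence": 99.2},
    {"BlockType": "LINE", "Text": "Tota1 due", "Confidence": 61.5},
]

accepted, needs_review = split_by_confidence(SAMPLE_BLOCKS)
print([b["Text"] for b in accepted])      # ['Invoice 1001']
print([b["Text"] for b in needs_review])  # ['Tota1 due']
```

The right threshold depends on the document type and on how costly an extraction error is, so it is worth tuning against a labeled sample rather than reusing the value shown here.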
Amazon Textract Pricing
- You only pay for what you use.
- Pricing differs between the Detect Document Text API and the Analyze Document API; Analyze Document is the more expensive of the two.
Amazon Textract Cheat Sheet References:
https://docs.aws.amazon.com/textract/latest/dg/what-is.html
https://docs.aws.amazon.com/textract/latest/dg/textract-best-practices.html
https://aws.amazon.com/blogs/machine-learning/using-amazon-textract-with-amazon-augmented-ai-for-processing-critical-documents/