Last updated on November 14, 2024
Amazon Textract Cheat Sheet
- A fully managed document analysis service for detecting and extracting information from scanned documents.
 - Returns extracted data as key-value pairs (e.g., Name: John Doe)
 - Supports virtually any type of document
 - Can detect text written in the standard English alphabet and ASCII symbols.
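
In the API response, form data arrives as a flat list of blocks rather than a ready-made dictionary: `KEY_VALUE_SET` blocks are linked to their values through `VALUE` relationships and to their words through `CHILD` relationships. The sketch below shows one way to reassemble key-value pairs from that list. It assumes `blocks` is the `Blocks` list of an Analyze Document response requested with the `FORMS` feature type; `extract_key_values` is a hypothetical helper name, and selection elements such as checkboxes are ignored for brevity.

```python
def extract_key_values(blocks):
    """Map each form key's text to its value's text (hypothetical helper)."""
    by_id = {block["Id"]: block for block in blocks}

    def child_text(block):
        # Join the WORD children of a KEY or VALUE block into one string.
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                child = by_id.get(child_id, {})
                if child.get("BlockType") == "WORD":
                    words.append(child["Text"])
        return " ".join(words)

    pairs = {}
    for block in blocks:
        if block.get("BlockType") != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        # A KEY block points at its VALUE block(s) via a VALUE relationship.
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_parts.extend(
                    child_text(by_id[i]) for i in rel["Ids"] if i in by_id
                )
        pairs[child_text(block)] = " ".join(value_parts)
    return pairs
```

Called on a response containing the example above, this would yield a dictionary along the lines of `{"Name:": "John Doe"}`; the exact key text depends on how the form prints its labels.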
 
Common Use Cases:
- Building search indexes
 - Importing documents into a business application
 - Building automated document processing solutions
 - Text extraction for Natural Language Processing (NLP) Applications
 - Maintaining document compliance
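
For the search-index and NLP use cases, a common first step is flattening the response into plain text. The sketch below assumes the boto3 SDK with credentials already configured; `detect_text` and `lines_to_text` are hypothetical helper names, and the synchronous call shown here is intended for a single image passed as bytes.

```python
def detect_text(image_bytes):
    """Call the synchronous DetectDocumentText operation on raw image bytes."""
    # Imported lazily so the parsing helper below works without AWS access.
    import boto3

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": image_bytes})


def lines_to_text(response):
    """Join the text of every LINE block, one line of output per block."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )
```

The resulting string can then be handed to an indexer or an NLP pipeline. WORD blocks are skipped on purpose, since each LINE block already contains the text of its words.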
 
Concepts
- Amazon Textract returns a confidence score for each identified element, which indicates the probability that a given prediction is correct.
 - Results with a low confidence score can be routed to Amazon Augmented AI (A2I) for human review.
 - Asynchronous operations allow you to process multipage PDF documents.
 - Detect Document Text API
   - Uses optical character recognition (OCR) technology to extract printed text and handwriting from a document.
 - Analyze Document API
   - Extracts printed text and handwriting, plus structured data such as tables and key-value pairs from forms.
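
The confidence scores and asynchronous processing described above can be sketched as two small helpers. This is a minimal illustration rather than a reference implementation: `REVIEW_THRESHOLD`, `split_by_confidence`, and `collect_async_blocks` are hypothetical names, the cutoff value is an assumption to tune per workload, and `client` is expected to be a boto3 Textract client (`boto3.client("textract")`) for a job started with `start_document_text_detection`.

```python
import time

REVIEW_THRESHOLD = 90.0  # hypothetical cutoff on Textract's 0-100 confidence scale


def split_by_confidence(blocks, threshold=REVIEW_THRESHOLD):
    """Separate blocks into (accepted, needs_review) by their Confidence score."""
    accepted, needs_review = [], []
    for block in blocks:
        if "Confidence" not in block:
            continue  # blocks without a score are skipped
        target = accepted if block["Confidence"] >= threshold else needs_review
        target.append(block)
    return accepted, needs_review


def collect_async_blocks(client, job_id, poll_seconds=5.0):
    """Poll an asynchronous text-detection job, then gather every result page."""
    while True:
        page = client.get_document_text_detection(JobId=job_id)
        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            time.sleep(poll_seconds)
            continue
        if status != "SUCCEEDED":
            raise RuntimeError(f"Textract job {job_id} ended with status {status}")
        break

    blocks = list(page.get("Blocks", []))
    # Results for large documents are paginated through NextToken.
    while "NextToken" in page:
        page = client.get_document_text_detection(
            JobId=job_id, NextToken=page["NextToken"]
        )
        blocks.extend(page.get("Blocks", []))
    return blocks
```

The blocks in `needs_review` are the candidates for an A2I human review workflow. Polling keeps the sketch short; the asynchronous operations can also publish a completion notification to an Amazon SNS topic, which avoids the wait loop.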
 
 
Amazon Textract Pricing
- You only pay for what you use.
 - Charges differ between the Detect Document Text API and the Analyze Document API; the latter is the more expensive.
 
Note: If you are studying for the AWS Certified Machine Learning Specialty exam, we highly recommend that you take our AWS Certified Machine Learning – Specialty Practice Exams and read our Machine Learning Specialty exam study guide.
Amazon Textract Cheat Sheet References:
https://docs.aws.amazon.com/textract/latest/dg/what-is.html
https://docs.aws.amazon.com/textract/latest/dg/textract-best-practices.html
https://aws.amazon.com/blogs/machine-learning/using-amazon-textract-with-amazon-augmented-ai-for-processing-critical-documents/