Last updated on November 25, 2025
Amazon Textract Cheat Sheet
- A fully managed document analysis service for detecting and extracting information from scanned documents.
- Can return extracted form data as key-value pairs (e.g., Name: John Doe).
- Supports virtually any type of document.
- Can detect text written in the Standard English alphabet and ASCII symbols.
- Integrates with the AWS SDKs (Python/Boto3, Java, Node.js, etc.) and the AWS CLI.
- Supports image files (PNG, JPEG, TIFF) and PDFs (single-page and multipage).
- Can detect handwriting in multiple languages (not just English).
- Works natively with S3 (you can process documents stored in S3 directly).
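As a minimal sketch of the S3 integration, the Python (Boto3) snippet below runs synchronous text detection on an object stored in S3. The helper names build_s3_document and detect_text_in_s3 are hypothetical, and the bucket and key you pass in are your own; the optional client argument only exists so the function can be exercised without AWS credentials.

```python
def build_s3_document(bucket, key):
    """Build the Document argument that points Textract at an S3 object."""
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def detect_text_in_s3(bucket, key, client=None):
    """Call DetectDocumentText on an S3 object and return the raw response."""
    if client is None:
        # Imported lazily so build_s3_document works even without the SDK installed.
        import boto3
        client = boto3.client("textract")
    return client.detect_document_text(Document=build_s3_document(bucket, key))
```

Calling detect_text_in_s3("my-bucket", "scans/form.png") with placeholder names like these would return a dictionary whose "Blocks" list holds the detected elements.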
Common Use Cases:
- Building search indexes
- Importing documents into a business application
- Building automated document processing solutions
- Text extraction for Natural Language Processing (NLP) Applications
- Maintaining document compliance
- Invoice processing: Extract invoice number, date, and total amount automatically.
- Receipt scanning: Automatically categorize expenses for accounting.
- Legal document review: Identify contract clauses, parties, and dates.
- Healthcare forms: Extract patient information and medical codes.
- Data migration: Move data from legacy paper forms into structured databases.
Concepts
- Amazon Textract returns a confidence score for each identified element, which indicates the probability that a given prediction is correct.
- Low-confidence results can be routed to Amazon Augmented AI (A2I) for human review.
- The asynchronous operations let you process multipage PDF documents.
- Detect Document Text API: uses optical character recognition (OCR) technology to extract printed text and handwriting from a document.
- Analyze Document API: extracts printed text and handwriting, plus structured data such as tables and key-value pairs from forms.
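The confidence-based review step can be sketched as a simple partition over the returned blocks. This is an illustration only: the function name split_by_confidence and the 90.0 threshold are assumptions, and how the low-confidence items are handed to A2I is left out. Textract reports Confidence on a 0 to 100 scale; blocks that carry no Confidence field are skipped here.

```python
def split_by_confidence(blocks, threshold=90.0):
    """Partition blocks into (accepted, needs_review) using their Confidence score."""
    accepted, needs_review = [], []
    for block in blocks:
        confidence = block.get("Confidence")
        if confidence is None:
            # Assumption of this sketch: blocks without a score are ignored.
            continue
        if confidence >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)
    return accepted, needs_review
```

The threshold is a business decision; a stricter value sends more items to human review.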
APIs
Detect Document Text API:
- Returns all text in the document as LINE and WORD blocks.
- Best for simple text extraction without structure.
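To illustrate the LINE and WORD blocks, the sketch below pulls the text of one block type out of a Detect Document Text response. The function name text_by_block_type is hypothetical; the response is assumed to be the dictionary returned by the SDK, with a "Blocks" list whose entries carry "BlockType" and, for text blocks, "Text".

```python
def text_by_block_type(response, block_type="LINE"):
    """Return the Text of every block of the given type, in response order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == block_type and "Text" in block
    ]
```

Passing block_type="WORD" returns the individual words instead of whole lines.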
Analyze Document API:
- Returns key-value pairs, tables, and forms as structured data.
- Can also return table relationships for easier downstream processing.
- Supports the FeatureTypes parameter (["TABLES", "FORMS"]) to control extraction.
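One way to turn an Analyze Document response into a plain dictionary is sketched below. It assumes the response was requested with "FORMS" in FeatureTypes, so that it contains KEY_VALUE_SET blocks linked through Relationships. The helper names extract_key_values and _child_text are illustrative, not part of any SDK, and selection elements such as checkboxes are ignored.

```python
# The response is assumed to come from a call such as:
#   client.analyze_document(Document=..., FeatureTypes=["TABLES", "FORMS"])

def _child_text(block, blocks_by_id):
    """Join the text of the WORD blocks that are CHILD-related to this block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Map each form key's text to the text of its linked value."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # A KEY block points at its VALUE block(s) by Id.
                value_text = " ".join(
                    _child_text(blocks_by_id[value_id], blocks_by_id)
                    for value_id in rel["Ids"]
                )
        pairs[_child_text(block, blocks_by_id)] = value_text
    return pairs
```

Duplicate key text on a form would overwrite earlier entries in this sketch; a list of pairs avoids that if it matters.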
Asynchronous operations:
- StartDocumentTextDetection / StartDocumentAnalysis: asynchronous APIs for large or multipage documents.
- GetDocumentTextDetection / GetDocumentAnalysis: retrieve results for asynchronous operations.
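The start-then-get flow for a multipage document stored in S3 can be sketched as follows. The function name run_async_text_detection, the polling interval, and the poll limit are assumptions made for illustration; the client is expected to be a Boto3 Textract client (for example boto3.client("textract")). Polling is used here for brevity, and results are paged through with NextToken.

```python
import time


def run_async_text_detection(client, bucket, key, poll_seconds=5.0, max_polls=60):
    """Start an asynchronous text detection job, wait for it, and collect all blocks."""
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state or we give up.
    for _ in range(max_polls):
        result = client.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]
        if status in ("SUCCEEDED", "PARTIAL_SUCCESS", "FAILED"):
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    if status == "FAILED":
        raise RuntimeError(result.get("StatusMessage", "Textract job failed"))

    # Results may span several pages; follow NextToken until it is absent.
    blocks = list(result.get("Blocks", []))
    next_token = result.get("NextToken")
    while next_token:
        result = client.get_document_text_detection(JobId=job_id, NextToken=next_token)
        blocks.extend(result.get("Blocks", []))
        next_token = result.get("NextToken")
    return blocks
```

The same pattern applies to StartDocumentAnalysis and GetDocumentAnalysis, with FeatureTypes added to the start call.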
Amazon Textract Pricing
- You only pay for what you use.
- Charges vary for Detect Document Text API and Analyze Document API, with the latter being the more expensive.
Amazon Textract Cheat Sheet References:
https://docs.aws.amazon.com/textract/latest/dg/what-is.html
https://docs.aws.amazon.com/textract/latest/dg/textract-best-practices.html
https://aws.amazon.com/blogs/machine-learning/using-amazon-textract-with-amazon-augmented-ai-for-processing-critical-documents/