- A fully managed document analysis service for detecting and extracting information from scanned documents.
- Returns extracted data as key-value pairs (e.g., Name: John Doe).
- Supports virtually any type of document.
- Can detect text written in the standard English alphabet and ASCII symbols.
Common Use Cases:
- Building search indexes
- Importing documents into a business application
- Building automated document processing solutions
- Text extraction for Natural Language Processing (NLP) Applications
- Maintaining document compliance
- Amazon Textract returns a confidence score for each identified element, which indicates the probability that a given prediction is correct.
- Predictions with a low confidence score can be routed to Amazon Augmented AI (A2I) for human review.
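As a rough illustration of the confidence check described above, the sketch below splits the `LINE` blocks of a Textract-style response into accepted and needs-review groups. The `REVIEW_THRESHOLD` value, the function name, and the sample data are assumptions made for this example; Textract reports `Confidence` on a 0-100 scale, and the hand-off to A2I itself is not shown.

```python
# Hypothetical cutoff chosen for illustration; tune it to your own accuracy needs.
REVIEW_THRESHOLD = 90.0


def split_by_confidence(response: dict, threshold: float = REVIEW_THRESHOLD) -> tuple:
    """Partition LINE blocks into (accepted, needs_review) by confidence score."""
    accepted, needs_review = [], []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE, WORD, and other block types
        if block.get("Confidence", 0.0) >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)
    return accepted, needs_review


# Hand-written sample in the shape of a Textract response (not real service output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Name: John Doe", "Confidence": 99.2},
        {"BlockType": "LINE", "Id": "l2", "Text": "Acct: 12E4S6", "Confidence": 61.5},
    ]
}

accepted, needs_review = split_by_confidence(sample_response)
print([b["Text"] for b in accepted])      # lines kept as-is
print([b["Text"] for b in needs_review])  # candidates for human review
```

In a real pipeline, the `needs_review` items would be the ones handed to a human review workflow.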
- The asynchronous operations let you process multipage documents, such as PDFs.
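The asynchronous flow can be sketched as "start a job, poll until it finishes, then page through the results". The example below assumes a boto3 Textract client is passed in as `client` and that the document already sits in S3; the helper names, polling interval, and retry limit are illustrative choices, not recommended values. For simplicity, this sketch treats any final status other than `SUCCEEDED` as an error.

```python
import time


def start_text_detection(client, bucket: str, key: str) -> str:
    """Start an asynchronous text-detection job for a document stored in S3."""
    response = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return response["JobId"]


def collect_blocks(client, job_id: str, poll_seconds: float = 5.0, max_polls: int = 60) -> list:
    """Poll until the job finishes, then gather blocks from every result page."""
    for _ in range(max_polls):
        page = client.get_document_text_detection(JobId=job_id)
        if page["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Job {job_id} still in progress after {max_polls} polls")

    if page["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Job {job_id} ended with status {page['JobStatus']}")

    blocks = list(page.get("Blocks", []))
    # Results are paginated; follow NextToken until no further page is reported.
    while page.get("NextToken"):
        page = client.get_document_text_detection(JobId=job_id, NextToken=page["NextToken"])
        blocks.extend(page.get("Blocks", []))
    return blocks


# Example usage (requires boto3 and AWS credentials; bucket and key are placeholders):
#   import boto3
#   client = boto3.client("textract")
#   job_id = start_text_detection(client, "my-bucket", "scans/report.pdf")
#   blocks = collect_blocks(client, job_id)
```

Polling keeps the example short; Textract can also publish a completion notification to an Amazon SNS topic, which avoids repeated status calls.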
- Detect Document Text API
  - Uses optical character recognition (OCR) technology to extract printed text and handwriting from a document.
- Analyze Document API
  - Extracts printed text and handwriting, as well as structured data such as tables and the key-value pairs found in forms.
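To show how form data comes back from the Analyze Document API, here is a minimal sketch that rebuilds key-value pairs from the response blocks. It assumes the response was produced with the `FORMS` feature type, where `KEY_VALUE_SET` blocks link keys to values through `VALUE` relationships and to their words through `CHILD` relationships. The helper names and the sample data are made up for illustration.

```python
def _text_of(block: dict, by_id: dict) -> str:
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":  # ignore selection elements etc.
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response: dict) -> dict:
    """Map each detected form key to the text of its associated value."""
    by_id = {block["Id"]: block for block in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(_text_of(by_id[value_id], by_id))
        pairs[_text_of(block, by_id)] = " ".join(value_parts)
    return pairs


# Hand-written sample in the shape of an AnalyzeDocument response (not real output).
sample = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w2", "BlockType": "WORD", "Text": "John"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
    ]
}

print(extract_key_values(sample))  # {'Name:': 'John Doe'}
```

With boto3, a response of this shape would come from a call such as `client.analyze_document(Document={"Bytes": image_bytes}, FeatureTypes=["FORMS"])`, where `image_bytes` is a placeholder for the document content.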
- You only pay for what you use.
- Charges vary for Detect Document Text API and Analyze Document API, with the latter being the more expensive.