Last updated on January 7, 2026
AWS Glue Data Quality Cheat Sheet
- AWS Glue Data Quality is a feature of AWS Glue that lets you monitor and measure the quality of your data. It is built on the open-source Deequ framework.
- Supports data quality checks on AWS Lake Formation-managed Iceberg, Delta Lake, and Hudi tables.
- Integrates with Amazon SageMaker Lakehouse tables for unified analytics and governance.
- Can be used alongside zero-ETL integrations to validate data ingested from supported AWS services.
Use Cases
- Analyzing data sets that are cataloged in the AWS Glue Data Catalog.
- Continuously monitoring the quality of data in a data lake.
- Adding a layer of data quality checks to traditional AWS Glue jobs.
- Defining data quality rules with Data Quality Definition Language (DQDL), AWS Glue Data Quality's domain-specific rule language (see the sketch after this list).
- Organizing and reporting data quality results across teams or domains using labeled rules.
- Validating data ingested through zero-ETL pipelines without building custom validation workflows.
- Enforcing data quality standards on Lakehouse architectures using governed table formats.
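As a concrete illustration of DQDL, here is a minimal sketch that registers a small ruleset against a Data Catalog table with boto3. The database, table, and column names (sales_db, orders, order_id, status, email) are hypothetical placeholders for your own schema:

```python
import boto3

glue = boto3.client("glue")

# A small DQDL ruleset. Column names are hypothetical placeholders.
ruleset = """
Rules = [
    IsComplete "order_id",
    IsUnique "order_id",
    ColumnValues "status" in ["PENDING", "SHIPPED", "DELIVERED"],
    Completeness "email" > 0.95
]
"""

# Register the ruleset against a (hypothetical) Data Catalog table.
glue.create_data_quality_ruleset(
    Name="orders-basic-checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)
```

Once registered, the ruleset can be run from the console or programmatically (for example via start_data_quality_ruleset_evaluation_run), with each rule contributing a pass/fail outcome to the overall data quality score.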
Features
- Serverless: No servers to manage; AWS provisions and scales the underlying resources.
- Quick Start: Analyzes your data and automatically recommends data quality rules.
- Data Quality Issues Detection: Uses machine learning to identify potential data quality issues.
- Rule Customization: Comes with over 25 pre-defined data quality rules, but also allows you to create your own.
- Data Quality Score: Provides a summary score that gives an overview of the overall quality of your data.
- Bad Data Identification: Identifies the exact records that are causing your data quality scores to decrease.
- Pay as you go: You only pay for what you use, with no upfront costs or long-term commitments.
- No lock-in: Built on the open-source Deequ framework.
- Data Quality Checks: Allows you to enforce data quality checks on your AWS Glue ETL pipelines and Data Catalog (see the ETL job sketch after this list).
- Rule Labeling: Supports labeling of data quality rules, allowing results to be grouped, filtered, and reported by category, team, or domain.
- Constants in DQDL: Supports constants in Data Quality Definition Language scripts, reducing ruleset size and improving maintainability of large rule sets.
- Pre-processing Queries: Allows pre-processing SQL queries to prepare or filter data before data quality rules are evaluated.
- Open Table Format Support: Supports evaluating data quality rules on Apache Iceberg, Delta Lake, and Hudi tables registered in the AWS Glue Data Catalog.
- Automatic Column Statistics: Automatically generates column level statistics to improve profiling and rule evaluation accuracy.
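To illustrate the ETL integration, here is a minimal sketch of a data quality step inside a Glue job script, assuming AWS Glue 4.0 with the EvaluateDataQuality transform available; the source table (sales_db.orders) and evaluation context name are hypothetical:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsgluedq.transforms import EvaluateDataQuality
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical source table; replace with your own database/table.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# DQDL ruleset evaluated against the frame mid-pipeline.
ruleset = """
Rules = [
    IsComplete "order_id",
    ColumnValues "quantity" > 0
]
"""

dq_results = EvaluateDataQuality.apply(
    frame=dyf,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "orders_checks",  # hypothetical name
        "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True,
    },
)
dq_results.toDF().show(truncate=False)

job.commit()
```

The transform returns rule-level outcomes as a DynamicFrame, which you can inspect, write out, or use to fail the job when critical rules do not pass.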
Pricing
- AWS Glue Data Quality charges are based on the number of Data Processing Units (DPUs) used and how long they run.
- Adding data quality checks to ETL jobs can increase job runtime and therefore DPU consumption.
- The standard rate is $0.44 per DPU-hour, billed per second with a one-minute minimum.
- Evaluating rules on large datasets, or running checks inside AWS Glue ETL jobs, can add to total DPU consumption and cost.
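As a rough worked example of the DPU-hour math (the run size and duration below are hypothetical; verify the rate for your Region):

```python
# Hypothetical run: 5 DPUs for 12 minutes at the standard rate.
dpus = 5
runtime_hours = 12 / 60                    # 0.2 hours
rate_per_dpu_hour = 0.44                   # USD; varies by Region
cost = dpus * runtime_hours * rate_per_dpu_hour
print(f"Estimated charge: ${cost:.2f}")    # 5 * 0.2 * 0.44 = $0.44
```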
References:
https://docs.aws.amazon.com/glue/latest/dg/glue-data-quality.html