AWS Glue Data Quality Cheat Sheet
AWS Glue Data Quality is a feature of AWS Glue that lets you measure and monitor the quality of your data. It is built on the open-source Deequ framework.
Use Cases
- Analyzing data sets that are cataloged in the AWS Glue Data Catalog.
- Continuously monitoring the quality of data in a data lake.
- Adding a layer of data quality checks to traditional AWS Glue jobs.
Data quality rules for all of these use cases are written in Data Quality Definition Language (DQDL), a domain-specific language.
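A DQDL ruleset is a bracketed, comma-separated list of rules assigned to `Rules`. As a minimal sketch of what one might look like, the helper below assembles a ruleset string from individual rules. The helper name `build_ruleset`, the `orders` table, and its column names are assumptions made up for illustration; the rule types shown (`RowCount`, `IsComplete`, `IsUnique`, `ColumnValues`, `Completeness`) are built-in DQDL rule types.

```python
def build_ruleset(rules):
    """Join individual DQDL rule strings into one `Rules = [...]` ruleset."""
    if not rules:
        raise ValueError("a DQDL ruleset needs at least one rule")
    body = ",\n".join(f"    {rule}" for rule in rules)
    return f"Rules = [\n{body}\n]"


# Hypothetical columns for an illustrative "orders" table.
ORDER_RULES = [
    "RowCount > 0",                    # the table must not be empty
    'IsComplete "order_id"',           # no NULL order IDs
    'IsUnique "order_id"',             # no duplicate order IDs
    'ColumnValues "quantity" > 0',     # quantities must be positive
    'Completeness "email" > 0.95',     # more than 95% of emails populated
]

if __name__ == "__main__":
    print(build_ruleset(ORDER_RULES))
```

The resulting string is the kind of value that could be pasted into the ruleset editor in the AWS Glue console or passed as the `Ruleset` argument of the boto3 Glue client's `create_data_quality_ruleset` call.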
Features
- Serverless: There are no servers to manage; AWS handles the infrastructure.
- Quick Start: Analyzes your data and recommends data quality rules so you can get started quickly.
- Data Quality Issues Detection: Uses machine learning to identify potential data quality issues.
- Rule Customization: Comes with over 25 pre-defined data quality rules, but also allows you to create your own.
- Data Quality Score: Provides a summary score that gives an overview of the overall quality of your data.
- Bad Data Identification: Identifies the exact records that are causing your data quality scores to decrease.
- Pay as you go: You only pay for what you use, with no upfront costs or long-term commitments.
- No lock-in: Built on the open-source Deequ framework.
- Data Quality Checks: Allows you to enforce data quality checks on your AWS Glue ETL pipelines and Data Catalog.
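To make the Data Quality Score feature concrete, here is a rough sketch that assumes the score is simply the fraction of rules that pass. The dictionary keys (`Name`, `Description`, `Result`) mirror the `RuleResults` entries returned by the Glue `GetDataQualityResult` API, but the sample results and the function names are invented for illustration. Note that this works at the rule level; it does not show how individual bad records are identified.

```python
def quality_score(rule_results):
    """Return the fraction of rules whose result is PASS (0.0 to 1.0)."""
    if not rule_results:
        raise ValueError("no rule results to score")
    passed = sum(1 for r in rule_results if r["Result"] == "PASS")
    return passed / len(rule_results)


def failed_rules(rule_results):
    """Return the names of rules that did not pass (FAIL or ERROR)."""
    return [r["Name"] for r in rule_results if r["Result"] != "PASS"]


# Made-up results for four rules; not output from a real evaluation run.
SAMPLE_RESULTS = [
    {"Name": "Rule_1", "Description": "RowCount > 0", "Result": "PASS"},
    {"Name": "Rule_2", "Description": 'IsComplete "order_id"', "Result": "PASS"},
    {"Name": "Rule_3", "Description": 'IsUnique "order_id"', "Result": "FAIL"},
    {"Name": "Rule_4", "Description": 'ColumnValues "quantity" > 0', "Result": "PASS"},
]

if __name__ == "__main__":
    print(f"score: {quality_score(SAMPLE_RESULTS):.0%}")
    print("failed:", failed_rules(SAMPLE_RESULTS))
```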
Pricing
- AWS Glue Data Quality charges are based on the number of Data Processing Units (DPUs) used and how long they run.
- Adding data quality checks to ETL jobs may increase runtime or DPU consumption.
- Charges are $0.44 per DPU-hour for standard usage.
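The pricing model above reduces to DPUs multiplied by hours multiplied by the rate. The sketch below is a back-of-the-envelope estimate only: it uses the $0.44 per DPU-hour figure quoted above, ignores any billing minimums or rounding rules, and the DPU counts and runtimes are hypothetical. Check the AWS Glue pricing page for the rate in your region.

```python
STANDARD_RATE_USD_PER_DPU_HOUR = 0.44  # rate quoted in this cheat sheet


def estimate_cost(dpus, runtime_minutes, rate=STANDARD_RATE_USD_PER_DPU_HOUR):
    """Estimate the charge in USD as DPUs x hours x rate, rounded to cents."""
    if dpus <= 0 or runtime_minutes < 0:
        raise ValueError("dpus must be positive and runtime non-negative")
    hours = runtime_minutes / 60
    return round(dpus * hours * rate, 2)


if __name__ == "__main__":
    # Hypothetical run: 10 DPUs for 30 minutes.
    print(f"${estimate_cost(10, 30):.2f}")
    # Hypothetical run: 5 DPUs for 20 minutes.
    print(f"${estimate_cost(5, 20):.2f}")
```

Because data quality checks added to an ETL job can lengthen its runtime, comparing the estimate with and without the extra minutes gives a rough idea of the added cost.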
References:
https://docs.aws.amazon.com/glue/latest/dg/glue-data-quality.html