Last updated on January 7, 2026
AWS Glue Data Quality Cheat Sheet
- AWS Glue Data Quality is a feature of AWS Glue that lets you monitor and measure the quality of your data. It is built on the open-source Deequ framework.
- Supports data quality checks on AWS Lake Formation-managed Iceberg, Delta Lake, and Hudi tables.
- Integrates with Amazon SageMaker Lakehouse tables for unified analytics and governance.
- Can be used alongside zero-ETL integrations to validate data ingested from supported AWS services.
Use Cases
- Analyzing data sets that are cataloged in the AWS Glue Data Catalog.
- Continuously monitoring the quality of data in a data lake.
- Adding a layer of data quality checks to traditional AWS Glue jobs.
- Defining data quality rules with Data Quality Definition Language (DQDL), AWS Glue Data Quality's domain-specific rule language (see the sketch after this list).
- Organizing and reporting data quality results across teams or domains using labeled rules.
- Validating data ingested through zero-ETL pipelines without building custom validation workflows.
- Enforcing data quality standards on Lakehouse architectures using governed table formats.
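As a concrete illustration of DQDL, here is a minimal sketch that registers a small ruleset against a Data Catalog table with boto3. The database, table, and column names (sales_db, orders, order_id, status, email) are hypothetical placeholders for your own schema:

```python
import boto3

glue = boto3.client("glue")

# A small DQDL ruleset. Column names are hypothetical placeholders.
ruleset = """
Rules = [
    IsComplete "order_id",
    IsUnique "order_id",
    ColumnValues "status" in ["PENDING", "SHIPPED", "DELIVERED"],
    Completeness "email" > 0.95
]
"""

# Register the ruleset against a (hypothetical) Data Catalog table.
glue.create_data_quality_ruleset(
    Name="orders-basic-checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)
```

Once registered, the ruleset can be run from the console or programmatically (for example via start_data_quality_ruleset_evaluation_run), with each rule contributing a pass/fail outcome to the overall data quality score.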
Features
- Serverless: No servers to manage; AWS provisions and scales the underlying resources.
- Quick Start: Analyzes your data and automatically recommends data quality rules.
- Data Quality Issues Detection: Uses machine learning to identify potential data quality issues.
- Rule Customization: Comes with over 25 pre-defined data quality rules, but also allows you to create your own.
- Data Quality Score: Provides a summary score that gives an overview of the overall quality of your data.
- Bad Data Identification: Identifies the exact records that are causing your data quality scores to decrease.
- Pay as you go: You only pay for what you use, with no upfront costs or long-term commitments.
- No lock-in: Built on the open-source Deequ framework.
- Data Quality Checks: Allows you to enforce data quality checks on your AWS Glue ETL pipelines and Data Catalog (see the ETL job sketch after this list).
- Rule Labeling: Supports labeling of data quality rules, allowing results to be grouped, filtered, and reported by category, team, or domain.
- Constants in DQDL: Supports constants in Data Quality Definition Language scripts, reducing ruleset size and improving maintainability of large rule sets.
- Pre-processing Queries: Allows pre-processing SQL queries to prepare or filter data before data quality rules are evaluated.
- Open Table Format Support: Supports evaluating data quality rules on Apache Iceberg, Delta Lake, and Hudi tables registered in the AWS Glue Data Catalog.
- Automatic Column Statistics: Automatically generates column level statistics to improve profiling and rule evaluation accuracy.
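To illustrate the ETL integration, here is a minimal sketch of a data quality step inside a Glue job script, assuming AWS Glue 4.0 with the EvaluateDataQuality transform available; the source table (sales_db.orders) and evaluation context name are hypothetical:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsgluedq.transforms import EvaluateDataQuality
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical source table; replace with your own database/table.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# DQDL ruleset evaluated against the frame mid-pipeline.
ruleset = """
Rules = [
    IsComplete "order_id",
    ColumnValues "quantity" > 0
]
"""

dq_results = EvaluateDataQuality.apply(
    frame=dyf,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "orders_checks",  # hypothetical name
        "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True,
    },
)
dq_results.toDF().show(truncate=False)

job.commit()
```

The transform returns rule-level outcomes as a DynamicFrame, which you can inspect, write out, or use to fail the job when critical rules do not pass.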
Pricing
- AWS Glue Data Quality charges are based on the number of Data Processing Units (DPUs) used and how long they run.
- Adding data quality checks to ETL jobs can increase job runtime and therefore DPU consumption.
- The standard rate is $0.44 per DPU-hour, billed per second with a one-minute minimum.
- Evaluating rules on large datasets, or running checks inside AWS Glue ETL jobs, can add to total DPU consumption and cost.
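As a rough worked example of the DPU-hour math (the run size and duration below are hypothetical; verify the rate for your Region):

```python
# Hypothetical run: 5 DPUs for 12 minutes at the standard rate.
dpus = 5
runtime_hours = 12 / 60                    # 0.2 hours
rate_per_dpu_hour = 0.44                   # USD; varies by Region
cost = dpus * runtime_hours * rate_per_dpu_hour
print(f"Estimated charge: ${cost:.2f}")    # 5 * 0.2 * 0.44 = $0.44
```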
References:
https://docs.aws.amazon.com/glue/latest/dg/glue-data-quality.html