AWS Glue DataBrew Cheat Sheet

AWS Glue DataBrew is a visual data preparation tool that streamlines getting your data ready for analysis.
It lets you work with your data interactively, without writing code.
With its library of more than 250 pre-built transformations, you can clean, normalize, and format your data for analytics and machine learning.

AWS Glue DataBrew is commonly used for:

Reducing the time required to prepare data for analytics and machine learning.
Automating data preparation tasks with a wide range of ready-made transformations.
Facilitating collaboration among business analysts, data scientists, and data engineers to extract insights from raw data.
Exploring and transforming large volumes of raw data without the need to manage any infrastructure.

Features

Transformations – DataBrew provides more than 250 pre-built transformations that you can apply to your data, such as filtering rows, replacing values, and splitting and combining columns. Some transformations apply natural language processing (NLP) techniques, for example to split sentences into phrases.
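A DataBrew recipe is stored as an ordered list of steps, where each step holds an action made of an operation name and string parameters. The minimal sketch below builds such a list in the shape that boto3's `databrew` client accepts in `create_recipe(Name=..., Steps=...)`. The operation names `UPPER_CASE` and `RENAME`, their parameter keys, and the column names `country` and `cust_nm` are assumptions for illustration; check them against the DataBrew recipe actions reference before relying on them.

```python
from typing import Any


def build_recipe_steps() -> list[dict[str, Any]]:
    """Return recipe steps shaped like DataBrew's RecipeStep objects.

    Each step wraps one Action: an Operation name plus a dict of string
    parameters. Operation names, parameter keys, and column names are
    assumptions chosen for illustration.
    """
    return [
        {
            "Action": {
                # Upper-case every value in the hypothetical "country" column.
                "Operation": "UPPER_CASE",
                "Parameters": {"sourceColumn": "country"},
            }
        },
        {
            "Action": {
                # Rename the hypothetical "cust_nm" column to "customer_name".
                "Operation": "RENAME",
                "Parameters": {
                    "sourceColumn": "cust_nm",
                    "targetColumn": "customer_name",
                },
            }
        },
    ]


def create_recipe(client: Any, name: str) -> Any:
    """Create the recipe through an injected client, e.g. boto3.client("databrew")."""
    return client.create_recipe(Name=name, Steps=build_recipe_steps())
```

Injecting the client keeps the step definitions testable without AWS credentials; with a real `boto3.client("databrew")`, the same call creates a working (unpublished) recipe whose steps run in list order.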

Data Formats and Data Sources – DataBrew supports file formats such as CSV, JSON, Apache Parquet, Apache ORC, and Microsoft Excel. It can read data from sources such as Amazon S3, the AWS Glue Data Catalog, and databases like Amazon Redshift and Amazon RDS.
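In DataBrew, a data source is registered as a dataset. The sketch below, assuming a CSV file with a header row stored in Amazon S3, builds the keyword arguments for boto3's `databrew` `create_dataset` call. The dataset name, bucket, and key are hypothetical placeholders.

```python
from typing import Any


def build_s3_csv_dataset_request(name: str, bucket: str, key: str) -> dict[str, Any]:
    """Build CreateDataset arguments for a comma-delimited CSV object in S3."""
    return {
        "Name": name,
        "Format": "CSV",
        # Treat the first row as column headers and split fields on commas.
        "FormatOptions": {"Csv": {"Delimiter": ",", "HeaderRow": True}},
        # Point the dataset at a single S3 object.
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }


# Example (hypothetical names); with credentials configured you could run:
#   boto3.client("databrew").create_dataset(**request)
request = build_s3_csv_dataset_request("sales-raw", "example-bucket", "raw/sales.csv")
```

Other sources swap the `S3InputDefinition` for a different input type, such as `DataCatalogInputDefinition` for a Glue Data Catalog table or `DatabaseInputDefinition` for a JDBC connection; treat those field layouts as something to confirm in the API reference.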

Jobs and Scheduling – You can create jobs in DataBrew to automate data preparation tasks. Jobs can run on demand or on a schedule that you define with a cron expression.
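A job and its schedule are separate resources: the job says what to run, and the schedule says when. This sketch builds boto3-style arguments for `create_recipe_job` and `create_schedule`. It assumes the recipe has already been published as version `1.0` and that schedule times are interpreted in UTC; the job, dataset, bucket, and role names are hypothetical.

```python
from typing import Any


def build_recipe_job_request(
    job_name: str, dataset: str, recipe: str, role_arn: str, bucket: str, prefix: str
) -> dict[str, Any]:
    """Build CreateRecipeJob arguments: apply a published recipe and write CSV to S3."""
    return {
        "Name": job_name,
        "DatasetName": dataset,
        # Assumes the recipe was published at least once, as version "1.0".
        "RecipeReference": {"Name": recipe, "RecipeVersion": "1.0"},
        "RoleArn": role_arn,
        "Outputs": [{"Location": {"Bucket": bucket, "Key": prefix}, "Format": "CSV"}],
    }


def build_daily_schedule_request(
    schedule_name: str, job_names: list[str], hour_utc: int
) -> dict[str, Any]:
    """Build CreateSchedule arguments that run the jobs daily at hour_utc:00."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "Name": schedule_name,
        "JobNames": job_names,
        # Fields: minutes, hours, day-of-month, month, day-of-week, year.
        "CronExpression": f"cron(0 {hour_utc} * * ? *)",
    }
```

For example, `build_daily_schedule_request("daily-clean", ["clean-sales"], 6)` yields the expression `cron(0 6 * * ? *)`, which fires once a day at 06:00.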

Security – DataBrew integrates with AWS Identity and Access Management (IAM), so you can control who can access your data and DataBrew resources. DataBrew accesses your data through an IAM role that you provide. Data in transit is encrypted with TLS, and job output written to Amazon S3 can be encrypted at rest with SSE-S3 or SSE-KMS.
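The IAM role that DataBrew uses needs a trust policy that lets the DataBrew service assume it, plus permissions for the data it reads and writes. The sketch below builds both as Python dicts: a trust policy for the `databrew.amazonaws.com` service principal and a least-privilege S3 policy scoped to one hypothetical bucket prefix. A working role usually needs further permissions too (AWS offers the managed policy `AWSGlueDataBrewServiceRole` for this), so treat this as a starting point rather than a complete role definition.

```python
import json
from typing import Any


def databrew_trust_policy() -> dict[str, Any]:
    """Trust policy that allows the DataBrew service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "databrew.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket: str, prefix: str) -> dict[str, Any]:
    """Permissions policy that scopes S3 reads and writes to one bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read input objects and write job output under the prefix only.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # Serialize to JSON, e.g. for boto3's iam.create_role(AssumeRolePolicyDocument=...).
    print(json.dumps(databrew_trust_policy(), indent=2))
    print(json.dumps(s3_access_policy("example-bucket", "data/"), indent=2))
```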

Integration – DataBrew works with other AWS services, such as AWS Glue for ETL tasks, Amazon QuickSight for data visualization, and Amazon SageMaker for machine learning.

Components

Project – Your interactive data preparation workspace in DataBrew. A project groups related items such as your data, transformations, and scheduled processes.
Dataset – A collection of data organized into rows (records) and columns (fields).
Recipe – A set of instructions, or steps, that DataBrew applies to your data. A recipe can contain many steps, and each step can contain many actions.
Job – Runs a task against your data. A recipe job executes the instructions in a recipe to transform data, and a profile job generates a data profile. Jobs can run on demand or on a preset schedule.
Data Lineage – A visual view that tracks your data's origin and shows how it flows through DataBrew entities, such as datasets, projects, recipes, and jobs.
Data Profile – A summary report that DataBrew generates when you profile a dataset, describing the shape of the data with details such as column statistics, value distributions, and missing values.
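The components above map to separate API resources. As a sketch, assuming the boto3 `databrew` client, the functions below build arguments for `create_project`, which pairs a dataset with a recipe inside an interactive workspace, and `create_profile_job`, which writes a data profile to S3. The sample size of 500 rows, the resource names, and the role ARN are illustrative assumptions.

```python
from typing import Any


def build_project_request(
    name: str, dataset: str, recipe: str, role_arn: str, sample_size: int = 500
) -> dict[str, Any]:
    """Build CreateProject arguments: a workspace pairing a dataset with a recipe."""
    return {
        "Name": name,
        "DatasetName": dataset,
        "RecipeName": recipe,
        "RoleArn": role_arn,
        # Interactive sessions work on a sample; here, the first N rows.
        "Sample": {"Type": "FIRST_N", "Size": sample_size},
    }


def build_profile_job_request(
    name: str, dataset: str, role_arn: str, bucket: str, prefix: str
) -> dict[str, Any]:
    """Build CreateProfileJob arguments: profile a dataset, write the report to S3."""
    return {
        "Name": name,
        "DatasetName": dataset,
        "RoleArn": role_arn,
        "OutputLocation": {"Bucket": bucket, "Key": prefix},
    }
```

After creating the profile job, a call such as `client.start_job_run(Name="sales-profile")` runs it, and the resulting report is the data profile described above.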

Pricing

AWS Glue DataBrew pricing is pay-as-you-go, with no upfront commitment.
Jobs are billed by the number of DataBrew nodes they use and how long they run.
The rate is $0.48 per node-hour.
Interactive sessions are billed per 30-minute session, at $1.00 per session.
Pricing can vary by AWS Region.
For the most accurate estimate, use the AWS Pricing Calculator.
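The arithmetic behind these rates can be sketched as follows, using the example prices listed here. The sketch assumes job time is prorated by the minute and that each started 30-minute block of interactive work counts as one session; check the pricing page for the actual billing increments, minimums, and regional rates.

```python
import math
from decimal import Decimal

# Example rates from this cheat sheet; actual rates vary by AWS Region.
NODE_HOUR_RATE = Decimal("0.48")  # USD per DataBrew node-hour
SESSION_RATE = Decimal("1.00")  # USD per 30-minute interactive session


def job_cost(nodes: int, minutes: int) -> Decimal:
    """Estimate one job run: node-hours times the node-hour rate, in USD."""
    node_hours = Decimal(nodes) * Decimal(minutes) / Decimal(60)
    return (node_hours * NODE_HOUR_RATE).quantize(Decimal("0.01"))


def session_cost(interactive_minutes: int) -> Decimal:
    """Estimate interactive use, counting each started 30-minute block as a session."""
    sessions = math.ceil(interactive_minutes / 30)
    return sessions * SESSION_RATE
```

For example, a 5-node job that runs for 30 minutes uses 2.5 node-hours, so `job_cost(5, 30)` returns `1.20`; 70 minutes of interactive work spans three sessions, so `session_cost(70)` returns `3.00`. `Decimal` is used instead of `float` so currency amounts do not pick up binary rounding errors.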


Written by: Nestor Mayagma Jr.

Nestor is a cloud engineer and an AWS Community Builder. He continuously works to expand his AWS knowledge and expertise for personal and professional growth, and he shares his insights with the community through numerous AWS blog posts, reflecting his commitment to cloud computing. In his leisure time, he enjoys playing FPS and other online games.
