Google BigQuery Cheat Sheet

Last updated on June 26, 2023


A fully managed data warehouse where you can feed petabyte-scale data sets and run SQL-like queries.

Features

BigQuery is a serverless data warehousing technology.
It integrates with the Apache big data ecosystem, allowing Hadoop/Spark and Beam workloads to read or write data directly from BigQuery using the Storage API.
BigQuery supports a standard SQL dialect that is ANSI SQL:2011 compliant, which reduces the need for code rewrites.
BigQuery automatically replicates data and keeps a seven-day history of changes, which makes it easier to restore data and to compare data from different points in time.
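
The seven-day history can be read with a time travel query. The following is a minimal sketch, assuming the `FOR SYSTEM_TIME AS OF` clause of BigQuery standard SQL; the helper name and the table name `my_project.my_dataset.orders` are hypothetical, and the code only builds the query text without submitting it.

```python
def time_travel_query(table: str, hours_ago: int) -> str:
    """Build a query that reads `table` as it existed `hours_ago` hours ago."""
    # The history window described above is seven days, so reject older snapshots.
    if not 0 <= hours_ago <= 7 * 24:
        raise ValueError("hours_ago must be between 0 and 168 (seven days)")
    return (
        f"SELECT * FROM `{table}` "
        "FOR SYSTEM_TIME AS OF "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours_ago} HOUR)"
    )


# Hypothetical table name, used only for illustration.
print(time_travel_query("my_project.my_dataset.orders", 24))
```

Running the same query with two different offsets and comparing the results is one way to see how a table changed over time.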

Loading data into BigQuery

Before you can query data in BigQuery-managed storage, you must first load it into BigQuery. To do this, you can:

Load a set of data records from Cloud Storage or from a local file. The records can be in Avro, CSV, JSON (newline delimited only), ORC, or Parquet format.

Export data from Datastore or Firestore and load the exported data into BigQuery.
Load data from other Google services, such as:
- Google Ad Manager
- Google Ads
- Google Play
- Cloud Storage
- YouTube Channel reports
- YouTube Content Owner reports
Stream data one record at a time using streaming inserts.
Write data from a Dataflow pipeline to BigQuery.
Use DML statements to perform bulk inserts. Note that BigQuery charges for DML queries. See Data Manipulation Language pricing.
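
As a rough illustration of the first option above (loading records from Cloud Storage), the sketch below builds the request body for a load job as it would be sent to the BigQuery REST API `jobs.insert` method. The project, dataset, table, and bucket names are hypothetical, and the code only constructs and prints the JSON; it does not authenticate or submit anything.

```python
import json

# Source formats named in this section, written as the REST API expects them.
SUPPORTED_FORMATS = {"AVRO", "CSV", "NEWLINE_DELIMITED_JSON", "ORC", "PARQUET"}


def build_load_job(project, dataset, table, uris, source_format="PARQUET"):
    """Return a load-job request body for files stored in Cloud Storage."""
    if source_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported source format: {source_format}")
    return {
        "configuration": {
            "load": {
                "sourceUris": list(uris),
                "sourceFormat": source_format,
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # Append to the table if it already exists.
                "writeDisposition": "WRITE_APPEND",
            }
        }
    }


# Hypothetical names, used only for illustration.
job = build_load_job(
    "my-project", "analytics", "events", ["gs://my-bucket/events/*.parquet"]
)
print(json.dumps(job, indent=2))
```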

Querying from external data sources

BigQuery offers support for querying data directly from:
- Cloud Bigtable
- Cloud Storage
- Cloud SQL
Supported formats are:
- Avro
- CSV
- JSON (newline delimited only)
- ORC
- Parquet
To query data on external sources, you have to create an external table definition file that contains the schema definition and metadata.
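
A table definition file is a small JSON document. The sketch below shows what one might look like for a CSV source, assuming the field names of the BigQuery external data configuration (`sourceFormat`, `sourceUris`, `schema`, `csvOptions`); the bucket path and column names are hypothetical.

```python
import json


def build_table_definition(uris, fields, skip_header_rows=1):
    """Return an external table definition for CSV files in Cloud Storage."""
    return {
        "sourceFormat": "CSV",
        "sourceUris": list(uris),
        # Schema definition: one entry per column.
        "schema": {
            "fields": [{"name": name, "type": col_type} for name, col_type in fields]
        },
        # Metadata telling BigQuery how to read the files.
        "csvOptions": {"skipLeadingRows": skip_header_rows},
    }


# Hypothetical bucket path and columns, used only for illustration.
definition = build_table_definition(
    ["gs://my-bucket/sales/*.csv"],
    [("order_id", "STRING"), ("amount", "FLOAT"), ("ordered_at", "TIMESTAMP")],
)
print(json.dumps(definition, indent=2))
```

Self-describing formats such as Avro, ORC, and Parquet carry their own schema, so a definition for them can usually omit the `schema` block.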

Google BigQuery Monitoring

BigQuery creates log entries for actions such as creating or deleting a table, purchasing slots, or running a load job.

Google BigQuery Pricing

On-demand pricing lets you pay only for the storage and compute that you use.
Flat-rate pricing with reservations lets high-volume users choose a stable price for predictable workloads.
To estimate query costs, it is best practice to obtain the estimated bytes read, either from the query validator in the Cloud Console or by submitting a query job through the API with the dryRun parameter. Enter this figure in the Pricing Calculator to calculate the query cost.
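
The arithmetic behind that estimate can be sketched as follows. The request body uses the `dryRun` field of the REST API job configuration, and the byte count is read from `statistics.totalBytesProcessed`; the response shown here is a hand-written stand-in, and the price of 5.0 per TiB is a placeholder assumption, not a current rate.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def dry_run_request(sql):
    """Return a query-job body that validates the query without running it."""
    return {
        "configuration": {
            "dryRun": True,
            "query": {"query": sql, "useLegacySql": False},
        }
    }


def estimate_cost(total_bytes_processed, price_per_tib):
    """Convert the dry-run byte estimate into an on-demand cost estimate."""
    # The API reports the byte count as a string, so convert it first.
    return int(total_bytes_processed) / TIB * price_per_tib


# Hypothetical dry-run response; a real one comes back from the API.
response = {"statistics": {"totalBytesProcessed": str(3 * TIB)}}

# Placeholder price; check the pricing page for the current rate.
cost = estimate_cost(response["statistics"]["totalBytesProcessed"], price_per_tib=5.0)
print(f"Estimated cost: {cost:.2f}")  # prints "Estimated cost: 15.00"
```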

Validate Your Knowledge

Question 1

Your company has a 5 TB file in Parquet format stored in a Google Cloud Storage bucket. A team of analysts, who are only proficient in SQL, needs temporary access to this data to run ad-hoc queries. You need a cost-effective solution to fulfill their request as soon as possible.

What should you do?

Load the data in a new BigQuery table. Use the bq load command, specify PARQUET using the --source_format flag, and include a Cloud Storage URL.
Create external tables in BigQuery. Use the Cloud Storage URL as a data source.
Load the data in Bigtable. Give the analysts the necessary IAM roles to run SQL queries.
Import the data to Memorystore to provide quick access to Parquet data in the Cloud Storage bucket.

Correct Answer: 2

An external data source (also known as a federated data source) is a data source that you can query directly even though the data is not stored in BigQuery. Instead of loading or streaming the data, you create a table that references the external data source.

BigQuery supports querying Cloud Storage data in the following formats:

- Comma-separated values (CSV)
- JSON (newline-delimited)
- Avro
- ORC
- Parquet
- Datastore exports
- Firestore exports

BigQuery supports querying Cloud Storage data from these storage classes:

- Standard
- Nearline
- Coldline
- Archive

To query a Cloud Storage external data source, provide the Cloud Storage URL path to your data, and create a table that references the data source. The table used to reference the Cloud Storage data source can be a permanent table or a temporary table.

The scenario calls for low-cost, temporary access to Parquet data. A temporary external table in BigQuery satisfies this requirement, whereas loading the data into permanent tables stores a copy of it in a dataset. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

Hence, the correct answer is: Create external tables in BigQuery. Use the Cloud Storage URL as a data source.
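
To make the temporary-table approach concrete, here is a minimal sketch of a query job body that defines the external table inline for the lifetime of a single query. It assumes the `tableDefinitions` field of the REST API query configuration; the alias `sales` and the bucket path are hypothetical, and nothing is submitted.

```python
import json


def temp_external_query(alias, uris, sql):
    """Return a query-job body that reads Parquet files through a temporary table."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                # The alias exists only for this query; no data is loaded.
                "tableDefinitions": {
                    alias: {"sourceFormat": "PARQUET", "sourceUris": list(uris)}
                },
            }
        }
    }


# Hypothetical alias and bucket path, used only for illustration.
job = temp_external_query(
    "sales",
    ["gs://my-bucket/exports/sales.parquet"],
    "SELECT COUNT(*) AS row_count FROM sales",
)
print(json.dumps(job, indent=2))
```

Because the table definition travels with the query, nothing persists in a dataset once the job finishes, which fits the one-time, ad-hoc access described in the scenario.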

The option that says: Load the data in a new BigQuery table. Use the bq load command, specify PARQUET using the --source_format flag, and include a Cloud Storage URL is incorrect because it loads the data into a BigQuery dataset, which is not ideal for temporary access. Instead, you can use a temporary table over the external data source in BigQuery.

The option that says: Load the data in Bigtable. Give the analysts the necessary IAM roles to run SQL queries is incorrect because Bigtable is a NoSQL database. The scenario states that the analysts are only proficient in SQL, and Bigtable is not a SQL database.

The option that says: Import the data to Memorystore to provide quick access to Parquet data in the Cloud Storage bucket is incorrect because Memorystore is only used to build application caches. This service is compatible with open source Redis and Memcached.

References:

https://cloud.google.com/bigquery/external-data-cloud-storage
https://cloud.google.com/bigquery/external-data-sources

Note: This question was extracted from our Google Certified Associate Cloud Engineer Practice Exams.

Google BigQuery Cheat Sheet References:

https://cloud.google.com/bigquery
https://cloud.google.com/bigquery/docs/introduction

Written by: Jon Bonso

Jon Bonso is the co-founder of Tutorials Dojo, an EdTech startup and an AWS Digital Training Partner that provides high-quality educational materials in the cloud computing space. He graduated from Mapúa Institute of Technology in 2007 with a bachelor's degree in Information Technology. Jon holds 10 AWS Certifications and has been an active AWS Community Builder since 2020.
