Google BigQuery vs BigTable

Last updated on June 2, 2023

BigQuery

BigQuery is Google Cloud’s fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real-time.

You can access BigQuery through the Google Cloud Console, the bq command-line tool, or the BigQuery REST API, either by calling the API directly or through client libraries for languages such as Java, .NET, and Python.
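As a rough illustration of the REST option, the Python sketch below builds, but does not send, a request to the `jobs.query` method (`POST https://bigquery.googleapis.com/bigquery/v2/projects/{projectId}/queries`). The project ID and access token are placeholders. In practice the client libraries handle authentication and retries for you.

```python
import json
import urllib.request

# Base URL of the BigQuery v2 REST API.
BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a jobs.query request that runs `sql` in `project_id`."""
    url = f"{BIGQUERY_API}/projects/{project_id}/queries"
    # useLegacySql=False selects GoogleSQL syntax.
    body = json.dumps({"query": sql, "useLegacySql": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",  # placeholder OAuth 2.0 token
            "Content-Type": "application/json",
        },
    )


# "my-project" and "TOKEN" are hypothetical values.
req = build_query_request("my-project", "SELECT 1 AS x", "TOKEN")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` with a real token; the bq tool and the client libraries wrap this same API.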

A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to your tables and views.

You specify a location for storing your BigQuery data when you create a dataset. After you create the dataset, the location cannot be changed, but you can copy the dataset to a different location, or manually move (recreate) the dataset in a different location.
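The project/dataset/table hierarchy and the fixed dataset location can be modeled in a few lines. The sketch below is a hypothetical in-memory stand-in, not the BigQuery client API: tables are referenced as `project.dataset.table`, the location cannot be reassigned, and "moving" a dataset means creating a new one in the target location. The class and the sample names are invented for illustration.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Dataset:
    """Hypothetical model of a BigQuery dataset: one project, one location."""
    project: str
    dataset_id: str
    location: str  # set at creation; frozen=True makes it read-only here as well

    def table_ref(self, table_id: str) -> str:
        # GoogleSQL references a table as project.dataset.table.
        return f"{self.project}.{self.dataset_id}.{table_id}"

    def copy_to(self, new_dataset_id: str, new_location: str) -> "Dataset":
        # Changing location means creating a new dataset and copying the data into it.
        return replace(self, dataset_id=new_dataset_id, location=new_location)


sales = Dataset("my-project", "sales", "US")
print(sales.table_ref("orders"))  # my-project.sales.orders
try:
    sales.location = "EU"  # rejected, mirroring BigQuery's fixed dataset location
except FrozenInstanceError:
    print("location is fixed; copy the dataset instead")
sales_eu = sales.copy_to("sales_eu", "EU")
```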

You can control access to BigQuery data with IAM at the dataset level and, more granularly, at the table and view level and at the column level (through policy tags).
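To make the layering concrete, here is a hypothetical in-memory model (not the real IAM evaluation engine). A dataset-level grant covers every table in the dataset, a table-level grant covers a single table, and a column protected by a policy tag additionally requires a grant on that tag. The class, field names, and principals are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AccessModel:
    """Hypothetical sketch of layered BigQuery read access."""
    dataset_readers: dict[str, set[str]] = field(default_factory=dict)  # dataset -> principals
    table_readers: dict[str, set[str]] = field(default_factory=dict)    # "dataset.table" -> principals
    column_tags: dict[str, str] = field(default_factory=dict)           # "dataset.table.column" -> tag
    tag_readers: dict[str, set[str]] = field(default_factory=dict)      # policy tag -> principals

    def can_read_table(self, user: str, dataset: str, table: str) -> bool:
        # A dataset grant applies to all of its tables; a table grant applies to one table.
        return (user in self.dataset_readers.get(dataset, set())
                or user in self.table_readers.get(f"{dataset}.{table}", set()))

    def can_read_column(self, user: str, dataset: str, table: str, column: str) -> bool:
        if not self.can_read_table(user, dataset, table):
            return False
        tag = self.column_tags.get(f"{dataset}.{table}.{column}")
        # Untagged columns follow table access; tagged columns need an extra grant.
        return tag is None or user in self.tag_readers.get(tag, set())


acl = AccessModel(
    dataset_readers={"hr": {"alice"}},
    table_readers={"hr.salaries": {"bob"}},
    column_tags={"hr.salaries.amount": "pii"},
    tag_readers={"pii": {"alice"}},
)
```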

There are several ways to ingest data into BigQuery:

Batch load a set of data records.
Stream individual records or batches of records.
Use queries to generate new data, and append the results to a table or overwrite a table with them.
Use a third-party application or service.
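For the batch-load option, one common input format is newline-delimited JSON: one JSON object per line, one line per row. The sketch below serializes rows into that format using only the standard library. The function name and the sample rows are hypothetical.

```python
import json
from typing import Any, Iterable, Mapping


def to_ndjson(rows: Iterable[Mapping[str, Any]]) -> str:
    """Serialize rows as newline-delimited JSON: one compact object per line."""
    lines = [json.dumps(row, separators=(",", ":")) for row in rows]
    return "".join(line + "\n" for line in lines)


rows = [
    {"order_id": 1, "customer": "acme", "total": 19.99},
    {"order_id": 2, "customer": "globex", "total": 5.0},
]
payload = to_ndjson(rows)
print(payload, end="")
```

Saved to a file, the payload could then be batch loaded with something like `bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON mydataset.orders ./orders.json`, where the dataset, table, and file names are placeholders.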

Data loaded into BigQuery can be exported in several formats, such as CSV, newline-delimited JSON, Avro, and Parquet. BigQuery can export up to 1 GB of data to a single file. If you are exporting more than 1 GB of data, you must export it to multiple files by using a wildcard (*) in the destination URI. When you export data to multiple files, the sizes of the files vary.
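The arithmetic behind the 1 GB limit, and how a wildcard URI fans out into several files, can be sketched as follows. Two assumptions are baked in: "1 GB" is treated as 10^9 bytes, and the shard names mimic BigQuery's convention of replacing the `*` with a zero-padded, 12-digit sequence number. Because file sizes vary, the computed count is only a lower bound. The bucket and file names are hypothetical.

```python
import math

ONE_GB = 10**9  # assumption: decimal gigabyte


def min_export_files(total_bytes: int, max_file_bytes: int = ONE_GB) -> int:
    """Lower bound on the number of export files; BigQuery may write more."""
    return max(1, math.ceil(total_bytes / max_file_bytes))


def shard_uris(wildcard_uri: str, count: int) -> list[str]:
    """Expand a single-wildcard URI into zero-padded, 12-digit shard names."""
    if wildcard_uri.count("*") != 1:
        raise ValueError("expected exactly one '*' in the destination URI")
    return [wildcard_uri.replace("*", f"{i:012d}") for i in range(count)]


n = min_export_files(int(2.5 * ONE_GB))
print(n)  # 3
for uri in shard_uris("gs://my-bucket/orders-*.csv", n):
    print(uri)
```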

Jobs are actions that BigQuery runs on your behalf to load data, export data, query data, or copy data.
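Jobs run asynchronously, so client code typically polls a job until its state is `DONE` (the other states are `PENDING` and `RUNNING`). The sketch below shows that pattern with exponential backoff. `get_state` is a hypothetical callback standing in for a real status lookup such as the `jobs.get` API method, and the delays and timeout are arbitrary assumptions. A `DONE` job can still have failed, so real code should also check the job's `status.errorResult`.

```python
import time
from typing import Callable

TERMINAL_STATE = "DONE"


def wait_for_job(get_state: Callable[[], str],
                 initial_delay: float = 0.5,
                 max_delay: float = 8.0,
                 timeout: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll get_state() with exponential backoff until DONE; return the number of polls."""
    delay, waited, polls = initial_delay, 0.0, 0
    while True:
        polls += 1
        if get_state() == TERMINAL_STATE:
            return polls
        if waited >= timeout:
            raise TimeoutError(f"job not DONE after {waited:.1f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # double the wait, up to a cap


# Simulated job that finishes on the fourth poll; sleeping is skipped for the demo.
states = iter(["PENDING", "RUNNING", "RUNNING", "DONE"])
polls = wait_for_job(lambda: next(states), sleep=lambda s: None)
print(polls)  # 4
```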

An external data source (also known as a federated data source) is a data source that you can query directly even though the data is not stored in BigQuery. Instead of loading or streaming the data, you create a table that references the external data source.
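For example, a table over CSV files in Cloud Storage is defined by a table resource whose `externalDataConfiguration` lists the source URIs and format, and that body could be sent to the `tables.insert` REST method. The sketch below only builds that JSON body. The project, dataset, table, and bucket names are placeholders, and schema autodetection is an assumed choice.

```python
import json


def external_table_resource(project: str, dataset: str, table: str,
                            source_uris: list[str], source_format: str = "CSV") -> dict:
    """Build a tables.insert body for a table that references files in Cloud Storage."""
    return {
        "tableReference": {"projectId": project, "datasetId": dataset, "tableId": table},
        "externalDataConfiguration": {
            "sourceUris": source_uris,      # the data stays here; BigQuery reads it at query time
            "sourceFormat": source_format,  # e.g. CSV, NEWLINE_DELIMITED_JSON, AVRO, PARQUET
            "autodetect": True,             # let BigQuery infer the schema
        },
    }


resource = external_table_resource(
    "my-project", "logs", "raw_events", ["gs://my-bucket/events/*.csv"]
)
print(json.dumps(resource, indent=2))
```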

BigTable

Cloud Bigtable is a fully managed, scalable NoSQL database service for large analytical and operational workloads.

You can use the cbt command-line tool or the Google Cloud Console to interact with Cloud Bigtable.

Cloud Bigtable stores data in sparsely populated tables that can scale to billions of rows and thousands of columns, enabling you to store terabytes or even petabytes of data. Only a single value in each row is indexed; this value is known as the row key.
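The practical effect of "sparse" and "only the row key is indexed" can be shown with a hypothetical in-memory model (not the Bigtable client API). A row stores only the cells that were written. Looking up by row key is a direct access, while filtering on any other value requires scanning every row. The table contents are invented for illustration.

```python
from typing import Optional

# Hypothetical model: row key -> {"family:qualifier": value}.
# Cells that were never written are absent, so empty cells take no space.
Table = dict[str, dict[str, bytes]]


def get_cell(table: Table, row_key: str, column: str) -> Optional[bytes]:
    """Direct lookup by row key, the only indexed value."""
    return table.get(row_key, {}).get(column)


def find_rows_where(table: Table, column: str, value: bytes) -> list[str]:
    """Filtering on a non-key value means visiting every row."""
    return [key for key, cells in table.items() if cells.get(column) == value]


users: Table = {
    "user#1001": {"profile:name": b"Ana", "profile:country": b"PH"},
    "user#1002": {"profile:name": b"Ben"},  # no country cell stored at all
}
```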

Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It supports high read and write throughput at low latency, and it is an ideal data source for MapReduce operations.

Cloud Bigtable stores data in massively scalable tables, each of which is a sorted key/value map. The table is composed of rows, each of which typically describes a single entity, and columns, which contain individual values for each row. Each row is indexed by a single row key, and columns that are related to one another are typically grouped together into a column family. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the column family.
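Because a table is a map sorted by row key, rows that share a key prefix sit next to each other and can be read with one range scan. The sketch below is a hypothetical model of that idea, not the Bigtable API: it keeps string row keys in lexicographic order (real row keys are byte strings) and names each column `family:qualifier`. The class name and sample keys are invented.

```python
import bisect


class SortedTable:
    """Hypothetical sketch of a Bigtable-style table: a map sorted by row key."""

    def __init__(self) -> None:
        self._keys: list[str] = []                 # row keys in lexicographic order
        self._rows: dict[str, dict[str, str]] = {}  # row key -> {"family:qualifier": value}

    def set_cell(self, row_key: str, family: str, qualifier: str, value: str) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        # A column is identified by its family plus a qualifier unique within that family.
        self._rows[row_key][f"{family}:{qualifier}"] = value

    def scan_prefix(self, prefix: str) -> list[tuple[str, dict[str, str]]]:
        """Rows sharing a prefix are contiguous, so scan from the first match and stop early."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            out.append((key, self._rows[key]))
        return out


t = SortedTable()
t.set_cell("sensor#b#2023-06-01", "reading", "temp", "21.5")
t.set_cell("sensor#a#2023-06-02", "reading", "temp", "19.0")
t.set_cell("sensor#a#2023-06-01", "reading", "temp", "18.2")
t.set_cell("sensor#a#2023-06-01", "meta", "unit", "C")
for key, cells in t.scan_prefix("sensor#a#"):
    print(key, cells)
```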

To use Cloud Bigtable, you create instances, which contain up to 4 clusters that your applications can connect to. Each cluster contains nodes, the compute units that manage your data and perform maintenance tasks.

A Cloud Bigtable instance is a container for your data. Instances have one or more clusters, located in different zones. Each cluster has at least 1 node.
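The instance rules described above can be expressed as a small validation function. This is a hypothetical sketch rather than anything the Bigtable API exposes. It encodes the limits as stated in this article: 1 to 4 clusters per instance, each in a different zone, each with at least one node. The cluster IDs and zones are examples.

```python
from dataclasses import dataclass

MAX_CLUSTERS = 4  # limit as stated in this article


@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    zone: str
    nodes: int


def validate_instance(clusters: list[Cluster]) -> list[str]:
    """Return the problems with a hypothetical instance layout (an empty list means OK)."""
    problems = []
    if not 1 <= len(clusters) <= MAX_CLUSTERS:
        problems.append(f"an instance needs 1 to {MAX_CLUSTERS} clusters, got {len(clusters)}")
    zones = [c.zone for c in clusters]
    if len(set(zones)) != len(zones):
        problems.append("each cluster must be in a different zone")
    for c in clusters:
        if c.nodes < 1:
            problems.append(f"cluster {c.cluster_id} needs at least 1 node")
    return problems


ok = [Cluster("c1", "us-central1-a", 3), Cluster("c2", "us-east1-b", 3)]
bad = [Cluster("c1", "us-central1-a", 0), Cluster("c2", "us-central1-a", 1)]
print(validate_instance(bad))
```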

Cloud Bigtable backups let you save a copy of a table’s schema and data, then restore from the backup to a new table at a later time.
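The backup semantics can be illustrated with a hypothetical in-memory stand-in for an instance (not the Bigtable admin API). A backup is a point-in-time copy of a table's column families and rows. Restoring creates a new table, so it refuses to overwrite one that already exists. All names are invented.

```python
import copy


class Instance:
    """Hypothetical stand-in for a Bigtable instance, illustrating backup and restore."""

    def __init__(self) -> None:
        self.tables: dict[str, dict] = {}   # table name -> {"families": [...], "rows": {...}}
        self.backups: dict[str, dict] = {}  # backup name -> copy of a table at backup time

    def create_backup(self, table: str, backup: str) -> None:
        # Capture both the schema (column families) and the data as they are now.
        self.backups[backup] = copy.deepcopy(self.tables[table])

    def restore(self, backup: str, new_table: str) -> None:
        # Restore into a new table rather than overwriting an existing one.
        if new_table in self.tables:
            raise ValueError(f"table {new_table!r} already exists")
        self.tables[new_table] = copy.deepcopy(self.backups[backup])


inst = Instance()
inst.tables["orders"] = {"families": ["details"], "rows": {"order#1": {"details:status": "new"}}}
inst.create_backup("orders", "orders-2023-06-01")
inst.tables["orders"]["rows"]["order#1"]["details:status"] = "shipped"  # a later change
inst.restore("orders-2023-06-01", "orders_restored")
print(inst.tables["orders_restored"]["rows"]["order#1"]["details:status"])  # new
```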

Dataflow templates let you export data from Cloud Bigtable to files in Cloud Storage in a variety of formats, such as Avro, Parquet, and SequenceFile, and then import that data back into Cloud Bigtable.

Replication for Cloud Bigtable enables you to increase the availability and durability of your data by copying it across multiple regions or multiple zones within the same region. You can also isolate workloads by routing different types of requests to different clusters.
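Routing is configured per application profile. The sketch below is a loose, hypothetical model of the two routing styles, not the service's actual logic. Single-cluster routing always sends a workload to one cluster, which isolates it. Multi-cluster routing sends each request to the nearest available cluster and fails over automatically. The `healthy_by_distance` list is an assumption of the sketch; the real service tracks cluster health and distance itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical routing policy, loosely modeled on Bigtable app profiles."""
    name: str
    single_cluster: Optional[str] = None  # None means multi-cluster routing


def route(profile: AppProfile, healthy_by_distance: list[str]) -> str:
    """Pick a cluster; healthy_by_distance lists available clusters, nearest first."""
    if profile.single_cluster is not None:
        # Single-cluster routing: always the same cluster, isolating this workload.
        return profile.single_cluster
    if not healthy_by_distance:
        raise RuntimeError("no available cluster")
    # Multi-cluster routing: nearest available cluster, with automatic failover.
    return healthy_by_distance[0]


serving = AppProfile("serving")                       # user-facing reads and writes
batch = AppProfile("batch", single_cluster="c-east")  # analytics pinned to one cluster
print(route(serving, ["c-west", "c-east"]))  # c-west
print(route(serving, ["c-east"]))            # c-east, once c-west is unavailable
print(route(batch, ["c-west", "c-east"]))    # c-east
```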

You can use Dataproc to create one or more Compute Engine instances that can connect to a Cloud Bigtable instance and run Hadoop jobs.

Written by: Jon Bonso

Jon Bonso is the co-founder of Tutorials Dojo, an EdTech startup and an AWS Digital Training Partner that provides high-quality educational materials in the cloud computing space. He graduated from Mapúa Institute of Technology in 2007 with a bachelor's degree in Information Technology. Jon holds 10 AWS Certifications and has been an active AWS Community Builder since 2020.