- Cloud Dataflow is a fully managed data processing service for executing a wide variety of data processing patterns.
- Dataflow templates allow you to easily share your pipelines with team members and across your organization.
- You can also take advantage of Google-provided templates to implement useful yet simple data processing tasks.
- Autoscaling lets Dataflow automatically choose the appropriate number of worker instances required to run your job.
- You can build a batch or streaming pipeline protected with customer-managed encryption key (CMEK) or access CMEK-protected data in sources and sinks.
- Dataflow is integrated with VPC Service Controls, which provide additional security for data processing environments by mitigating the risk of data exfiltration.
- Dataflow jobs are billed per second, based on the actual use of Dataflow batch or streaming workers. Additional resources, such as Cloud Storage or Pub/Sub, are each billed per that service’s pricing.
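As a quick illustration of the template feature mentioned above, here is a sketch of launching a Google-provided batch template with the `gcloud` CLI. The job name, region, and output bucket are placeholders you would replace with your own values:

```shell
# Launch the Google-provided Word Count template.
# "wordcount-example" and "gs://my-example-bucket" are placeholder values.
gcloud dataflow jobs run wordcount-example \
  --gcs-location gs://dataflow-templates/latest/Word_Count \
  --region us-central1 \
  --parameters inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,output=gs://my-example-bucket/output/
```

Because the pipeline logic is packaged in the template, team members can launch it without needing the development environment that built it.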
Validate Your Knowledge
Your company has 1 TB of unstructured data in various file formats that is securely stored in its on-premises data center. The Data Analytics team needs to perform ETL (Extract, Transform, Load) processes on this data, which will eventually be consumed by a Dataflow SQL job.
What should you do?
- Use the `bq` command-line tool in Cloud Shell and upload your on-premises data to Google BigQuery.
- Use the Google Cloud Console to import the unstructured data by performing a dump into Cloud SQL.
- Run a Dataflow import job using `gcloud` to upload the data into Cloud Spanner.
- Using the `gsutil` command-line tool in Cloud SDK, move your on-premises data to Cloud Storage.
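For the Cloud Storage option, the transfer could look like the following sketch, where the local path and bucket name are hypothetical placeholders:

```shell
# Copy on-premises files to a Cloud Storage bucket.
# /data/onprem and gs://my-example-bucket are placeholder names.
# -m enables parallel (multi-threaded) transfers; -r recurses into subdirectories.
gsutil -m cp -r /data/onprem gs://my-example-bucket/raw/
```

Cloud Storage accepts unstructured data in any file format, which makes it a suitable landing zone before ETL and downstream Dataflow SQL processing.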