Google Cloud Dataflow

  • Cloud Dataflow is a fully managed data processing service for executing a wide variety of data processing patterns.

Features

  • Dataflow templates allow you to easily share your pipelines with team members and across your organization.
  • You can also take advantage of Google-provided templates to implement useful but simple data processing tasks.
  • Autoscaling lets the Dataflow service automatically choose the appropriate number of worker instances required to run your job.
  • You can build a batch or streaming pipeline protected with a customer-managed encryption key (CMEK), or access CMEK-protected data in sources and sinks.
  • Dataflow integrates with VPC Service Controls, which adds security to data processing environments by helping to mitigate the risk of data exfiltration.
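As a rough illustration of the autoscaling and CMEK features above, the sketch below assembles the command-line flags that a pipeline written with the Apache Beam Python SDK would typically receive when it runs on Dataflow. The helper name dataflow_flags, the project, bucket, and key names are all made up for the example; the flag names (autoscaling_algorithm, max_num_workers, dataflow_kms_key) are assumed to match the Beam pipeline options available in your SDK version, so check them before relying on this.

```python
def dataflow_flags(project, region, temp_location, max_workers, kms_key=None):
    """Build Beam-style flags for a Dataflow job (hypothetical helper)."""
    flags = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Let the service scale the worker pool with throughput,
        # up to the ceiling set by max_num_workers.
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        f"--max_num_workers={max_workers}",
    ]
    if kms_key:
        # Protect the job's state with a customer-managed encryption key.
        flags.append(f"--dataflow_kms_key={kms_key}")
    return flags


# All values below are placeholders, not real resources.
flags = dataflow_flags(
    project="example-project",
    region="us-central1",
    temp_location="gs://example-bucket/tmp",
    max_workers=10,
    kms_key=(
        "projects/example-project/locations/us-central1/"
        "keyRings/example-ring/cryptoKeys/example-key"
    ),
)
print("\n".join(flags))
```

The list would normally be passed to the pipeline script as its arguments; leaving out kms_key simply omits the CMEK flag.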

Pricing

  • Dataflow jobs are billed per second, based on the actual use of Dataflow batch or streaming workers. Additional resources, such as Cloud Storage or Pub/Sub, are billed separately at each service’s own rates.
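To make per-second billing concrete, here is a minimal arithmetic sketch. It assumes a single flat hourly rate per worker, which is a simplification, and the rate used is invented for the example rather than taken from any price list. The function name job_cost is also hypothetical.

```python
from decimal import Decimal

SECONDS_PER_HOUR = Decimal(3600)


def job_cost(worker_seconds, hourly_rate_per_worker):
    """Convert total worker-seconds into a cost at a flat hourly rate.

    Both the flat-rate model and the rate itself are illustrative
    assumptions, not real Dataflow prices.
    """
    hours = Decimal(worker_seconds) / SECONDS_PER_HOUR
    return hours * Decimal(hourly_rate_per_worker)


# Four workers that each ran for 15 minutes (900 seconds),
# at a made-up rate of 0.10 per worker-hour.
cost = job_cost(worker_seconds=4 * 900, hourly_rate_per_worker="0.10")
print(cost)
```

Because billing follows actual worker time, a job that autoscales down part-way through accrues fewer worker-seconds and costs proportionally less. Charges for Cloud Storage or Pub/Sub would be added on top.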

Validate Your Knowledge

Question 1

Your company has 1 TB of unstructured data in various file formats that is securely stored in its on-premises data center. The Data Analytics team needs to perform ETL (Extract, Transform, Load) processes on this data, which will eventually be consumed by a Dataflow SQL job.

What should you do?

  1. Use the bq command-line tool in Cloud Shell and upload your on-premises data to Google BigQuery.
  2. Use the Google Cloud Console to import the unstructured data by performing a dump into Cloud SQL.
  3. Run a Dataflow import job using gcloud to upload the data into Cloud Spanner.
  4. Using the gsutil command-line tool in Cloud SDK, move your on-premises data to Cloud Storage.

Correct Answer: 4

Dataflow SQL can query the following sources:

  • Pub/Sub topics
  • Cloud Storage filesets
  • BigQuery tables

BigQuery is a serverless, highly scalable, and cost-effective multi-cloud data warehouse designed for business agility. With serverless data warehousing, Google does all resource provisioning behind the scenes, so you can focus on data and analysis rather than worrying about upgrading, securing, or managing the infrastructure.

Google Cloud Storage is a powerful and cost-effective storage solution for unstructured objects, perfect for everything from hosting live web content to storing data for analytics to archiving and backup.

The scenario states that you need to upload unstructured data to Google Cloud. Among the possible data sources for a Dataflow SQL job, Cloud Storage is the only one that can hold unstructured data in various file formats.

Hence, the correct answer is: Using the gsutil command-line tool in Cloud SDK, move your on-premises data to Cloud Storage.
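The upload step in the correct answer can be sketched as follows. The helper gsutil_copy_command, the local path, and the bucket name are hypothetical. The sketch only builds and prints the command rather than running it, and it uses gsutil cp (which leaves the on-premises copy in place) instead of gsutil mv.

```python
import shlex


def gsutil_copy_command(local_dir, bucket, prefix=""):
    """Build a gsutil command that copies a local directory to a bucket.

    -m runs the copy in parallel; cp -r recurses into subdirectories.
    """
    destination = f"gs://{bucket}/{prefix}".rstrip("/") + "/"
    return ["gsutil", "-m", "cp", "-r", local_dir, destination]


# Placeholder path and bucket name, for illustration only.
cmd = gsutil_copy_command("/data/exports", "example-bucket", "raw")
print(shlex.join(cmd))
```

To actually run it on a machine with the Cloud SDK installed and authenticated, the list could be handed to subprocess.run(cmd, check=True).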

The option that says: Use the bq command-line tool in Cloud Shell and upload your on-premises data to Google BigQuery is incorrect because data loaded into BigQuery must be in a structured format such as JSON or CSV.

The option that says: Use the Google Cloud Console to import the unstructured data by performing a dump into Cloud SQL is incorrect because Cloud SQL is mainly used for storing relational data, which means it is not suitable for storing unstructured data.

The option that says: Run a Dataflow import job using gcloud to upload the data into Cloud Spanner is incorrect because Cloud Spanner is commonly used for relational data since it is a fully managed relational database service. It is not suitable for storing unstructured data. You have to use Cloud Storage instead.

References:
https://cloud.google.com/dataflow/docs/guides/sql/data-sources-destinations
https://console.cloud.google.com/getting-started?tutorial=storage_quickstart

Note: This question was extracted from our Google Certified Associate Cloud Engineer Practice Exams.

References:
https://cloud.google.com/dataflow
