Last updated on March 27, 2023
Google Cloud Dataproc Cheat Sheet
- Cloud Dataproc is a fully managed service for running Apache Spark, Apache Hadoop, Presto, and other open-source software (OSS) clusters on Google Cloud.
Features
- You can spin up resizable clusters quickly with various virtual machine types, disk sizes, number of nodes, and networking options on Cloud Dataproc.
- Dataproc provides autoscaling features to help you automatically manage the addition and removal of cluster workers.
- Cloud Dataproc has built-in integration with the following Google Cloud services for a more complete and robust platform:
  - Cloud Storage
  - BigQuery
  - Cloud Bigtable
  - Cloud Logging
  - Cloud Monitoring
  - AI Hub
- Dataproc supports image versioning, which lets you switch between different versions of Apache Spark, Apache Hadoop, and the other tools bundled in a cluster image.
- To avoid charges for inactive clusters, you can utilize Dataproc’s scheduled deletion.
- You can manage your clusters via:
  - Cloud Console Web UI
  - Cloud SDK
  - RESTful APIs
  - SSH access
- Dataproc can be provisioned with custom images according to your needs.
- Workflow templates provide a flexible and simple mechanism for managing and executing workflows.
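Several of the features above (machine types, disk sizes, node counts, image versions, and scheduled deletion) come together in a single cluster definition. The following is a minimal sketch, not an official sample. It assumes the snake_case field names of the Dataproc `Cluster` resource as used by the `google-cloud-dataproc` Python client, and the function name `build_cluster_spec` and every default value (machine type, image version, disk size, idle timeout) are illustrative placeholders. It only builds a dictionary and does not call any Google Cloud API.

```python
import json


def build_cluster_spec(
    project_id,
    cluster_name,
    num_workers=2,
    machine_type="n1-standard-4",   # placeholder machine type
    boot_disk_gb=500,               # placeholder boot disk size
    image_version="2.1-debian11",   # placeholder image version
    idle_delete_seconds=3600,       # delete after one idle hour (assumed value)
):
    """Build a dict shaped like a Dataproc Cluster resource (hypothetical helper)."""
    node = {
        "machine_type_uri": machine_type,
        "disk_config": {"boot_disk_size_gb": boot_disk_gb},
    }
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # One master node and a configurable number of workers.
            "master_config": {"num_instances": 1, **node},
            "worker_config": {"num_instances": num_workers, **node},
            # Image versioning: pins the bundled Spark/Hadoop versions.
            "software_config": {"image_version": image_version},
            # Scheduled deletion: remove the cluster after it sits idle.
            "lifecycle_config": {
                "idle_delete_ttl": {"seconds": idle_delete_seconds}
            },
        },
    }


spec = build_cluster_spec("my-project", "demo-cluster", num_workers=3)
print(json.dumps(spec, indent=2))
```

A dictionary like this could then be passed as the `cluster` field of a `ClusterControllerClient.create_cluster` request, provided the client library is installed and credentials are configured. Verify the field names and supported image versions against the current Dataproc documentation before relying on them.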
Pricing
- Pay only for the resources you use, which lowers the total cost of ownership of OSS.
- Dataproc pricing is based on the number of vCPUs in a cluster and the length of time the cluster runs.
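The vCPU-and-duration pricing model can be sketched as a simple multiplication. The example below is a rough estimate under stated assumptions: the rate `ASSUMED_RATE_PER_VCPU_HOUR` is a placeholder that should be replaced with the figure from the current pricing page, and the helper name `dataproc_fee` is hypothetical. It covers only the Dataproc fee. The underlying Compute Engine instances, disks, and other resources are billed separately.

```python
from decimal import Decimal

# Assumed rate in USD per vCPU per hour; confirm against the pricing page.
ASSUMED_RATE_PER_VCPU_HOUR = Decimal("0.010")


def dataproc_fee(vcpus_per_node, node_count, hours,
                 rate=ASSUMED_RATE_PER_VCPU_HOUR):
    """Estimate the Dataproc fee: rate x total vCPUs x hours (hypothetical helper)."""
    total_vcpus = vcpus_per_node * node_count
    # Convert via str so float inputs such as 1.5 stay exact in Decimal.
    return rate * total_vcpus * Decimal(str(hours))


# One master plus two workers, 4 vCPUs each, running for 2 hours.
fee = dataproc_fee(vcpus_per_node=4, node_count=3, hours=2)
print(f"Estimated Dataproc fee: ${fee}")
```

`Decimal` is used instead of floats so that small currency amounts do not pick up binary rounding noise.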
Google Cloud Dataproc Cheat Sheet References:
https://cloud.google.com/dataproc
https://cloud.google.com/dataproc/docs/concepts/overview