AWS Certified Data Analytics – Specialty DAS-C01 Exam Study Path


Last updated on October 17, 2023

The AWS Certified Data Analytics – Specialty DAS-C01 exam is intended for people who have experience in designing, building, securing, and maintaining analytics solutions on AWS. The exam will test your technical skills on how different AWS analytics services integrate with each other. You also need to know how they fit in the data lifecycle of collection, storage, processing, and visualization.

This specialty certification exam is on par with the other AWS Professional level tests so you need to allocate ample time for your preparation. With the help of the official exam study guide, you can determine the areas that you need to focus on. It will show you the specific knowledge areas and domains that you must review to pass the exam.

Study Materials for the AWS Certified Data Analytics – Specialty DAS-C01 Exam

Before taking the actual exam, we recommend checking out these study materials for the AWS Certified Data Analytics – Specialty exam. These resources will help you understand the concepts and strategies that you will need to pass the exam:

 

  1. Free Exam Readiness: AWS Certified Data Analytics – Specialty DAS-C01 – an interactive course with responsive image maps, accordions, sample problem sets, section-based quizzes, and a practice test at the end.
  2. AWS FAQs – help you grasp each service at a glance. They cover commonly asked questions, use cases, and comparisons of various AWS services.
  3. Tutorials Dojo’s AWS Cheat Sheets – help you digest the lengthy concepts found in the AWS FAQs. These cheat sheets are presented in bullet-point format so the information is easier to absorb. This page summarizes all the analytics services of AWS.
  4. AWS Knowledge Center – use this website to find and understand the most frequent questions and requests AWS receives from its customers.
  5. AWS Documentation and Whitepapers – detailed references that will help you expand your knowledge of the various AWS services, particularly the analytics-related whitepapers.
  6. Tutorials Dojo’s AWS Certified Data Analytics Specialty Practice Exams – a comprehensive reviewer with complete and detailed explanations to help you pass your AWS Data Analytics exam on your first try. The Tutorials Dojo practice exams are well-regarded as the best AWS practice test reviewers in the market.

AWS Services to Focus On For Your DAS-C01 Specialty Exam

The AWS Certified Data Analytics Specialty has five domains: Collection, Storage and Data Management, Processing, Analysis and Visualization, and Security. To comprehend the different scenarios in the exam, you should have a thorough understanding of the following services:

  1. Amazon Athena – learn how you can analyze data stored in Amazon S3 and how you can configure and optimize Athena’s performance.
  2. Amazon CloudSearch – know the use case and features of the service.
  3. Amazon Elasticsearch – learn how you can integrate Elasticsearch and Kibana with different AWS services.
  4. Amazon EMR – understand the security, hardware, and software configurations of an EMR cluster and how you can use the AWS Glue Data Catalog for table metadata.
  5. Amazon Kinesis – know the use case of each Kinesis service (Data Streams, Data Firehose, and Data Analytics) and how they differ from each other.
  6. Amazon QuickSight – learn how you can integrate QuickSight into your solution, how you can publish dashboards, reports, and analyses, and how you can refresh your datasets.
  7. Amazon Redshift – understand the different SQL commands, the use cases of a Redshift cluster and Redshift Spectrum, and how you can analyze the data in the data warehouse.
  8. AWS Data Pipeline – learn the concepts and components of the pipeline.
  9. AWS Glue – understand the concepts of the data catalog, crawlers, workflows, triggers, jobs, job bookmarks, and job metrics.

You must know how these services interact with one another to form a complete data analytics solution on AWS. Also, expect to see various Apache technologies, such as Parquet, ORC, Avro, Oozie, Sqoop, HBase, and many more.
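To get a feel for how a few of these services fit together, below is a minimal sketch (using the boto3 Python SDK) that runs an Amazon Athena query against a table registered in the AWS Glue Data Catalog and reads the results from Amazon S3. The database, table, and bucket names are hypothetical placeholders, and the snippet assumes the table has already been cataloged (for example, by a Glue crawler).

```python
import time

import boto3

# Hypothetical names for illustration only.
DATABASE = "clickstream_db"  # AWS Glue Data Catalog database
QUERY = "SELECT page, COUNT(*) AS hits FROM page_views GROUP BY page LIMIT 10"
OUTPUT = "s3://my-athena-results-bucket/queries/"  # any S3 location you own

athena = boto3.client("athena")

# Athena reads the table definition from the Glue Data Catalog
# and scans the underlying objects in Amazon S3.
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": DATABASE},
    ResultConfiguration={"OutputLocation": OUTPUT},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```

The same pattern (the Glue Data Catalog for metadata, S3 for storage, and Athena, Redshift Spectrum, or EMR for compute) comes up repeatedly in the exam.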

DAS-C01 Common Exam Scenarios for the AWS Certified Data Analytics – Specialty Exam

Each entry below pairs a common exam scenario with the recommended solution, grouped by exam domain.

DAS-C01 Exam Domain 1: Collection

Scenario: A near-real-time solution is needed that collects only the non-confidential data from a sensitive data stream and stores it in durable storage.
Solution: Use Amazon Kinesis Data Firehose to ingest the streaming data and enable record transformation with an AWS Lambda function that excludes the sensitive data. Store the processed data in Amazon S3. (A sketch of such a transformation function appears after this domain's scenarios.)

Scenario: Large files are compressed into a single GZIP file and uploaded into an S3 bucket. You have to speed up the COPY process that loads the data into Amazon Redshift.
Solution: Split the GZIP file into smaller files whose number is a multiple of the number of slices in the Redshift cluster.

Scenario: Which service should you use to deliver streaming data from Amazon MSK to a Redshift cluster with low latency?
Solution: Amazon Redshift Streaming Ingestion.

Scenario: Ways to fix Amazon Kinesis Data Streams throttling issues on write requests.
Solution:
  • Increase the number of shards using the UpdateShardCount API.
  • Use random partition keys to distribute writes more evenly across shards.

Scenario: A company needs a cost-effective solution for detecting anomalous data coming from an Amazon Kinesis data stream.
Solution: Create a Kinesis Data Analytics application and use the RANDOM_CUT_FOREST function for anomaly detection.
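As a rough illustration of the first scenario above, here is what a Kinesis Data Firehose record-transformation Lambda function could look like in Python. It assumes the incoming records are JSON objects; the sensitive field names are hypothetical, and an actual implementation would depend on your data format.

```python
import base64
import json

# Placeholder field names that mark a record as confidential.
SENSITIVE_FIELDS = {"ssn", "credit_card_number"}


def lambda_handler(event, context):
    """Firehose transformation: drop confidential records, pass the rest through."""
    output = []
    for record in event["records"]:
        # Firehose delivers each record's data as a base64-encoded payload.
        payload = json.loads(base64.b64decode(record["data"]))

        if SENSITIVE_FIELDS & payload.keys():
            # Exclude confidential records from delivery to Amazon S3.
            output.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
            continue

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8"),
        })

    return {"records": output}
```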

DAS-C01 Exam Domain 2: Storage and Data Management

Scenario: A company wants a cost-effective solution that will enable it to query a subset of data from a CSV file.
Solution: Use Amazon S3 Select. (A minimal example follows this domain's scenarios.)

Scenario: You need to populate a data catalog using data stored in Amazon S3, Amazon RDS, and Amazon DynamoDB.
Solution: Use a scheduled AWS Glue crawler.

Scenario: A Data Analyst used the COPY command to migrate CSV files into a Redshift cluster. However, no data was imported and no errors were reported after the process finished.
Solution: Check for the following likely causes:
  • The CSV files use carriage returns as line terminators.
  • The IGNOREHEADER parameter was included in the COPY command.

Scenario: What is a cost-effective solution for saving Redshift query results to external storage?
Solution: Use the Amazon Redshift UNLOAD command.

Scenario: A company uses Amazon S3 Standard-IA and Amazon S3 Glacier as its data storage. Some data cannot be accessed with Amazon Athena queries. Which best explains this behavior?
Solution: Amazon Athena is trying to access data stored in Amazon S3 Glacier.
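For the S3 Select scenario, here is a minimal boto3 sketch that pulls only the matching rows and columns out of a (hypothetical) CSV object instead of downloading the whole file. The bucket, key, and column names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Query a subset of a CSV object in place; only the filtered rows are returned.
response = s3.select_object_content(
    Bucket="my-data-bucket",       # hypothetical bucket
    Key="exports/users.csv",       # hypothetical object key
    ExpressionType="SQL",
    Expression="SELECT s.name, s.country FROM S3Object s WHERE s.country = 'PH'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; print the returned records as they arrive.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```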

DAS-C01 Exam Domain 3: Processing

Scenario: A company uses an Amazon EMR cluster to process 10 batch jobs every day. Each job takes about 20 minutes to complete. A solution that lowers the cost of the EMR cluster must be implemented.
Solution: Use transient Amazon EMR clusters.

Scenario: An Amazon Kinesis Client Library (KCL) application uses a DynamoDB table that has provisioned write capacity. The application’s latency increases during peak times and must be resolved immediately.
Solution: Increase the DynamoDB table’s write throughput.

Scenario: Thousands of files are being loaded into a central fact table hosted on Amazon Redshift. You need to optimize cluster resource utilization when loading data into the fact table.
Solution: Use a single COPY command to load the data. (A sketch of issuing the COPY through the Redshift Data API follows this domain's scenarios.)

Scenario: A Lambda function is used to process data from a Kinesis data stream. Results are delivered into Amazon ES. During peak hours, the processing time slows down.
Solution: Use multiple Lambda functions to process the data concurrently.

Scenario: A Data Analyst needs to join data stored in Amazon Redshift with data stored in Amazon S3. The Analyst wants a serverless solution that will reduce the workload on the Redshift cluster.
Solution: Create an external table using Amazon Redshift Spectrum for the S3 data and use Redshift SQL queries for the join operations.
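For the single-COPY scenario, a sketch of issuing one COPY statement through the Amazon Redshift Data API is shown below. One COPY over a common S3 prefix lets Redshift parallelize the load across all of the cluster's slices, instead of serializing many small COPY commands. The cluster, database, IAM role, table, and bucket names are placeholders.

```python
import boto3

# One COPY statement loads every file under the prefix; Redshift spreads the
# work across the cluster's slices automatically.
copy_sql = """
    COPY sales_fact
    FROM 's3://my-data-lake/fact/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
    FORMAT AS CSV
    GZIP;
"""

redshift_data = boto3.client("redshift-data")
response = redshift_data.execute_statement(
    ClusterIdentifier="my-analytics-cluster",  # hypothetical cluster name
    Database="analytics",
    DbUser="analyst",
    Sql=copy_sql,
)
print("Statement submitted:", response["Id"])
```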

DAS-C01 Exam Domain 4: Analysis and Visualization

Scenario: A company requires an out-of-the-box solution for visualizing complex real-world scenarios and forecasting trends.
Solution: Use ML-powered forecasting in Amazon QuickSight.

Scenario: A Data Analyst needs to use Amazon QuickSight to create daily reports based on a dataset stored in Amazon S3.
Solution: Create a daily scheduled refresh for the dataset.

Scenario: A company encountered an "import into SPICE" error after using Amazon QuickSight to query a new Amazon Athena table that is associated with a new S3 bucket.
Solution: Configure the correct permissions for the new S3 bucket from the QuickSight console.

Scenario: A company needs a cost-effective solution for ad hoc analyses and data visualizations.
Solution: Use Amazon Athena and Amazon QuickSight.

Scenario: A company needs to visualize and analyze web logs in near-real time.
Solution: Use Amazon Kinesis Data Firehose to stream the logs into Amazon Elasticsearch Service and visualize them with Kibana. (A sketch of the producer side follows this domain's scenarios.)
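As a small illustration of the near-real-time log scenario, the snippet below shows the producer side: pushing web log lines into an existing Kinesis Data Firehose delivery stream that has been configured separately (in the console or via infrastructure as code) to deliver into Amazon Elasticsearch Service, where Kibana is used for visualization. The stream name and log format are placeholders.

```python
import json
import time

import boto3

firehose = boto3.client("firehose")


def ship_log_line(log_line: str) -> None:
    """Send one web log line to the (hypothetical) Firehose delivery stream."""
    record = {
        "timestamp": int(time.time()),
        "message": log_line,
    }
    firehose.put_record(
        DeliveryStreamName="web-logs-to-es",  # hypothetical stream name
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )


ship_log_line('192.0.2.10 - - [17/Oct/2023:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512')
```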

DAS-C01 Exam Domain 5: Security

Scenario: Root device volume encryption must be enabled on all nodes of an EMR cluster. AWS CloudFormation is required for creating the new resources.
Solution: Create a custom AMI with an encrypted root device volume and reference the AMI ID in the CustomAmiId property of the CloudFormation template.

Scenario: A solution is needed to encrypt data stored in EBS volumes that are attached to an EMR cluster.
Solution: Use Linux Unified Key Setup (LUKS).

Scenario: A company is having trouble accessing data in a Redshift cluster from Amazon QuickSight.
Solution: Create a new inbound rule in the cluster’s security group that allows access from the IP address range that Amazon QuickSight uses.

Scenario: A company wants to prevent any user from creating EMR clusters that are accessible from the public Internet.
Solution: Enable the "block public access" setting in the Amazon EMR console.

Scenario: A company wants the data in a Kinesis data stream to be encrypted, and it wants to manage the key rotation.
Solution: Specify a Customer Master Key (CMK) when enabling server-side encryption for the Kinesis data stream. (A short sketch follows this domain's scenarios.)
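For the Kinesis encryption scenario, here is a minimal boto3 sketch, assuming the stream and a customer managed KMS key already exist; the stream name and key ARN are placeholders.

```python
import boto3

kinesis = boto3.client("kinesis")
kms = boto3.client("kms")

# Hypothetical customer managed key ARN.
KEY_ID = "arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555"

# Enable server-side encryption on the stream using the customer managed key.
kinesis.start_stream_encryption(
    StreamName="orders-stream",  # hypothetical stream name
    EncryptionType="KMS",
    KeyId=KEY_ID,
)

# Optional: turn on automatic rotation for the key, or rotate it manually
# on your own schedule since you control this key.
kms.enable_key_rotation(KeyId=KEY_ID)
```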

 

Validate Your Knowledge For the DAS-C01 Specialty Exam

After you’ve reviewed the materials above, the next resource you should check is the FREE AWS sample questions for the AWS Data Analytics Specialty exam. Although these sample questions are not as difficult as what you can expect on the real exam, they are still a helpful resource for your review. Be sure to revisit them from time to time, since AWS may upload a new version.

For high-quality practice exams, you can use our AWS Certified Data Analytics Specialty Practice Exams. These practice tests will help you boost your preparedness for the real exam. They contain multiple sets of questions that cover almost every area you can expect in the real certification exam. We have also included detailed explanations and reference links to help you understand why the correct option is better than the rest. This is the value that you will get from our course. Practice exams are a great way to determine the areas you are weak in, and they will also highlight important information that you might have missed during your review.

Sample Practice Test Questions for the DAS-C01 Exam:

Question 1

A game retail company stores user purchase data in a MySQL database hosted on Amazon RDS. The company regularly runs queries and analytical workloads against the most recent three months of data, which is expected to be several terabytes in size. The older historical data needs to be stored outside the database but is still used for the quarterly trend reports. To generate these reports, the historical data is joined with the more recent data.

Which of the following options will provide optimal performance and a cost-effective solution based on the requirements?

  1. Sync a year’s worth of data on an Amazon RDS read replica. Export the older historical data into an Amazon S3 bucket for long-term storage. From the data in Amazon S3 and Amazon RDS, create an AWS Glue Data Catalog and use Amazon Athena to join the historical and current data to generate the reports.
  2. Use AWS Glue to perform ETL and incrementally load a year’s worth of data into an Amazon Redshift cluster. Run the regular queries against this cluster. Create an AWS Glue Data Catalog of the data in Amazon S3 and use Amazon Athena to join the historical and current data to generate the reports.
  3. Set up a multi-AZ RDS database and run automated snapshots on the standby instance. Configure Amazon Athena to run historical queries on the S3 bucket containing the automated snapshots.
  4. Export the historical data to Amazon S3. Create a daily job that exports current data from Amazon RDS to Amazon Redshift. Use the native Amazon Redshift for regular queries and Amazon Redshift Spectrum for joining the historical and current data.

Correct Answer: 4

With Redshift Spectrum, Amazon Redshift customers can easily query their data in Amazon S3. Like Amazon EMR, you get the benefits of open data formats and inexpensive storage, and you can scale out to thousands of nodes to pull data, filter, project, aggregate, group, and sort. Like Amazon Athena, Redshift Spectrum is serverless, and there’s nothing to provision or manage. You just pay for the resources you consume for the duration of your Redshift Spectrum query. Like Amazon Redshift itself, you get the benefits of a sophisticated query optimizer, fast access to data on local disks, and standard SQL. And like nothing else, Redshift Spectrum can execute highly sophisticated queries against an exabyte of data or more—in just minutes.

 There are two query requirements in the scenario:

1. Regular queries against the current data

2. Occasional queries for the quarterly report (involves both historical and current data)

To address the first requirement, we create a daily job that incrementally loads the current data from the MySQL database into Amazon Redshift and run the regular analytics natively in Redshift. As for the second requirement, since the older data is infrequently accessed, we can store it in Amazon S3 instead of the Redshift cluster to save on storage costs. Redshift Spectrum eliminates the need to move that data into Redshift before it can be queried, making it possible to run Redshift queries that combine data in the Redshift cluster with data in an S3 bucket. Redshift Spectrum uses the compute resources of your Redshift cluster, and you only pay for the data queried from Amazon S3.
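As a rough sketch of how this could look in practice, the statements below create an external schema backed by an AWS Glue Data Catalog database that describes the historical data in S3, then join that Spectrum table with a table stored in the Redshift cluster. All identifiers, the IAM role ARN, and the use of the Redshift Data API are assumptions for illustration; any Redshift SQL client would work just as well.

```python
import boto3

# Hypothetical names throughout. The external table itself is assumed to be
# defined in the Glue Data Catalog (for example, by a Glue crawler over the
# historical purchase data exported to S3).
statements = [
    """
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_hist
    FROM DATA CATALOG
    DATABASE 'purchase_history'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
    """,
    # Join the historical data in S3 (via Spectrum) with a table that lives
    # locally in the Redshift cluster.
    """
    SELECT u.country, SUM(h.amount) AS historical_revenue
    FROM spectrum_hist.historical_purchases h
    JOIN users u ON u.user_id = h.user_id
    GROUP BY u.country;
    """,
]

redshift_data = boto3.client("redshift-data")
response = redshift_data.batch_execute_statement(
    ClusterIdentifier="purchases-cluster",  # hypothetical cluster name
    Database="analytics",
    DbUser="analyst",
    Sqls=statements,
)
print("Batch submitted:", response["Id"])
```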

Therefore, the correct answer is: Export the historical data to Amazon S3. Create a daily job that exports current data from Amazon RDS to Amazon Redshift. Use the native Amazon Redshift for regular queries and Amazon Redshift Spectrum for joining the historical and current data.

The option that says: Use AWS Glue to perform ETL and incrementally load a year’s worth of data into an Amazon Redshift cluster. Run the regular queries against this cluster. Create an AWS Glue Data Catalog of the data in Amazon S3 and use Amazon Athena to join the historical and current data to generate the reports is incorrect. For this method to work, you would first have to copy the more recent data from the Redshift cluster to Amazon S3, because Amazon Athena only works with data stored in S3. The per-data-scanned pricing of Redshift Spectrum and Athena is the same, and since there is already a working Redshift cluster, taking advantage of its compute power to run the join queries with Redshift Spectrum is a better option than Athena. Take note that Amazon Athena relies on pooled resources that AWS manages, while Redshift Spectrum runs on the dedicated resources allocated to your Redshift cluster, giving you more consistent and optimized performance.

The option that says: Set up a multi-AZ RDS database and run automated snapshots on the standby instance. Configure Amazon Athena to run historical queries on the S3 bucket containing the automated snapshots is incorrect because RDS snapshots are stored in an S3 bucket that is owned and managed by AWS, not by you, so Athena cannot query them directly. You would first have to export the snapshot data to an S3 bucket that you own before you could query it with Amazon Athena.

The option that says: Sync a year’s worth of data on an Amazon RDS read replica. Export the older historical data into an Amazon S3 bucket for long-term storage. From the data in Amazon S3 and Amazon RDS, create an AWS Glue Data Catalog and use Amazon Athena to join the historical and current data to generate the reports is incorrect because you can’t select just a subset of the data to be synced to an RDS read replica. Additionally, analytical workloads are better served by an OLAP database like Amazon Redshift.

References:
https://aws.amazon.com/blogs/big-data/amazon-redshift-spectrum-extends-data-warehousing-out-to-exabytes-no-loading-required/
https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html

Check out this Amazon Redshift Cheat Sheet:
https://tutorialsdojo.com/amazon-redshift/

Question 2

After creating a table from a record set stored in Amazon S3, a data analyst attempted to preview it by running a SELECT * statement. The analyst then partitioned the data using the year=2022/month=01/day=01/ format. While the partitioning was successful, no records were returned when the same SELECT * query was executed.

What could be a possible reason for this?

  1. The analyst did not run the MSCK REPAIR TABLE command after partitioning the data.
  2. The analyst forgot to run the MSCK REPAIR TABLE command before partitioning the data.
  3. The S3 bucket where the sample data is stored has insufficient read permissions.
  4. The analyst did not use the CREATE TABLE AS SELECT (CTAS) command to create the table.

Correct Answer: 1

Athena creates metadata only when a table is created; the data itself is parsed only when you run a query. If your table has defined partitions, the partitions might not yet be loaded into the AWS Glue Data Catalog or the internal Athena data catalog.

You can use MSCK REPAIR TABLE or ALTER TABLE ADD PARTITION to load the partition information into the catalog. The former only works for Hive-style partition layouts (e.g., /year=2021/month=01/day=01/myfile.csv), while the latter works for both Hive-style and non-Hive-style layouts. Furthermore, MSCK REPAIR TABLE scans Amazon S3 and automatically adds new partitions to the catalog, whereas ALTER TABLE ADD PARTITION lets you manually add specific partitions.
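A minimal boto3 sketch of both commands is shown below, assuming a partitioned table named events in an existing Athena/Glue database; the database name, table name, and S3 locations are placeholders.

```python
import boto3

athena = boto3.client("athena")


def run(sql: str) -> str:
    """Submit a query to Athena and return its execution ID."""
    return athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "logs_db"},  # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
    )["QueryExecutionId"]


# Option 1: scan S3 and register every Hive-style partition automatically.
run("MSCK REPAIR TABLE events")

# Option 2: register one specific partition manually (works for any layout).
run(
    "ALTER TABLE events ADD IF NOT EXISTS "
    "PARTITION (year='2022', month='01', day='01') "
    "LOCATION 's3://my-data-bucket/events/year=2022/month=01/day=01/'"
)
```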
 

Hence, the correct answer is: The analyst did not run the MSCK REPAIR TABLE command after partitioning the data.

The option that says: The analyst forgot to run the MSCK REPAIR TABLE command before partitioning the data is incorrect because this command should be run after adding a partition, not before.

The option that says: The S3 bucket where the sample data is stored has insufficient read permissions is incorrect. The initial SELECT * preview returned records before the data was partitioned, which shows that Athena already had permission to read the data in the S3 bucket.

The option that says: The analyst did not use the CREATE TABLE AS SELECT (CTAS) command to create the table is incorrect. CTAS simply creates a new table from the results of a SELECT statement; it is not required for a partitioned table to return results.

References:
https://docs.aws.amazon.com/athena/latest/ug/partitions.html 
https://aws.amazon.com/premiumsupport/knowledge-center/athena-create-use-partitioned-tables/
https://aws.amazon.com/premiumsupport/knowledge-center/athena-empty-results/

Check out this Amazon Athena Cheat Sheet:
https://tutorialsdojo.com/amazon-athena/

Click here for more AWS Certified Data Analytics Specialty practice exam questions.


Final Remarks

To understand a service more deeply, we recommend that you get hands-on experience with it. Many questions in the exam check whether you have seen a particular error or issue during your practice. To prepare for the actual exam, you can use an AWS Free Tier account to simulate different scenarios. With the combination of theoretical and practical knowledge, you can pass the test with flying colors.

We hope that our guide has helped you achieve your goal, and we would love to hear back from you after your exam. Remember that the most important thing before the day of your exam is to get some well-deserved rest. Good luck, and we wish you all the best.

Written by: Jon Bonso

Jon Bonso is the co-founder of Tutorials Dojo, an EdTech startup and an AWS Digital Training Partner that provides high-quality educational materials in the cloud computing space. He graduated from Mapúa Institute of Technology in 2007 with a bachelor's degree in Information Technology. Jon holds 10 AWS Certifications and is also an active AWS Community Builder since 2020.
