The AWS Certified Data Analytics – Specialty exam is intended for people who have experience in designing, building, securing, and maintaining analytics solutions on AWS. The exam will test your technical skills on how different AWS analytics services integrate with each other. You also need to know how they fit in the data lifecycle of collection, storage, processing, and visualization.
This specialty certification exam is on par with the AWS Professional-level exams, so you need to allocate ample time for your preparation. With the help of the official exam study guide, you can determine the areas that you need to focus on. It will show you the specific knowledge areas and domains that you must review to pass the exam.
Before taking the actual exam, we recommend checking out these study materials for AWS Certified Data Analytics Specialty. These resources will help you understand the concepts and strategies that you need to pass the exam.
- Free Exam Readiness: AWS Certified Data Analytics – Specialty – this is an interactive course that has responsive image maps, accordions, sample problem sets, section-based quizzes, and a practice test at the end.
- AWS FAQs – can help you quickly grasp each service. They cover commonly asked questions, use cases, and comparisons of various AWS services.
- Tutorials Dojo’s AWS Cheat Sheets – can help you understand the lengthy concepts found in the AWS FAQs. These cheat sheets are presented in a bullet point format to help you digest the information easily. This page summarizes all the analytics services of AWS.
- AWS Knowledge Center – you can use this website to find and understand the most frequent questions and requests AWS receives from its customers.
- AWS Documentation and Whitepapers – these documents will help you expand your knowledge of various AWS services with their detailed information. You can focus on the following whitepapers:
  - Amazon EMR Migration Guide: How to Move Apache Spark and Apache Hadoop From On-Premises to AWS
  - Big Data Options on AWS
  - Lambda Architecture for Batch and Stream Processing
  - Streaming Data Solutions on AWS with Amazon Kinesis
  - Teaching Big Data Skills with Amazon EMR
  - Reference Architecture: SQL Based Data Processing in Amazon ECS
- Tutorials Dojo’s AWS Certified Data Analytics Specialty Practice Exams (coming soon!) – this provides a comprehensive reviewer with complete and detailed explanations to help you pass your AWS Data Analytics exam on your first try. The Tutorials Dojo practice exams are well-regarded as the best AWS practice test reviewers in the market.
AWS Services to Focus On
The AWS Certified Data Analytics Specialty has five domains: Collection, Storage and Data Management, Processing, Analysis and Visualization, and Security. To comprehend the different scenarios in the exam, you should have a thorough understanding of the following services:
- Amazon Athena – learn how you can analyze the data in the S3 bucket and how you can configure and optimize Athena’s performance.
- Amazon CloudSearch – know the use case and features of the service.
- Amazon Elasticsearch Service – learn how you can integrate Elasticsearch and Kibana with different AWS services.
- Amazon EMR – understand the security, hardware, and software configurations of the EMR cluster and how you can use AWS Glue Data Catalog for table metadata.
- Amazon Kinesis – know the use case of each Kinesis service (Data Streams, Data Firehose, and Data Analytics) and how they differ from each other.
- Amazon QuickSight – learn how you can integrate QuickSight into your solution, how you can publish dashboards, reports, and analyses, and how you can refresh your datasets.
- Amazon Redshift – understand the different SQL commands, the use cases of a Redshift cluster and Redshift Spectrum, and how you can analyze the data in the data warehouse.
- AWS Data Pipeline – learn the concepts and components of the pipeline.
- AWS Glue – understand the concepts of the data catalog, crawlers, workflows, triggers, jobs, job bookmarks, and job metrics.
You must know how these services interact to develop a complete data analytics solution in AWS. Also, prepare to see various Apache technologies, such as Apache Parquet, ORC, Avro, Oozie, Sqoop, HBase, and many more.
Common Exam Scenarios
Collection
A near-real-time solution is needed that only collects non-confidential data from sensitive streaming data and stores it in durable storage.
Use Amazon Kinesis Data Firehose to ingest streaming data and enable record transformation to utilize AWS Lambda for excluding sensitive data. Store the processed data in Amazon S3.
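The record transformation mentioned above can be sketched as a Lambda handler. This is a minimal sketch, assuming each record is a JSON object and that the confidential fields are called `ssn` and `card_number` (both names are hypothetical). The `recordId` / `result` / `data` response shape follows the Kinesis Data Firehose data transformation contract.

```python
import base64
import json

# Hypothetical names of the confidential fields; adjust to the real schema.
SENSITIVE_FIELDS = {"ssn", "card_number"}


def lambda_handler(event, context):
    """Strip confidential fields from each record in a Firehose batch."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            cleaned = {k: v for k, v in payload.items() if k not in SENSITIVE_FIELDS}
            line = json.dumps(cleaned) + "\n"  # newline-delimited JSON for S3
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Records that are not valid JSON objects are flagged as failed
            # and returned unchanged, so Firehose can handle them separately.
            output.append(
                {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}
            )
    return {"records": output}
```

Every incoming `recordId` must appear exactly once in the response, which is why failed records are returned rather than silently skipped.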
Large files are compressed into a single GZIP file and uploaded into an S3 bucket. You have to speed up the
Split the GZIP file into smaller files and make sure that their number is a multiple of the number of the Redshift cluster’s slices.
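The splitting step can be sketched as follows. This is an illustrative sketch only, assuming the uncompressed data is already available as a list of text lines; the function name and the `files_per_slice` parameter are hypothetical. The part count is computed as `slice_count * files_per_slice`, so it is always a multiple of the number of slices.

```python
import gzip


def split_into_gzip_parts(lines, slice_count, files_per_slice=1):
    """Return gzip-compressed byte strings, one per output file.

    The number of parts is a multiple of the cluster's slice count,
    and rows are spread across the parts as evenly as possible.
    """
    part_count = slice_count * files_per_slice
    base, extra = divmod(len(lines), part_count)
    parts = []
    start = 0
    for i in range(part_count):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more row
        chunk = lines[start:start + size]
        start += size
        parts.append(gzip.compress("".join(chunk).encode("utf-8")))
    return parts
```

Each returned part would then be uploaded under a common S3 prefix so that every slice can load a file in parallel.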
An Amazon EMR cluster needs to use a centralized metadata layer that will expose data in Amazon S3 as tables.
Use the AWS Glue Data Catalog as the metastore for the EMR cluster.
Ways to fix Amazon Kinesis Data Streams throttling issues on write requests.
A company needs a cost-effective solution for detecting anomalous data coming from an Amazon Kinesis Data stream.
Create a Kinesis Data Analytics application and use the
Storage and Data Management
A company wants a cost-effective solution that will enable them to query a subset of data from a CSV file.
Use Amazon S3 Select
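As a rough illustration of S3 Select, the sketch below builds the request for the `select_object_content` API and, separately, runs it with boto3. The bucket, key, and SQL text are placeholders supplied by the caller, and the CSV is assumed to have a header row. boto3 is imported lazily, so the request builder can be inspected without the SDK installed.

```python
def build_select_request(bucket, key, sql):
    """Build keyword arguments for S3 SelectObjectContent on a CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Assumption: the first line of the CSV holds the column names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def run_select(bucket, key, sql):
    """Run the query and return the matching rows as one CSV string."""
    import boto3  # requires AWS credentials and the boto3 package

    response = boto3.client("s3").select_object_content(
        **build_select_request(bucket, key, sql)
    )
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")
```

Because only the matching rows leave S3, the amount of data transferred and scanned by the client is reduced, which is where the cost saving comes from.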
You need to populate a data catalog using data stored in Amazon S3, Amazon RDS, and Amazon DynamoDB.
Use an AWS Glue crawler schedule
A Data Analyst used the
What is a cost-effective solution to save Redshift query results to an external storage?
Use the Amazon Redshift
A company is using Amazon S3 Standard-IA and Amazon S3 Glacier as its data storage. Some data cannot be accessed with Amazon Athena queries. Which best explains this event?
Amazon Athena is trying to access data stored in Amazon S3 Glacier.
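One way to check for this situation is to look at the storage class of each object under the table's S3 location. The sketch below is illustrative: it takes object entries shaped like the `Contents` items returned by the S3 `ListObjectsV2` API and separates the archived ones. Treating `DEEP_ARCHIVE` the same as `GLACIER` is an assumption added here, not something stated above.

```python
# Storage classes assumed to be unreadable by Athena without a restore.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def split_by_queryability(objects):
    """Split S3 object listings into (queryable_keys, archived_keys)."""
    queryable, archived = [], []
    for obj in objects:
        # Default to STANDARD if the listing omits the storage class.
        storage_class = obj.get("StorageClass", "STANDARD")
        if storage_class in ARCHIVE_CLASSES:
            archived.append(obj["Key"])
        else:
            queryable.append(obj["Key"])
    return queryable, archived
```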
Processing
A company uses an Amazon EMR cluster to process 10 batch jobs every day. Each job takes about 20 minutes to complete. A solution to lower the cost of the EMR cluster must be implemented.
Use transient Amazon EMR clusters
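A transient cluster is one that terminates once its steps finish. The sketch below builds a request for the EMR `RunJobFlow` API in which `KeepJobFlowAliveWhenNoSteps` is `False`. The release label, instance types, role names, and the Spark script location are all placeholder assumptions.

```python
def build_transient_cluster_request(job_name, spark_script_s3_uri):
    """Build RunJobFlow arguments for a cluster that shuts down after its step."""
    return {
        "Name": f"transient-{job_name}",
        "ReleaseLabel": "emr-5.30.0",  # assumed release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # The key setting: terminate the cluster when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": job_name,
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", spark_script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default role names, assumed to exist
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed, the result could be passed as `boto3.client("emr").run_job_flow(**request)`. Since the cluster only exists for the duration of each job, you pay for roughly the minutes of processing rather than a cluster that idles between jobs.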
An Amazon Kinesis Client Library (KCL) application is processing data in a DynamoDB table that has provisioned write capacity. The application’s latency increases during peak times and it must be resolved immediately.
Increase the DynamoDB table’s write throughput.
Thousands of files are being loaded in a central fact table hosted on Amazon Redshift. You need to optimize the cluster resource utilization when loading data into the fact table.
Use a single COPY command to load data.
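One common way to load many files with a single COPY is to point it at a manifest file. The sketch below generates the manifest JSON and the matching COPY statement. The table name, S3 URLs, and IAM role ARN are placeholders, and the statement relies on COPY's default file format, so format options would need to be added for real data.

```python
import json


def build_copy_manifest(s3_urls):
    """Return the JSON text of a Redshift COPY manifest listing every file."""
    entries = [{"url": url, "mandatory": True} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table, manifest_url, iam_role_arn):
    """Return one COPY statement that loads all files named in the manifest."""
    return (
        f"COPY {table} "
        f"FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "MANIFEST;"
    )
```

A single COPY lets Redshift distribute the files across slices and load them in parallel, whereas several concurrent COPY commands against the same table are serialized.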
A Lambda function is used to process data from a Kinesis Data stream. Results are delivered into Amazon ES. During peak hours, the processing time slows down.
Use multiple Lambda functions to process data concurrently.
A Data Analyst needs to join data stored in Amazon Redshift and data stored in Amazon S3. The Analyst wants a serverless solution that will reduce the workload of the Redshift cluster.
Create an external table using Amazon Redshift Spectrum for the S3 data and use Redshift SQL queries for join operations.
Analysis and Visualization
A company requires an out-of-the-box solution for visualizing complex real-world scenarios and forecasting trends.
Use ML-powered forecasting with Amazon QuickSight
A Data Analyst needs to use Amazon QuickSight for creating daily reports based on the dataset stored in Amazon S3.
Create a daily schedule refresh for the dataset.
A company has encountered an import into SPICE error after using Amazon QuickSight to query a new Amazon Athena table that is associated with a new S3 bucket.
Configure the correct permissions for the new S3 bucket from the QuickSight Console.
A company needs a cost-effective solution for ad-hoc analyses and data visualizations.
Use Amazon Athena and Amazon QuickSight.
A company needs to visualize and analyze web logs in near-real time.
Use Amazon Kinesis Data Firehose to stream logs into Amazon Elasticsearch Service. Visualize logs using Kibana.
Security
Root device volume encryption must be enabled on all nodes of an EMR cluster. AWS CloudFormation is required for creating new resources.
Create a custom AMI with an encrypted root device volume and place the AMI ID under the CustomAmiId property within the CloudFormation template.
A solution is needed to encrypt data stored in an EBS volume that is attached to an EMR cluster.
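As an illustration, the sketch below generates a minimal CloudFormation template (as a Python dictionary that can be dumped to JSON) with the `CustomAmiId` property set on an `AWS::EMR::Cluster` resource. The logical ID, cluster name, release label, instance types, and role names are placeholder assumptions; the AMI ID is whatever encrypted custom AMI you have built.

```python
import json


def emr_cluster_template(custom_ami_id):
    """Return a CloudFormation template dict for an EMR cluster on a custom AMI."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AnalyticsCluster": {  # hypothetical logical ID
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": "analytics-cluster",
                    "ReleaseLabel": "emr-5.30.0",  # assumed release
                    # AMI built with an encrypted root device volume.
                    "CustomAmiId": custom_ami_id,
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "ServiceRole": "EMR_DefaultRole",
                    "Instances": {
                        "MasterInstanceGroup": {"InstanceCount": 1, "InstanceType": "m5.xlarge"},
                        "CoreInstanceGroup": {"InstanceCount": 2, "InstanceType": "m5.xlarge"},
                    },
                },
            }
        },
    }


def render_template(custom_ami_id):
    """Serialize the template to JSON text ready for CloudFormation."""
    return json.dumps(emr_cluster_template(custom_ami_id), indent=2)
```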
Use Linux Unified Key Setup (LUKS).
A company is having trouble accessing data in a Redshift cluster using Amazon QuickSight.
Create a new inbound rule for the cluster’s security group that allows access from the IP address range that Amazon QuickSight uses.
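The inbound rule can be sketched as a request for the EC2 `AuthorizeSecurityGroupIngress` API. The security group ID and CIDR block are supplied by the caller; the QuickSight IP range differs per Region, so look it up in the QuickSight documentation rather than relying on any value shown in an example. Port 5439 is the Redshift default and is an assumption if the cluster uses a custom port.

```python
REDSHIFT_DEFAULT_PORT = 5439


def build_ingress_request(security_group_id, quicksight_cidr, port=REDSHIFT_DEFAULT_PORT):
    """Build AuthorizeSecurityGroupIngress arguments allowing QuickSight to reach Redshift."""
    return {
        "GroupId": security_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "IpRanges": [
                    {"CidrIp": quicksight_cidr, "Description": "Amazon QuickSight access"}
                ],
            }
        ],
    }
```

With boto3 installed, the request could be applied with `boto3.client("ec2").authorize_security_group_ingress(**request)`.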
A company wants to prevent any user from creating EMR clusters that are accessible from the public Internet.
Enable the ‘block public access’ setting in the Amazon EMR Console.
A company wants data in a Kinesis Data stream to be encrypted. The company wants to manage the key rotation.
Specify a customer managed Customer Master Key (CMK) when enabling server-side encryption for the Kinesis Data stream.
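Enabling server-side encryption with your own key can be sketched as a request for the Kinesis `StartStreamEncryption` API. The stream name and key identifier are placeholders. Passing the AWS managed alias `alias/aws/kinesis` instead would leave key rotation to AWS, so a customer managed key is assumed here.

```python
def build_encryption_request(stream_name, kms_key_id):
    """Build StartStreamEncryption arguments for a customer managed KMS key."""
    return {
        "StreamName": stream_name,
        "EncryptionType": "KMS",
        # Key ID, key ARN, or alias of a key whose rotation you control.
        "KeyId": kms_key_id,
    }


def enable_stream_encryption(stream_name, kms_key_id):
    """Turn on server-side encryption for the stream."""
    import boto3  # requires AWS credentials and the boto3 package

    boto3.client("kinesis").start_stream_encryption(
        **build_encryption_request(stream_name, kms_key_id)
    )
```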
Validate Your Knowledge
After you’ve reviewed the materials above, the next resource that you should check is the FREE AWS sample questions for AWS Data Analytics Specialty. Although this sample exam is not on the same level of difficulty as one might expect on the real exam, it is still a helpful resource for your review. Be sure to check the sample questionnaire often since AWS may upload a new version of it.
For high-quality practice exams, you can use our AWS Certified Data Analytics Specialty Practice Exams. These practice tests will help you boost your preparedness for the real exam. They contain multiple sets of questions that cover almost every area that you can expect from the real certification exam. We have also included detailed explanations and adequate reference links to help you understand why the option with the correct answer is better than the rest of the options. This is the value that you will get from our course. Practice exams are a great way to determine which areas you are weak in, and they also highlight the important information that you might have missed during your review.
Sample Practice Test Questions:
A company provides insights into user behaviors of its social media platform using Amazon Athena. The Data Analysts from different teams run ad-hoc queries on the data stored on Amazon S3 buckets. However, some data contains sensitive information that must adhere to certain security policies. The query history and execution must be separated among different users and teams for compliance purposes.
Which of the following should be implemented to meet the above requirements?
- Set up an S3 bucket for each team and assign bucket policies that grant appropriate permissions to individual IAM users. Enable S3 server access logging on the buckets to store historical queries in another S3 bucket.
- Set up an Athena workgroup for each team and apply tags to each workgroup. Using these tags, grant appropriate permissions to the workgroup with IAM policies. Have the members use their assigned Athena workgroup.
- Set up an IAM group for each team, create an appropriate IAM policy to use Athena, and attach it to the IAM group. Add the individual users to the IAM group and use this permission to query on Athena.
- Set up an IAM group for each team, grant specific Athena permissions to each IAM group. Create an AWS Glue Data Catalog resource policy for each IAM group to record the Athena queries.
A digital marketing company uses Amazon DynamoDB and highly available Amazon EC2 instances for one of its solutions. Its application logs are pushed to Amazon CloudWatch Logs. The team of data analysts wants to enrich these logs with data from DynamoDB in near-real-time and use the output for further study.
Which among these steps will enable collection and enrichment based on the requirements stated above?
- Export the EC2 application logs to Amazon S3 on an hourly basis using AWS CLI. Use AWS Glue crawlers to catalog the logs. Configure an AWS Glue connection to the DynamoDB table and an AWS Glue ETL job to enrich the data. Store the enriched data in an Amazon S3 bucket.
- Write an AWS Lambda function that will enrich the data in the DynamoDB table. Create an Amazon Kinesis Data Firehose delivery stream, configure it to subscribe to Amazon CloudWatch Logs, and set an Amazon S3 bucket as its destination. Create a CloudWatch Logs subscription that sends log events to your delivery stream.
- Write an AWS Lambda function that will export the EC2 application logs to Amazon S3 on an hourly basis. Use Apache Spark SQL on Amazon EMR to read the logs from Amazon S3 and enrich the records with the data from DynamoDB. Store the enriched data in an Amazon S3 bucket.
- Install Amazon Kinesis Agent on the EC2 instance. Configure the application to write the logs in a local filesystem and configure Amazon Kinesis Agent to send the data to Amazon Kinesis Data Streams. Configure a Kinesis Data Analytics SQL application with the Kinesis data stream as the source and enrich it with data from the DynamoDB table. Store the enriched output stream in an Amazon S3 bucket using Amazon Kinesis Data Firehose.
To understand a service at a higher level, we recommend that you get hands-on experience. A lot of questions in the exam try to validate whether you’ve seen a particular error or issue during your practice. To prepare yourself for the actual exam, you can use the AWS Free Tier account to simulate different scenarios. With the combination of theoretical and practical knowledge, you can pass the test with flying colors.
We hope that our guide has helped you achieve your goal, and we would love to hear back from you after your exam. Remember that the most important thing before the day of your exam is to get some well-deserved rest. Good luck, and we wish you all the best.