- An interactive query service that makes it easy to analyze data directly in S3 using standard SQL.
- Athena is serverless.
- Has a built-in query editor.
- Uses Presto, an open source, distributed SQL query engine optimized for low latency, ad hoc analysis of data.
- Athena supports a wide variety of data formats, including CSV, JSON, ORC, Avro, and Parquet.
- Athena automatically executes queries in parallel, so that you get query results in seconds, even on large datasets.
- Athena uses Amazon S3 as its underlying data store, making your data highly available and durable.
- Athena integrates with Amazon QuickSight for easy data visualization.
- Athena integrates out-of-the-box with AWS Glue.
- Athena uses a managed Data Catalog to store information and schemas about the databases and tables that you create for your data stored in S3.
- By partitioning your data, you can restrict the amount of data scanned by each query, thus improving performance and reducing cost.
- Athena leverages Hive for partitioning data.
- You can partition your data by any key.
- You can query geospatial data.
- You can query different kinds of logs as your datasets.
- Athena stores query results in S3.
- Athena retains query history for 45 days.
- Athena does not support stored procedures. User-defined functions (backed by AWS Lambda) and INSERT INTO statements were unsupported in early versions but are now available.
- Athena supports both simple data types such as INTEGER, DOUBLE, and VARCHAR, and complex data types such as MAP, ARRAY, and STRUCT.
- Athena supports querying data in Amazon S3 Requester Pays buckets.
- Control access to your data by using IAM policies, access control lists, and S3 bucket policies.
- If the files in the target S3 bucket are encrypted, you can perform queries on the encrypted data itself.
- You pay only for the queries that you run. You are charged based on the amount of data scanned by each query.
- You are not charged for failed queries.
- You can get significant cost savings and performance gains by compressing, partitioning, or converting your data to a columnar format, because each of those operations reduces the amount of data that Athena needs to scan to execute a query.
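
The partitioning and pricing points above can be illustrated with a small sketch. It builds a Hive-style partition prefix, builds a query that filters on partition columns, and estimates the scan cost with and without partition pruning. The bucket, table, and column names are hypothetical, and the price per TB, the 10 MB per-query minimum, and the binary definition of a TB are assumptions; check the current Athena pricing page for your Region before relying on the numbers.

```python
# Illustrative sketch only. All names and rates below are assumptions,
# not values taken from this cheat sheet.
PRICE_PER_TB_USD = 5.00               # assumed rate per TB scanned
MIN_BYTES_PER_QUERY = 10 * 1024**2    # assumed 10 MB minimum billed per query
BYTES_PER_TB = 1024**4                # assumed binary TB


def partition_prefix(base: str, **keys: str) -> str:
    """Build a Hive-style S3 prefix, e.g. .../year=2024/month=01/."""
    parts = "/".join(f"{name}={value}" for name, value in keys.items())
    return f"{base.rstrip('/')}/{parts}/"


def partition_filtered_query(table: str, **keys: str) -> str:
    """Build a SELECT whose WHERE clause filters on partition columns."""
    where = " AND ".join(f"{name} = '{value}'" for name, value in keys.items())
    return f"SELECT * FROM {table} WHERE {where}"


def estimated_cost_usd(bytes_scanned: int) -> float:
    """Estimate the charge for one query from the bytes it scans."""
    billed = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Hypothetical dataset: 1 TB of logs split evenly into 12 monthly partitions.
    total_bytes = 1024**4
    one_month_bytes = total_bytes // 12

    print(partition_prefix("s3://example-bucket/logs", year="2024", month="01"))
    print(partition_filtered_query("logs", year="2024", month="01"))
    print(f"full scan:   ${estimated_cost_usd(total_bytes):.2f}")
    print(f"pruned scan: ${estimated_cost_usd(one_month_bytes):.2f}")
```

Under these assumptions, a query restricted to a single monthly partition scans roughly one twelfth of the data and costs correspondingly less than a full scan. Compression and columnar formats reduce the scanned bytes further in the same way.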
AWS Knowledge Center Videos: How do I analyze my S3 logs using Athena?
Validate Your Knowledge
A multinational corporation is using Amazon Athena to analyze the data sets stored in Amazon S3. The Data Analyst needs to implement a solution that will control the maximum amount of data scanned in the S3 bucket and ensure that if a query exceeds the limit, all succeeding queries will be canceled.
Which of the following approaches can be used to fulfill this requirement?
- Set up a workload management (WLM) assignment rule in the primary workgroup.
- Set data limits in the per query data usage control.
- Integrate API Gateway with Amazon Athena. Configure an account-level throttling to control the queries in the S3 bucket.
- Create an IAM policy that will throttle the data limits in the primary workgroup.
A company is using Amazon Athena with Amazon QuickSight to visualize its AWS CloudTrail logs. The Security Administrator created a custom Athena query that reads the CloudTrail logs and checks whether any IAM user accounts or credentials were created in the past 29, 30, or 31 days (depending on the current month). However, the Administrator always gets an Insufficient Permissions error whenever she tries to run the query from Amazon QuickSight.
What is the MOST suitable solution that the Administrator should implement to fix this issue?
- Disable the Log File Integrity feature in AWS CloudTrail.
- Enable Cross-Origin Resource Sharing (CORS) in the S3 bucket that is used by Athena.
- Use the AWS Account Root User to run the Athena query from Amazon QuickSight.
- Make sure that Amazon QuickSight can access the S3 buckets used by Athena.