AWS Analytics Services

AWS Glue


A fully managed service to extract, transform, and load (ETL) your data for analytics. Discover and search across different AWS data sets without moving your data.

AWS Glue consists of:
- Central metadata repository
- ETL engine
- Flexible scheduler

Use Cases
- Run queries against an Amazon S3 data lake: You can use AWS Glue to make your data available for analytics without moving your data.
- Analyze the log data in your data warehouse: Create ETL scripts to transform, flatten, and enrich the data from source to target.
- Create event-driven ETL pipelines: As soon as new data becomes available in Amazon S3, you [...]

Kinesis Scaling, Resharding and Parallel Processing


Kinesis Resharding enables you to increase or decrease the number of shards in a stream in order to adapt to changes in the rate of data flowing through the stream. Resharding is always pairwise. You cannot split into more than two shards in a single operation, and you cannot merge more than two shards in a single operation. The Kinesis Client Library (KCL) tracks the shards in the stream using an Amazon DynamoDB table, and adapts to changes in the number of shards that result from resharding. When new shards are created as a result of resharding, the KCL discovers [...]
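The pairwise rule above can be pictured with a small sketch. It models each shard as an inclusive range of 128-bit hash keys (Kinesis maps a partition key to a shard through the MD5 hash of the key) and shows one split and one merge. The function names and the choice to split at the midpoint are assumptions made for illustration; this is not the Kinesis API, and a real split can use any starting hash key inside the parent's range.

```python
import hashlib
from typing import List, Tuple

# A shard is modelled here as an inclusive (start, end) range of hash keys.
Shard = Tuple[int, int]

MAX_HASH_KEY = 2**128 - 1  # hash keys are 128-bit integers


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def split_shard(shard: Shard) -> Tuple[Shard, Shard]:
    """Split one shard into exactly two children (midpoint is an assumption)."""
    start, end = shard
    if start >= end:
        raise ValueError("shard range is too small to split")
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)


def merge_shards(a: Shard, b: Shard) -> Shard:
    """Merge exactly two shards; they must cover adjacent hash key ranges."""
    first, second = sorted([a, b])
    if first[1] + 1 != second[0]:
        raise ValueError("only adjacent shards can be merged")
    return (first[0], second[1])


def shard_for(partition_key: str, shards: List[Shard]) -> Shard:
    """Find the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for shard in shards:
        if shard[0] <= h <= shard[1]:
            return shard
    raise LookupError("no shard covers this hash key")


if __name__ == "__main__":
    parent = (0, MAX_HASH_KEY)
    left, right = split_shard(parent)
    print(shard_for("device-42", [left, right]) in (left, right))
    print(merge_shards(left, right) == parent)
```

Growing a stream from one shard to four therefore takes three split operations in this model, since each operation produces only two children.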

Amazon QuickSight


Amazon QuickSight is a cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data, anytime, on any device.

Features
- Provides ML Insights for discovering hidden trends and outliers, identifying key business drivers, and performing powerful what-if analysis and forecasting.
- Has a wide library of visualizations, charts, and tables. You can add interactive features like drill-downs and filters, and perform automatic data refreshes to build interactive dashboards.
- Allows you to schedule automatic email-based reports, so you can get key insights delivered to your inbox.
- QuickSight allows users to connect [...]

Amazon Elasticsearch (Amazon ES)


Amazon ES lets you search, analyze, and visualize your data in real time. This service manages the capacity, scaling, patching, and administration of your Elasticsearch clusters for you, while still giving you direct access to the Elasticsearch APIs. The service offers open-source Elasticsearch APIs, managed Kibana, and integrations with Logstash and other AWS services. The combination of Elasticsearch, Logstash, and Kibana is often referred to as the ELK Stack.

Concepts
- An Amazon ES domain is synonymous with an Elasticsearch cluster. Domains are clusters with the settings, instance types, instance counts, and storage resources that you specify.
- You can create multiple Elasticsearch indices within the same domain.
- Elasticsearch [...]

Amazon Kinesis


Makes it easy to collect, process, and analyze real-time, streaming data. Kinesis can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications.

Kinesis Video Streams

A fully managed AWS service that you can use to stream live video from devices to the AWS Cloud, or build applications for real-time video processing or batch-oriented video analytics.

Benefits
- You can connect and stream from millions of devices.
- You can configure your Kinesis video stream to durably store media data for custom retention periods.
- Kinesis Video Streams [...]

Amazon EMR


A managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. You can process data for analytics purposes and business intelligence workloads using EMR together with Apache Hive and Apache Pig. You can use EMR to transform and move large amounts of data into and out of other AWS data stores and databases.

Features
- EMR Notebooks provide a managed environment, based on Jupyter Notebooks, to help users prepare and visualize data, collaborate with peers, build applications, and perform interactive analysis using EMR clusters.
- EMR [...]

AWS Data Pipeline


A web service for scheduling regular data movement and data processing activities in the AWS cloud. Data Pipeline integrates with on-premises and cloud-based storage systems. A managed ETL (Extract-Transform-Load) service. Native integration with S3, DynamoDB, RDS, EMR, EC2, and Redshift.

Features
- You can quickly and easily provision pipelines that remove the development and maintenance effort required to manage your daily data operations, letting you focus on generating insights from that data.
- Data Pipeline provides built-in activities for common actions such as copying data between Amazon S3 and Amazon RDS, or running a query against Amazon S3 log data.
- Data [...]

Amazon CloudSearch


A fully managed service in the AWS Cloud that makes it easy to set up, manage, and scale a search solution for your website or application.

Features
- You can use CloudSearch to index and search both structured data and plain text. Supported search features:
  - Full text search with language-specific text processing
  - Boolean search
  - Prefix searches
  - Range searches
  - Term boosting
  - Faceting
  - Highlighting
  - Autocomplete suggestions
- You can get search results in JSON or XML, sort and filter results based on field values, and sort results alphabetically, numerically, or according to custom expressions.
- CloudSearch can scale to accommodate the amount of data uploaded to the domain and [...]

Amazon Athena


An interactive query service that makes it easy to analyze data directly in S3 using standard SQL.

Features
- Athena is serverless.
- Has a built-in query editor.
- Uses Presto, an open source, distributed SQL query engine optimized for low latency, ad hoc analysis of data.
- Athena supports a wide variety of data formats such as CSV, JSON, ORC, Avro, or Parquet.
- Athena automatically executes queries in parallel, so that you get query results in seconds, even on large datasets.
- Athena uses Amazon S3 as its underlying data store, making your data highly available and durable.
- Athena integrates with Amazon QuickSight for [...]

Amazon Redshift


A fully managed, petabyte-scale data warehouse service. Redshift extends data warehouse queries to your data lake. You can run analytic queries against petabytes of data stored locally in Redshift, and directly against exabytes of data stored in S3. Redshift is an OLAP type of database. Currently, Redshift only supports Single-AZ deployments.

Features
- Redshift uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries. It uses a massively parallel processing data warehouse architecture to parallelize and distribute SQL operations.
- Redshift uses machine learning to deliver high throughput based on your workloads.
- Redshift uses [...]
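
To make the zone map idea concrete, here is a toy sketch of how per-block minimum and maximum values let a range query skip blocks without reading them. It is an illustration of the general technique only, not Redshift's implementation; the `Block` class, the `scan_with_zone_maps` function, and the tiny block size are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """A hypothetical storage block holding values from one column."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # The zone map entry: min and max of the values in this block.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_with_zone_maps(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Skip the block when its min/max range cannot overlap the predicate.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


if __name__ == "__main__":
    # Sorted data gives each block a narrow range, so pruning is effective.
    blocks = [Block([1, 2, 3, 4]), Block([5, 6, 7, 8]), Block([9, 10, 11, 12])]
    rows, read = scan_with_zone_maps(blocks, 6, 7)
    print(rows, read)
```

In this sketch the benefit depends on how the data is ordered: when values are scattered across blocks, every block's range overlaps the predicate and nothing is skipped.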

