Last updated on July 16, 2024
Amazon MSK Cheat Sheet
-
A fully managed Apache Kafka service for ingesting and processing streaming data in real time.
Concepts
-
Configuration
-
If you do not specify a custom MSK configuration, Amazon MSK assigns a default configuration to the cluster.
-
You can apply a custom configuration to new or existing MSK clusters.
-
MSK configurations allow you to specify the properties to be set as well as the values to be assigned to them.
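As a rough illustration of "properties and the values assigned to them", the sketch below renders a small set of Apache Kafka broker properties into the key=value text that a custom MSK configuration carries. The property values are hypothetical and chosen only for the example; check which properties your Kafka version and cluster type allow before using them.

```python
# Hypothetical values, for illustration only. The keys are standard
# Apache Kafka broker properties.
custom_properties = {
    "auto.create.topics.enable": "false",
    "default.replication.factor": "3",
    "min.insync.replicas": "2",
    "num.partitions": "6",
    "log.retention.hours": "168",
}


def render_server_properties(properties: dict) -> str:
    """Render one key=value line per property, sorted for stable output."""
    return "\n".join(f"{key}={value}" for key, value in sorted(properties.items()))


server_properties = render_server_properties(custom_properties)
print(server_properties)
```

The rendered text is the kind of payload that the MSK CreateConfiguration operation accepts as its server properties.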
-
-
MSK Serverless
-
A cluster type that enables you to run Apache Kafka without the need to manage or scale cluster capacity.
-
Automatically provisions and scales capacity while managing the partitions in your topics.
-
Integrated with the following services:
-
AWS PrivateLink – provides private connectivity.
-
AWS IAM – for authentication and authorization.
-
AWS Glue Schema Registry – for schema management.
-
Amazon Managed Service for Apache Flink (formerly Amazon Kinesis Data Analytics) – for Apache Flink-based stream processing.
-
AWS Lambda – for event processing.
-
-
To modify topic-level configuration, use Apache Kafka commands.
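A minimal sketch of one such Apache Kafka command, built in Python so the arguments are easy to inspect. It assembles a `kafka-configs.sh` call that overrides a topic's `retention.ms`. The bootstrap address, topic name, and `client.properties` file are placeholders, not values from this cheat sheet.

```python
import shlex


def build_alter_topic_command(bootstrap_server, topic, overrides,
                              command_config="client.properties"):
    """Return the argv list for a kafka-configs.sh call that alters topic settings."""
    add_config = ",".join(f"{key}={value}" for key, value in overrides.items())
    return [
        "bin/kafka-configs.sh",
        "--bootstrap-server", bootstrap_server,
        "--command-config", command_config,  # client auth settings (placeholder file)
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config", add_config,
    ]


command = build_alter_topic_command(
    bootstrap_server="BOOTSTRAP_HOST:9098",  # placeholder endpoint
    topic="orders",                          # hypothetical topic name
    overrides={"retention.ms": 86400000},    # keep data for one day
)
print(shlex.join(command))
```

The printed line can be pasted into a shell on a client machine that has the Apache Kafka tools installed.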
-
-
MSK Connect
-
Enables you to stream data to and from Apache Kafka clusters.
-
Deploy connectors built for Kafka Connect to move data into or pull data from data stores such as Amazon S3 and Amazon OpenSearch Service.
-
A connector continuously copies data from a streaming data source into your cluster, or from your cluster into a data sink.
-
Source connectors – import data from external systems into your topics.
-
Sink connectors – export data from your topics to external systems.
-
-
A worker is a JVM process that runs the connector logic.
-
Each worker creates a set of tasks that run in parallel threads and copy the data.
-
-
The total capacity of a connector is determined by the number of workers and the number of MSK Connect Units (MCUs) per worker.
-
The two capacity modes are:
-
Provisioned – you specify the number of workers and the number of MCUs per worker.
-
Autoscaled – you specify the minimum and maximum number of workers.
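The capacity arithmetic above can be sketched as follows. The example assumes each MCU provides 1 vCPU and 4 GiB of memory, and the dictionary shapes follow the capacity structure of the MSK Connect CreateConnector API. The worker counts and CPU thresholds are placeholder values.

```python
VCPU_PER_MCU = 1        # assumption: 1 vCPU per MCU
MEMORY_GIB_PER_MCU = 4  # assumption: 4 GiB of memory per MCU


def provisioned(worker_count, mcu_count):
    """Capacity settings for the provisioned mode: fixed workers and MCUs per worker."""
    return {"provisionedCapacity": {"workerCount": worker_count, "mcuCount": mcu_count}}


def autoscaled(min_workers, max_workers, mcu_count, scale_in_cpu=20, scale_out_cpu=80):
    """Capacity settings for the autoscaled mode: a worker range plus CPU thresholds."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "autoScaling": {
            "minWorkerCount": min_workers,
            "maxWorkerCount": max_workers,
            "mcuCount": mcu_count,
            "scaleInPolicy": {"cpuUtilizationPercentage": scale_in_cpu},
            "scaleOutPolicy": {"cpuUtilizationPercentage": scale_out_cpu},
        }
    }


def total_resources(worker_count, mcu_count):
    """Total connector capacity = workers x MCUs per worker."""
    mcus = worker_count * mcu_count
    return {
        "mcus": mcus,
        "vcpus": mcus * VCPU_PER_MCU,
        "memory_gib": mcus * MEMORY_GIB_PER_MCU,
    }


print(provisioned(4, 2))
print(autoscaled(1, 4, 2))
print(total_resources(4, 2))
```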
-
-
A plugin contains the code that defines the logic of the connector. You can use the same plugin to create one or more connectors.
-
A configuration provider allows you to specify variables in a connector or worker configuration instead of plaintext, and workers running in your connector resolve these variables at runtime.
-
To allow Amazon MSK Connect to access the internet, you can use Amazon VPC and set up a NAT gateway or NAT instance.
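To make the configuration provider idea concrete, the toy resolver below replaces `${provider:path:key}` placeholders in a connector configuration at "runtime". It only imitates the behaviour described above and is not Kafka's implementation. The provider name, secret path, and in-memory store are all hypothetical.

```python
import re

# Matches placeholders of the form ${provider:path:key}.
VARIABLE = re.compile(r"\$\{(?P<provider>[^:}]+):(?P<path>[^:}]+):(?P<key>[^:}]+)\}")


def resolve(config, providers):
    """Return a copy of config with every placeholder replaced by its looked-up value."""
    def lookup(match):
        provider = providers[match["provider"]]
        return provider(match["path"], match["key"])

    return {name: VARIABLE.sub(lookup, value) for name, value in config.items()}


# Hypothetical in-memory secret store standing in for a real provider backend.
fake_store = {"demo/db": {"password": "s3cret"}}
providers = {"secretsmanager": lambda path, key: fake_store[path][key]}

connector_config = {
    "connection.user": "app",
    "connection.password": "${secretsmanager:demo/db:password}",
}

resolved = resolve(connector_config, providers)
print(resolved["connection.password"])
```

The stored configuration never contains the plaintext secret; only the resolved copy held by the worker does.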
-
-
Connecting to an Amazon MSK cluster
-
By default, an MSK cluster can only be accessed by clients that are in the same VPC as the cluster.
-
If you want to connect to your MSK cluster from a client that’s outside the cluster’s VPC, you can do the following:
-
Turn on public access to a cluster.
-
Use VPC peering, AWS Direct Connect, AWS Transit Gateway, VPN connections, REST proxies, multi-Region multi-VPC connectivity, or EC2-Classic.
-
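Connect through the ports that Amazon MSK exposes. The port depends on the authentication method and on whether the client connects from within AWS or over public access.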
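The sketch below shows one way to pick a broker port for a given access path. The port numbers reflect the Amazon MSK port information as best understood here (9092 plaintext, 9094/9194 TLS, 9096/9196 SASL/SCRAM, 9098/9198 IAM), so verify them against the current documentation. The broker hostnames are placeholders.

```python
# (auth method, public access?) -> broker port. Plaintext has no public port.
PORTS = {
    ("plaintext", False): 9092,
    ("tls", False): 9094,
    ("sasl_scram", False): 9096,
    ("iam", False): 9098,
    ("tls", True): 9194,
    ("sasl_scram", True): 9196,
    ("iam", True): 9198,
}


def broker_port(auth, public=False):
    """Look up the port for an auth method, inside AWS or over public access."""
    try:
        return PORTS[(auth, public)]
    except KeyError:
        raise ValueError(f"no port for auth={auth!r}, public={public}") from None


def bootstrap_string(hosts, auth, public=False):
    """Join broker hosts into a bootstrap server string using the matching port."""
    port = broker_port(auth, public)
    return ",".join(f"{host}:{port}" for host in hosts)


hosts = ["b-1.example.internal", "b-2.example.internal"]  # placeholder hostnames
print(bootstrap_string(hosts, "iam"))
print(bootstrap_string(hosts, "tls", public=True))
```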
-
-
-
The state of your cluster defines what actions you can and cannot perform.
-
You can use Apache Kafka’s MirrorMaker to migrate clusters in either of these scenarios:
-
From a self-managed Apache Kafka cluster to Amazon MSK
-
From one MSK cluster to another
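As a rough sketch of a migration setup, the snippet below renders a minimal properties file for MirrorMaker 2, which is an assumption since the text above does not say which MirrorMaker version is meant. The cluster aliases `source` and `target` and the bootstrap addresses are placeholders.

```python
def render_mm2_properties(source_bootstrap, target_bootstrap, topics=".*"):
    """Render a minimal MirrorMaker 2 config that replicates source -> target."""
    lines = [
        "clusters = source, target",
        f"source.bootstrap.servers = {source_bootstrap}",
        f"target.bootstrap.servers = {target_bootstrap}",
        "source->target.enabled = true",      # turn on the one-way replication flow
        f"source->target.topics = {topics}",  # regex of topics to copy
    ]
    return "\n".join(lines)


mm2_properties = render_mm2_properties(
    source_bootstrap="SOURCE_BROKER:9092",  # placeholder: existing Kafka cluster
    target_bootstrap="TARGET_BROKER:9092",  # placeholder: destination MSK cluster
)
print(mm2_properties)
```

A file like this is what `connect-mirror-maker.sh` takes as its argument. A real migration also needs security settings and a cutover plan for producers and consumers.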
-
-
With LinkedIn’s Cruise Control, you can rebalance the MSK cluster, detect and fix anomalies, and monitor the cluster’s state and health.
Amazon Managed Streaming Security
-
Use IAM to control who can perform Apache Kafka operations on a cluster.
-
If you add new brokers after changing a cluster’s security group, you must also update the security groups on the new brokers’ elastic network interfaces (ENIs).
-
To limit access to the Apache ZooKeeper nodes, you can assign them a separate security group.
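For the IAM access control point above, a minimal sketch of an identity policy that lets a client connect to one cluster, read and write one topic, and join one consumer group. The account ID, cluster name, cluster UUID, topic, and group are placeholders. The `kafka-cluster:` actions are the ones MSK defines for Apache Kafka operations, but review the full list before relying on this.

```python
import json


def client_policy(region, account_id, cluster_name, cluster_uuid, topic, group):
    """Build an IAM policy document scoped to one cluster, topic, and consumer group."""
    base = f"arn:aws:kafka:{region}:{account_id}"
    cluster_path = f"{cluster_name}/{cluster_uuid}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kafka-cluster:Connect"],
                "Resource": [f"{base}:cluster/{cluster_path}"],
            },
            {
                "Effect": "Allow",
                "Action": [
                    "kafka-cluster:DescribeTopic",
                    "kafka-cluster:ReadData",
                    "kafka-cluster:WriteData",
                ],
                "Resource": [f"{base}:topic/{cluster_path}/{topic}"],
            },
            {
                "Effect": "Allow",
                "Action": ["kafka-cluster:DescribeGroup", "kafka-cluster:AlterGroup"],
                "Resource": [f"{base}:group/{cluster_path}/{group}"],
            },
        ],
    }


policy = client_policy(
    region="us-east-1",
    account_id="111122223333",   # placeholder account ID
    cluster_name="demo-cluster",  # placeholder cluster name
    cluster_uuid="CLUSTER_UUID",  # placeholder UUID
    topic="orders",
    group="orders-consumers",
)
print(json.dumps(policy, indent=2))
```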
Amazon Managed Streaming Monitoring
-
You can collect metrics, monitor, and analyze clusters using Amazon CloudWatch.
-
To monitor consumer lag and identify slow or stuck consumers, use CloudWatch or open monitoring with Prometheus.
-
You can deliver the Apache Kafka broker logs to the following destination types:
-
Amazon CloudWatch Logs
-
Amazon S3
-
Amazon Data Firehose
-
-
MSK Connect does the following for you:
-
Continuously monitors connector health and delivery state.
-
Patches and manages the underlying hardware.
-
Autoscales connectors to match changes in throughput.
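To illustrate what the consumer lag mentioned above measures, the sketch below computes per-partition offset lag as the log end offset minus the consumer group's committed offset, then takes the sum and the maximum. Those two values correspond roughly to the kind of figures that lag metrics such as `SumOffsetLag` and `MaxOffsetLag` report. The offsets are made-up sample data, and treating a missing committed offset as 0 is an assumption of this sketch.

```python
def offset_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: messages written to the log but not yet committed by the group."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


# Made-up sample offsets for a three-partition topic.
log_end = {0: 1500, 1: 1480, 2: 1510}
committed = {0: 1500, 1: 1100, 2: 1490}

lag = offset_lag(log_end, committed)
sum_lag = sum(lag.values())
max_lag = max(lag.values())

print(lag, sum_lag, max_lag)
```

A partition whose lag keeps growing between samples, like partition 1 here if it stayed behind, is the usual sign of a slow or stuck consumer.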
-
Amazon Managed Streaming Pricing
-
You are charged for the following:
-
Every Apache Kafka broker instance.
-
The amount of storage you provision in your cluster.
-
-
With MSK Serverless, you are charged for cluster hours, partition hours, and storage.
-
For MSK Connect, you are charged for the number and size (MCUs) of each Kafka Connect worker.
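The charging dimensions above can be turned into a back-of-the-envelope estimate. The sketch takes every rate as an input, and the numbers in the example are arbitrary placeholders, not AWS prices. Look up current rates for your Region and instance type before using it.

```python
def provisioned_cluster_cost(broker_count, hours, broker_hourly_rate,
                             storage_gb, storage_rate_per_gb_month):
    """Broker instance charge plus provisioned storage charge for one month."""
    broker_cost = broker_count * hours * broker_hourly_rate
    storage_cost = storage_gb * storage_rate_per_gb_month
    return broker_cost + storage_cost


def connect_cost(worker_count, mcus_per_worker, hours, mcu_hourly_rate):
    """MSK Connect charge: number of workers x size (MCUs) x hours x rate."""
    return worker_count * mcus_per_worker * hours * mcu_hourly_rate


# Placeholder rates, NOT real AWS prices.
cluster_estimate = provisioned_cluster_cost(
    broker_count=3, hours=730, broker_hourly_rate=0.25,
    storage_gb=1000, storage_rate_per_gb_month=0.125,
)
connect_estimate = connect_cost(
    worker_count=2, mcus_per_worker=2, hours=730, mcu_hourly_rate=0.125,
)
print(cluster_estimate, connect_estimate)
```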
Amazon MSK Cheat Sheet References:
https://aws.amazon.com/msk/
https://docs.aws.amazon.com/msk/latest/developerguide/what-is-msk.html