A service that uses fully managed Apache Kafka to ingest and process streaming data in real time.
If you do not specify a custom MSK configuration, a default configuration will be assigned to a cluster.
You can apply a custom configuration to new or existing MSK clusters.
MSK configurations allow you to specify the properties to be set as well as the values to be assigned to them.
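As an illustration, a custom MSK configuration is just a set of Apache Kafka server properties. The values below are a hypothetical example, not recommendations:

```properties
# Hypothetical custom MSK configuration (Apache Kafka server properties).
auto.create.topics.enable = true
default.replication.factor = 3
min.insync.replicas = 2
num.partitions = 6
log.retention.hours = 168
```

You then associate the configuration (and a specific revision of it) with a cluster when you create or update the cluster.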
A cluster type that enables you to run Apache Kafka without the need to manage or scale cluster capacity.
It automatically provisions and scales capacity while managing the partitions in your topics.
Integrated with the following services:
AWS PrivateLink – provide private connectivity.
AWS IAM – for authentication and authorization.
AWS Glue Schema Registry – for schema management.
Amazon Kinesis Data Analytics – for Apache Flink-based stream processing.
AWS Lambda – for event processing.
To modify topic-level configurations, use Apache Kafka commands.
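For example, a retention override on an existing topic can be set with the standard `kafka-configs.sh` tool (the bootstrap address and topic name below are placeholders):

```shell
# Set a topic-level retention override (7 days, in milliseconds).
bin/kafka-configs.sh \
  --bootstrap-server b-1.mycluster.example.us-east-1.amazonaws.com:9092 \
  --alter --entity-type topics --entity-name my-topic \
  --add-config retention.ms=604800000

# Verify the override.
bin/kafka-configs.sh \
  --bootstrap-server b-1.mycluster.example.us-east-1.amazonaws.com:9092 \
  --describe --entity-type topics --entity-name my-topic
```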
Enables you to stream data to and from Apache Kafka clusters.
Deploy connectors built for Kafka Connect that allow you to move data into or pull data from data stores such as Amazon S3 and OpenSearch Service.
A connector continuously copies data from a streaming data source or from a cluster into a data sink.
Source connectors – import data from external systems into your topics.
Sink connectors – export data from your topics to external systems.
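A connector is defined by a set of properties. The sketch below assumes the Confluent S3 sink plugin has been uploaded as a custom plugin; the topic, bucket, and Region are placeholders:

```properties
# Hypothetical S3 sink connector configuration.
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=2
topics=my-topic
s3.region=us-east-1
s3.bucket.name=my-sink-bucket
flush.size=1000
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
```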
A worker is a JVM process that runs the connector logic.
Each worker creates a set of tasks that can operate in parallel threads and copy the data.
The total capacity of a connector is determined by the number of workers and the number of MSK Connect Units (MCUs) per worker.
The two capacity modes are:
Provisioned – number of workers and MCUs per worker.
Autoscaled – minimum and maximum number of workers.
A plugin contains the code that defines the logic of the connector. You can use the same plugin to create one or more connectors.
A configuration provider allows you to specify variables in a connector or worker configuration instead of plaintext, and workers running in your connector resolve these variables at runtime.
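One minimal sketch uses Apache Kafka's built-in FileConfigProvider (the file path and key below are placeholders):

```properties
# Worker configuration: register the provider.
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
```

A connector configuration can then reference `${file:/secrets/db.properties:password}` instead of a plaintext password; the worker resolves the variable at runtime.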
To allow Amazon MSK Connect to access the internet, you can use Amazon VPC and set up a NAT gateway or NAT instance.
Connecting to an Amazon MSK cluster
By default, an MSK cluster can only be accessed by clients who are in the same VPC as the cluster.
If you want to connect to your MSK cluster from a client that’s outside the cluster’s VPC, you can do the following:
Turn on public access to a cluster.
Use VPC peering, AWS Direct Connect, AWS Transit Gateway, VPN connections, REST proxies, multi-Region multi-VPC connectivity, or EC2-Classic networking.
Open the ports that MSK uses (for example, 9092 for plaintext, 9094 for TLS, 9098 for IAM, and 2181 for Apache ZooKeeper).
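Once network connectivity is in place, a client can verify it can reach the brokers over the TLS port (the bootstrap address below is a placeholder):

```shell
# Minimal client properties for a TLS connection.
cat > client.properties <<'EOF'
security.protocol=SSL
EOF

# List topics to confirm the client can reach the brokers on port 9094.
bin/kafka-topics.sh \
  --bootstrap-server b-1.mycluster.example.us-east-1.amazonaws.com:9094 \
  --command-config client.properties --list
```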
The state of your cluster defines what actions you can and cannot perform.
You can migrate your clusters using Apache Kafka’s MirrorMaker.
Apache Kafka cluster to Amazon MSK
From one MSK cluster to another
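A minimal MirrorMaker 2 configuration for such a migration might look like this (cluster aliases and bootstrap addresses are placeholders):

```properties
# mm2.properties: replicate every topic from "source" into "msk".
clusters = source, msk
source.bootstrap.servers = source-broker:9092
msk.bootstrap.servers = b-1.mycluster.example.us-east-1.amazonaws.com:9092

source->msk.enabled = true
source->msk.topics = .*
```

Run it with `bin/connect-mirror-maker.sh mm2.properties`.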
With LinkedIn’s Cruise Control, you can rebalance the MSK cluster, detect and fix anomalies, and monitor the cluster’s state and health.
Use IAM to control who can perform Apache Kafka operations on a cluster.
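As a sketch, an IAM policy granting a consumer read access might look like the following (the account ID, cluster, and topic names are placeholders; consuming as part of a consumer group also requires group-level permissions):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeTopic",
        "kafka-cluster:ReadData"
      ],
      "Resource": [
        "arn:aws:kafka:us-east-1:123456789012:cluster/my-cluster/*",
        "arn:aws:kafka:us-east-1:123456789012:topic/my-cluster/*/orders"
      ]
    }
  ]
}
```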
If you add new brokers after changing a cluster’s security group, you must manually update the new brokers’ ENIs with the new security group.
To limit access to your Apache ZooKeeper nodes, assign a separate security group to them.
You can collect metrics, monitor, and analyze clusters using Amazon CloudWatch.
To monitor consumer lag and identify slow or stuck consumers, use CloudWatch or open monitoring with Prometheus.
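For example, the `EstimatedMaxTimeLag` consumer-lag metric can be pulled from CloudWatch for one consumer group (the cluster and group names below are placeholders):

```shell
aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name EstimatedMaxTimeLag \
  --dimensions Name="Cluster Name",Value=my-cluster Name="Consumer Group",Value=my-group \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 300 --statistics Maximum
```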
You can deliver Apache Kafka broker logs to the following destination types:
Amazon CloudWatch Logs
Amazon S3
Amazon Kinesis Data Firehose
Amazon MSK Connect does the following:
Continuously monitors connector health and delivery state.
Patches and manages the underlying hardware.
Autoscales connectors to match changes in throughput.
You are charged for the following:
Every Apache Kafka broker instance, billed hourly.
The amount of storage you provision in your cluster.
MSK Serverless charges you for the cluster, partitions, and storage.
For MSK Connect, you are charged for the number and size (MCUs) of each Kafka Connect worker.
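That charge can be estimated with simple arithmetic. The MCU-hour rate below is a hypothetical placeholder, not an actual AWS price:

```python
# Back-of-the-envelope MSK Connect cost estimate.
HOURS_PER_MONTH = 730
MCU_RATE_PER_HOUR = 0.10  # HYPOTHETICAL $/MCU-hour, for illustration only


def msk_connect_monthly_cost(workers: int, mcus_per_worker: int,
                             rate: float = MCU_RATE_PER_HOUR) -> float:
    """Cost = workers x MCUs per worker x hours in a month x MCU-hour rate."""
    return workers * mcus_per_worker * HOURS_PER_MONTH * rate


# Example: a provisioned connector with 2 workers of 2 MCUs each.
print(round(msk_connect_monthly_cost(2, 2), 2))
```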