Amazon SageMaker Feature Store Cheat Sheet
- Amazon SageMaker Feature Store is a centralized repository for managing machine learning features.
- It simplifies the process of data exploration, model training, and batch predictions by providing a unified view of your features.
- It improves the efficiency of ML model development and deployment.
How does it work?
- SageMaker Feature Store stores features in feature groups.
- A feature group is a collection of related features that can be used for a specific task.
- Feature groups can be populated with data from various sources, such as Amazon S3, Amazon RDS, and Amazon DynamoDB.
- A feature group can use one of three storage modes:
  - Online: provides low-latency feature access, making it suitable for high-throughput prediction applications.
  - Offline: allows batch processing of large datasets kept in the offline store. These datasets can be used for training models or performing batch inference. The offline store uses Amazon S3 for storage and supports data retrieval with Amazon Athena queries.
  - Online and Offline: a combination of the online and offline modes.
- It supports both streaming and batch data ingestion.
  - Stream Ingestion
    - Stream ingestion lets you continuously push new or updated feature data to the store in real time.
    - This is done with the synchronous PutRecord API, which ensures that the latest feature values are always available.
  - Batch Ingestion
    - You can use tools such as SageMaker Data Wrangler to create features, then export a notebook that ingests them in batches into a feature group.
    - This method supports ingestion into the offline store, the online store, or both, depending on the configuration of the feature group.
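
The ingestion flow above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the helper names (`to_record`, `put_record_request`, `ingest`) and the feature group name `customers` are hypothetical. The only AWS call it relies on is `put_record` from the boto3 `sagemaker-featurestore-runtime` client, which takes each feature as a name plus a string value. The same loop works for small batches; large batch jobs would normally use a dedicated tool, as described above.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def now_iso() -> str:
    """Current UTC time as an ISO-8601 string, usable as a String event time."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_record(features: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Record list shape that PutRecord expects."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
        if value is not None  # omit missing values instead of sending "None"
    ]


def put_record_request(feature_group_name: str, features: Dict[str, Any]) -> Dict[str, Any]:
    """Build the keyword arguments for a single put_record call."""
    return {"FeatureGroupName": feature_group_name, "Record": to_record(features)}


def ingest(client: Any, feature_group_name: str, rows: Iterable[Dict[str, Any]]) -> int:
    """Send rows one at a time with put_record; returns the number of rows sent."""
    sent = 0
    for row in rows:
        client.put_record(**put_record_request(feature_group_name, row))
        sent += 1
    return sent


# Usage (requires boto3, AWS credentials, and an existing feature group):
#   import boto3
#   client = boto3.client("sagemaker-featurestore-runtime")
#   ingest(client, "customers", [{"customer_id": 42, "age": 31, "event_time": now_iso()}])
```

Building the request separately from sending it keeps the payload easy to inspect and test without touching AWS.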
Amazon SageMaker Feature Store Key Concepts
Here are some common terms used in the Amazon SageMaker Feature Store:
- Feature Store: A centralized repository for managing machine learning features. It serves as the single source of truth for storing, retrieving, removing, tracking, sharing, and discovering features, and for controlling access to them.
- Online Store: Provides real-time access to the latest feature data, enabling low-latency applications.
- Offline Store: Stores historical feature data, often used for batch processing, offline analysis, or model training.
- Feature Group: A collection of related features used to describe a set of records, serving as the foundation for training and predicting with machine learning models.
- Feature: A property or characteristic used as input for machine learning models. In a Feature Store, a feature represents a column in your ML data table.
- Feature Definition: Specifies the name and data type (Integral, String, or Fractional) of a feature within a feature group.
- Record: A collection of feature values for a specific entity, uniquely identified by a combination of record identifier and event time. In a Feature Store, a record represents a row in your ML data table.
- Record Identifier: The name of the feature used to uniquely identify records within a feature group. It must be defined among the feature group’s feature definitions.
- Event Time: A timestamp associated with a record event, indicating when the data was captured or updated. All records in a feature group must have an event time. The online store only stores the latest record for each entity, while the offline store retains all historical records.
- Ingestion: The process of adding new records to a feature group, typically performed using the PutRecord API.
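
To show how these terms fit together, here is a hedged sketch that assembles the arguments for the boto3 `sagemaker` client's `create_feature_group` call. The helper names, the sample record, the role ARN, and the S3 URI are all hypothetical placeholders. Inferring feature types from a sample record is an illustrative shortcut of this sketch, not something the service does for you.

```python
from typing import Any, Dict, List, Optional

# Maps Python types to the three feature types named above.
_FEATURE_TYPES = {int: "Integral", float: "Fractional", str: "String"}


def feature_definitions(sample: Dict[str, Any]) -> List[Dict[str, str]]:
    """Derive feature definitions (name + type) from one sample record."""
    definitions = []
    for name, value in sample.items():
        feature_type = _FEATURE_TYPES.get(type(value))
        if feature_type is None:
            raise TypeError(f"unsupported type for feature {name!r}: {type(value).__name__}")
        definitions.append({"FeatureName": name, "FeatureType": feature_type})
    return definitions


def create_feature_group_request(
    name: str,
    sample: Dict[str, Any],
    record_identifier: str,
    event_time: str,
    role_arn: str,
    online: bool = True,
    offline_s3_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for create_feature_group in the chosen storage mode."""
    if not online and offline_s3_uri is None:
        raise ValueError("enable the online store, the offline store, or both")
    for required in (record_identifier, event_time):
        if required not in sample:
            raise ValueError(f"{required!r} must be one of the feature definitions")
    request: Dict[str, Any] = {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": record_identifier,
        "EventTimeFeatureName": event_time,
        "FeatureDefinitions": feature_definitions(sample),
        "RoleArn": role_arn,
    }
    if online:
        request["OnlineStoreConfig"] = {"EnableOnlineStore": True}
    if offline_s3_uri is not None:
        request["OfflineStoreConfig"] = {"S3StorageConfig": {"S3Uri": offline_s3_uri}}
    return request


# Usage (requires boto3 and AWS credentials; all values below are placeholders):
#   import boto3
#   sample = {"customer_id": 42, "age": 31.5, "event_time": "2024-01-01T00:00:00Z"}
#   request = create_feature_group_request(
#       "customers", sample, "customer_id", "event_time",
#       role_arn="arn:aws:iam::123456789012:role/example-role",
#       offline_s3_uri="s3://example-bucket/feature-store",
#   )
#   boto3.client("sagemaker").create_feature_group(**request)
```

Passing `online=True` together with an `offline_s3_uri` corresponds to the combined online and offline mode; omitting one of them selects a single store.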
Benefits of using SageMaker Feature Store
- It provides a central location for managing all of your machine learning features, which makes them easy to find, track, and update.
- Its lineage tracking and data validation capabilities can help you improve the quality of your data.
- It simplifies model training by providing a unified view of your features, which can reduce the time it takes to train your models.
- It improves collaboration between data scientists and machine learning engineers by giving them a shared view of features.
Amazon SageMaker Feature Store Use Cases
- Data exploration: query feature data in the offline store, for example with Athena, to identify patterns and trends.
- Model training: build training datasets from the historical records in the offline store.
- Batch predictions: retrieve features from the offline store to run batch inference on new data.
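
For training and batch prediction, a common need is the latest record per entity from the offline store. The sketch below only builds an Athena SQL string; it does not run it. It assumes the offline store table exposes the metadata columns `write_time`, `api_invocation_time`, and `is_deleted` alongside your features. The function name and the table name `customers_1700000000` are hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def latest_records_query(table: str, record_identifier: str, event_time: str) -> str:
    """Build an Athena query returning the newest non-deleted row per record identifier."""
    for identifier in (table, record_identifier, event_time):
        # Reject anything that is not a plain identifier to avoid SQL injection.
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"invalid identifier: {identifier!r}")
    return (
        "SELECT *\n"
        "FROM (\n"
        "    SELECT *,\n"
        "           ROW_NUMBER() OVER (\n"
        f'               PARTITION BY "{record_identifier}"\n'
        f'               ORDER BY "{event_time}" DESC, api_invocation_time DESC, write_time DESC\n'
        "           ) AS row_num\n"
        f'    FROM "{table}"\n'
        ")\n"
        "WHERE row_num = 1 AND NOT is_deleted"
    )


if __name__ == "__main__":
    # Placeholder names; substitute the table created for your feature group.
    print(latest_records_query("customers_1700000000", "customer_id", "event_time"))
```

Dropping the `row_num = 1` filter would return the full history instead, which is what the offline store retains for every entity.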