Amazon SageMaker Clarify Cheat Sheet
- Amazon SageMaker Clarify is a SageMaker AI feature for detecting bias and explaining model predictions.
- Supports both pre-training and post-training bias analysis.
- Provides feature attribution to explain how input features influence predictions.
- Can monitor deployed models for bias drift and feature attribution drift over time.
Key Capabilities
Bias Detection
- Pre-training bias: Analyzes datasets before model training.
- Post-training bias: Evaluates model predictions for fairness across facets.
- Supports binary, multiclass, and regression tasks.
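A minimal sketch of how the two analyses map onto the SageMaker Python SDK; it assumes a SageMakerClarifyProcessor and the configuration objects described under Core Components below are already defined.

```python
# Sketch: the two bias analyses map to separate SageMakerClarifyProcessor calls.
# Assumes clarify_processor, data_config, bias_config, model_config, and
# predictions_config are built as shown in the Core Components section below.

# Pre-training bias: needs only the dataset plus the facet/label definition.
clarify_processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
    methods="all",            # or a subset such as ["CI", "DPL"]
)

# Post-training bias: additionally needs the model to score the data.
clarify_processor.run_post_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
    model_config=model_config,
    model_predicted_label_config=predictions_config,
    methods="all",            # or a subset such as ["DI", "DPPL"]
)
```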
Interpreting Model Behavior
- Offers feature attributions via SHAP (SHapley Additive exPlanations), Partial Dependence Plots (PDP), etc.
- Explains individual predictions and global feature importance.
- Works with tabular, text, and image data.
Monitoring
- Detects bias drift and feature attribution drift in production models on a recurring monitoring schedule.
- Integrates with SageMaker Model Monitor for continuous evaluation (see the sketch below).
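A hedged sketch of scheduling bias-drift monitoring for a live endpoint via SageMaker Model Monitor; the role ARN, endpoint name, and S3 paths are placeholders, and it assumes a baseline was already suggested for the monitor.

```python
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    EndpointInput,
    ModelBiasMonitor,
)

# Sketch only: role ARN, endpoint name, and S3 URIs are placeholders.
model_bias_monitor = ModelBiasMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    max_runtime_in_seconds=1800,
)

# Hourly bias-drift checks against captured endpoint traffic plus uploaded ground truth.
# Assumes a baseline was suggested beforehand; otherwise supply analysis_config explicitly.
model_bias_monitor.create_monitoring_schedule(
    endpoint_input=EndpointInput(
        endpoint_name="my-endpoint",                     # placeholder
        destination="/opt/ml/processing/input/endpoint",
        start_time_offset="-PT1H",
        end_time_offset="-PT0H",
    ),
    ground_truth_input="s3://my-bucket/ground-truth/",   # placeholder
    output_s3_uri="s3://my-bucket/bias-drift-reports/",  # placeholder
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```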
Integrations
- Integration with SageMaker Autopilot: Clarify-based explanations for AutoML models.
- Integration with SageMaker Data Wrangler: helps address detected bias through data balancing techniques (random undersampling, random oversampling, SMOTE).
Core Components
Configuration Objects
- DataConfig: Specifies the source dataset and the destination path for output artifacts.
- ModelConfig: Identifies the model container or endpoint to be evaluated during the analysis.
- BiasConfig: Facets and label information for bias analysis.
- SHAPConfig: Parameters for SHAP-based explainability.
- ModelPredictedLabelConfig: Specifies how to extract predicted labels (all five are sketched below).
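A minimal sketch of the five objects for a tabular CSV dataset with a binary target label and a gender facet; the S3 paths, column names, SHAP baseline, and model name are placeholders.

```python
from sagemaker import clarify

# All S3 paths, column names, and the model name below are placeholders.
data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train.csv",      # source dataset
    s3_output_path="s3://my-bucket/clarify-output/",    # where reports land
    label="target",
    headers=["target", "gender", "age", "income"],
    dataset_type="text/csv",
)

model_config = clarify.ModelConfig(
    model_name="my-model",          # model Clarify spins up as a shadow endpoint
    instance_type="ml.m5.xlarge",
    instance_count=1,
    content_type="text/csv",
    accept_type="text/csv",
)

predictions_config = clarify.ModelPredictedLabelConfig(
    probability_threshold=0.5,      # scores above this count as the positive label
)

bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],  # favorable outcome
    facet_name="gender",
    facet_values_or_threshold=[0],  # sensitive group to compare
)

shap_config = clarify.SHAPConfig(
    baseline=[[0, 35, 50000]],      # one baseline row of non-label features
    num_samples=100,
    agg_method="mean_abs",          # aggregate local values into global importance
    save_local_shap_values=True,
)
```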
Processing Job Setup
- Use SageMakerClarifyProcessor from the SageMaker Python SDK.
- Define ProcessingInput and ProcessingOutput (the high-level run_* methods build these for you from DataConfig).
- Launch via the run_* methods (e.g., run_bias_and_explainability) with the config objects (see the sketch below).
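A hedged sketch of launching the job, reusing the configuration objects above; the IAM role ARN is a placeholder, and run_bias_and_explainability is used here to run bias metrics and SHAP in a single processing job.

```python
from sagemaker import Session, clarify

# Placeholder role ARN; reuses data_config, model_config, bias_config,
# predictions_config, and shap_config from the sketch above.
clarify_processor = clarify.SageMakerClarifyProcessor(
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=Session(),
)

# One processing job that runs pre-/post-training bias metrics and SHAP together.
clarify_processor.run_bias_and_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
    bias_config=bias_config,
    model_predicted_label_config=predictions_config,
    pre_training_methods="all",
    post_training_methods="all",
)
```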
Configuration Components
Analysis Configuration File (JSON)
- Defines bias or explainability parameters.
- Supports CSV, JSON Lines, and JSON datasets.
- Compatible with tabular, text, image, and time-series data.
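For jobs launched without the SDK helpers, the same parameters go into an analysis configuration JSON. A hedged sketch written from Python; the key names follow the documented CSV schema, while the column names, facet, SHAP baseline, and model name are placeholders.

```python
import json

# Placeholder columns, facet, baseline, and model name; see the
# clarify-processing-job-configure-parameters page for the full schema.
analysis_config = {
    "dataset_type": "text/csv",
    "headers": ["target", "gender", "age", "income"],
    "label": "target",
    "label_values_or_threshold": [1],
    "facet": [{"name_or_index": "gender", "value_or_threshold": [0]}],
    "methods": {
        "pre_training_bias": {"methods": "all"},
        "post_training_bias": {"methods": "all"},
        "shap": {
            "baseline": [[0, 35, 50000]],
            "num_samples": 100,
            "agg_method": "mean_abs",
        },
        "report": {"name": "report", "title": "Clarify analysis"},
    },
    "predictor": {
        "model_name": "my-model",
        "instance_type": "ml.m5.xlarge",
        "initial_instance_count": 1,
    },
}

# Write the file that gets passed to the Clarify processing container.
with open("analysis_config.json", "w") as f:
    json.dump(analysis_config, f, indent=2)
```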
SageMakerClarifyProcessor (Python SDK)
- High-level API to run Clarify jobs.
- Key methods: run_bias_and_explainability, run_post_training_bias, run_explainability (SHAP / PDP).
- Supports combined SHAP + PDP jobs (see the sketch below).
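A hedged sketch of a combined job: run_explainability accepts a list of explainability configs, so passing a SHAPConfig together with a PDPConfig produces both analyses in one run. It reuses the processor and configs from the sketches above, and the PDP feature names are placeholders.

```python
from sagemaker import clarify

# Reuses clarify_processor, data_config, model_config, and shap_config
# from the sketches above; feature names in PDPConfig are placeholders.
pdp_config = clarify.PDPConfig(
    features=["age", "income"],  # features to sweep for partial dependence
    grid_resolution=15,          # number of grid points per feature
)

# explainability_config accepts a list, so SHAP and PDP run in one job.
clarify_processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=[shap_config, pdp_config],
)
```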
Bias Metrics Overview
Pre-training Bias Metrics
- Class Imbalance (CI): Imbalance in the number of examples between facet groups.
- Difference in Proportions of Labels (DPL): Difference in positive label proportions across facet groups.
Post-training Bias Metrics
- Disparate Impact: Ratio of favorable outcomes between groups.
- Equal Opportunity: True positive rate parity.
- Predictive Parity: Positive predictive value parity.
- Overall Accuracy Equality: Accuracy parity across groups.
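Two of the headline metrics reduce to simple ratios. An illustration with made-up counts (Clarify computes these, and many more, in the analysis report):

```python
# Illustration only, with made-up counts; Clarify computes these metrics for you.

# Class Imbalance (pre-training): (n_a - n_d) / (n_a + n_d),
# where n_a / n_d are the number of examples in the advantaged / disadvantaged facet.
n_a, n_d = 800, 200
class_imbalance = (n_a - n_d) / (n_a + n_d)   # 0.6 -> facet d is under-represented

# Disparate Impact (post-training): q_d / q_a,
# the ratio of predicted-positive rates between the two facets.
q_a = 320 / 800   # favorable predictions / members, advantaged facet
q_d = 60 / 200    # favorable predictions / members, disadvantaged facet
disparate_impact = q_d / q_a                  # 0.75 -> below the common 0.8 rule of thumb

print(f"CI = {class_imbalance:.2f}, DI = {disparate_impact:.2f}")
```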
SHAP Explainability
- Computes local explanations for each prediction.
- Aggregates to global feature importance.
- Outputs include:
- SHAP values per feature
- Summary plots
- Feature importance rankings
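When save_local_shap_values is enabled, the per-example values are written as a CSV under the job's S3 output path. A hedged sketch of aggregating those local values into a global ranking using the same mean-absolute aggregation Clarify applies; the local file path is a placeholder.

```python
import pandas as pd

# Placeholder path to the downloaded local SHAP values
# (one column per feature, one row per example).
local_shap = pd.read_csv("clarify-output/explanations_shap/out.csv")

# Global importance = mean absolute SHAP value per feature ("mean_abs" aggregation).
global_importance = local_shap.abs().mean().sort_values(ascending=False)
print(global_importance)
```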
Validate Your Knowledge
Question 1
A retail company leverages machine learning models to predict quarterly sales and optimize inventory management. In response to stakeholder requests, the data science team has been tasked with providing a comprehensive report that ensures transparency and explains the rationale behind the models’ decisions.
What should the data science team present to clearly explain the model’s recommendation process?
- Hyperparameter tuning results
- Partial dependence plots (PDPs)
- Feature engineering scripts
- Model convergence tables
Amazon SageMaker Clarify Cheat Sheet Resources:
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-configure-processing-jobs.html
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-shapley-values.html
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-processing-job-configure-parameters.html
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-processing-job-run.html