Deploying a Serverless Inference Endpoint with Amazon SageMaker

Introduction

Welcome to our deep dive into the world of serverless machine learning (ML) inference using Amazon SageMaker. In this blog post, we will explore an efficient approach to deploying ML models without managing servers, a method known as serverless inference.

What is Serverless Inference?

Serverless inference is a cloud computing execution model in which the cloud provider dynamically manages the allocation of compute resources. The key advantage is that it abstracts the underlying infrastructure, allowing developers and data scientists to focus solely on their application logic. This approach offers several benefits, including no server management, automatic scaling with traffic, and pay-per-use pricing where you are billed only while requests are being processed.

Amazon SageMaker: A Brief Overview

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. SageMaker takes away much of the heavy lifting and complexity involved in machine learning. It stands out in its ability to offer built-in algorithms, managed training infrastructure, and flexible deployment options such as real-time, batch, asynchronous, and serverless inference.

Our Journey Today

In this post, we aim to guide you through the entire process of deploying a serverless inference using Amazon SageMaker. From setting up your environment and training your model to deploying it via a serverless endpoint, we will cover all the necessary steps. We will also discuss best practices, monitoring techniques, and efficient resource cleanup methods. Whether you're new to SageMaker or looking to refine your skills, this post will provide valuable insights and practical knowledge.

Prerequisites and Setup

Before we dive into the exciting world of serverless inference with Amazon SageMaker, there are a few prerequisites and setup steps that we need to take care of. This preparation will ensure a smooth and efficient journey through the rest of the blog post.

Prerequisites

To begin, you'll need an AWS account, an IAM role with permissions to use SageMaker and Amazon S3, and a Python environment such as a SageMaker notebook instance, SageMaker Studio, or a local Jupyter setup. You can confirm that your AWS credentials are configured with the quick check below.
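
A minimal sketch for verifying credentials, assuming boto3 is installed and your credentials come from your default profile or environment variables:

import boto3

# Ask AWS STS which identity the current credentials resolve to;
# an error here usually means credentials are missing or misconfigured
print(boto3.client("sts").get_caller_identity()["Arn"])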

Configuration of the Development Environment

Setting up your development environment is a crucial step. Start by installing the necessary Python libraries:

!pip install sagemaker botocore boto3 awscli --upgrade

Then, set up your SageMaker and AWS clients:

import boto3
client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

SageMaker Setup

Now, let's set up SageMaker:

import boto3
import sagemaker
from sagemaker.estimator import Estimator
boto_session = boto3.session.Session()
region = boto_session.region_name
print(region)
sagemaker_session = sagemaker.Session()
base_job_prefix = "xgboost-example"
role = sagemaker.get_execution_role()
print(role)
default_bucket = sagemaker_session.default_bucket()
s3_prefix = base_job_prefix
training_instance_type = "ml.m5.xlarge"

Next, retrieve the Abalone training data and upload it to an S3 bucket:

s3 = boto3.client("s3")
s3.download_file(
    f"sagemaker-example-files-prod-{region}",
    "datasets/tabular/uci_abalone/train_csv/abalone_dataset1_train.csv",
    "abalone_dataset1_train.csv",
)
# upload data to S3
!aws s3 cp abalone_dataset1_train.csv s3://{default_bucket}/xgboost-regression/train.csv
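
You can verify the upload with a quick head_object call, a sketch using the S3 client created above:

# Confirm the training file landed at the expected S3 key;
# this raises a ClientError if the object does not exist
s3.head_object(Bucket=default_bucket, Key="xgboost-regression/train.csv")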

With these steps, your environment is now ready for SageMaker model training and deployment. In the next section, we will delve into model training using Amazon SageMaker.

Model Training

Training a machine learning model is a crucial part of any ML project. In this section, we'll cover the steps to train a model using Amazon SageMaker, explore various training options, and share some best practices for optimal results.

Training a Machine Learning Model in SageMaker

Amazon SageMaker simplifies the process of training ML models. It provides a powerful environment that can handle different types and sizes of data, along with a broad array of machine learning algorithms.

Different Training Options in SageMaker

SageMaker provides multiple training options to suit different needs: built-in algorithms (such as the XGBoost algorithm we use below), script mode for running your own training script on managed framework containers, bring-your-own-container for full control over the environment, and SageMaker Autopilot for automated model building.

Best Practices for Model Training

A few practices go a long way: validate your pipeline on a small data sample before launching large jobs, pick an instance type that matches your workload, version your training data in S3, and monitor training metrics so you can catch problems early.

Model Training Code

Let's look at some code snippets for training a model in SageMaker:

from sagemaker.inputs import TrainingInput

# Define training data path
training_path = f"s3://{default_bucket}/xgboost-regression/train.csv"
train_input = TrainingInput(training_path, content_type="text/csv")
# Define model output path
model_path = f"s3://{default_bucket}/{s3_prefix}/xgb_model"
# Retrieve XGBoost image
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=training_instance_type,
)
# Configure Training Estimator
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=training_instance_type,
    instance_count=1,
    output_path=model_path,
    sagemaker_session=sagemaker_session,
    role=role,
)
# Set Hyperparameters
xgb_train.set_hyperparameters(
    objective="reg:linear",
    num_round=50,
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
    silent=0,
)
# Start the Training Job
xgb_train.fit({"train": train_input})

This code demonstrates setting up a training job with the XGBoost algorithm, configuring the estimator, and fitting the model with the training data. If you are following along hands-on, you can track the training job from your IDE of choice, in SageMaker Studio, or in the SageMaker console.
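
If you prefer to check on the job programmatically, here is a minimal sketch using the boto3 SageMaker client created earlier; it simply looks up the most recently created training job, so it assumes no other jobs were started in the meantime:

# Look up the most recently created training job and print its status
latest = client.list_training_jobs(
    SortBy="CreationTime", SortOrder="Descending", MaxResults=1
)["TrainingJobSummaries"][0]
job_name = latest["TrainingJobName"]
status = client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
print(f"{job_name}: {status}")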

Deployment

Deploying a machine learning model is a critical step in putting your trained model into production. Amazon SageMaker simplifies this process, offering robust options for deploying models. In this section, we'll explore how to create a model in SageMaker, set up a serverless endpoint, and compare serverless with traditional deployment options.

Model Creation in SageMaker

The first step in deploying your model is to create a model resource in SageMaker. This involves specifying the location of the model artifacts and the runtime configuration.

XGBoost Model Creation Code

from sagemaker.xgboost.model import XGBoostModel
# Create an XGBoost model in SageMaker
model = XGBoostModel(
    model_data=xgb_train.model_data,  # S3 path to the trained model artifacts
    role=sagemaker.get_execution_role(),  # IAM role with SageMaker permissions
    framework_version="1.0-1",  # XGBoost framework version used for training
)

Serverless Endpoint Configuration

Deploying your model to a serverless endpoint allows for a flexible, cost-effective, and scalable solution. The serverless configuration controls two things: the memory allocated to the endpoint (between 1 GB and 6 GB, in 1-GB increments) and the maximum number of concurrent invocations the endpoint will handle.

Serverless Configuration Code

from sagemaker.serverless import ServerlessInferenceConfig
# Configure serverless inference
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,  # 1024-6144 MB, in 1-GB increments
    max_concurrency=5,  # maximum concurrent invocations
)

Deploying the Model

With the model and serverless configuration ready, you can deploy the model to a serverless endpoint. Deployment sometimes takes a few minutes, so don't worry if the call doesn't return immediately.

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
# Deploy the model as a serverless endpoint; with a serverless config,
# no instance type or instance count is needed
predictor = model.deploy(
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    serverless_inference_config=serverless_config,
)

Endpoint Invocation

You can invoke the endpoint by sending it a request. The endpoint name is available on the predictor returned by deploy():

# deploy() generated the endpoint name for us
endpoint_name = predictor.endpoint_name

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
    ContentType="text/csv",
)
print(response["Body"].read())
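
The body comes back as raw bytes, and the built-in XGBoost container returns its prediction as plain text. A minimal sketch for parsing it into a number, assuming a single-row CSV request like the one above (note that the streaming body can only be read once, so we invoke the endpoint again):

# Invoke the endpoint again and parse the plain-text prediction into a float
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
    ContentType="text/csv",
)
prediction = float(response["Body"].read().decode("utf-8"))
print(f"Predicted value: {prediction}")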

Monitoring and Management

Effective monitoring and management of serverless inferences are essential for maintaining performance, managing costs, and ensuring your models remain up-to-date and efficient. Amazon SageMaker provides tools and techniques to help you in each of these areas.

Monitoring Serverless Inference Performance

Serverless endpoints publish metrics to Amazon CloudWatch, so you can track invocation counts, latencies, and errors, as well as serverless-specific behavior such as cold-start model setup time, without extra instrumentation.
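
As a minimal sketch, here is how you might pull the endpoint's invocation counts for the last hour with boto3's CloudWatch client; the AWS/SageMaker namespace, Invocations metric, and AllTraffic variant are the standard names for SageMaker endpoints, and endpoint_name is the variable defined earlier:

from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Sum the endpoint's invocations over the last hour, in 5-minute buckets
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="Invocations",
    Dimensions=[
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])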

Managing and Updating the Serverless Endpoint

You can change the memory size or maximum concurrency of a live serverless endpoint without recreating it: create a new endpoint configuration and point the endpoint at it.
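
A sketch of that flow with the boto3 client from earlier; the configuration name here is purely illustrative, and model.name assumes the SageMaker Python SDK populated the model's name during deploy():

# Create a new endpoint configuration with more memory (name is illustrative)
new_config_name = "xgboost-serverless-config-v2"
client.create_endpoint_config(
    EndpointConfigName=new_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model.name,  # assumes the SDK named the model during deploy()
            "ServerlessConfig": {
                "MemorySizeInMB": 6144,
                "MaxConcurrency": 10,
            },
        }
    ],
)

# Point the live endpoint at the new configuration
client.update_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=new_config_name,
)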

Cost Management and Optimization

With serverless inference you pay only for the compute used to process requests, billed by duration and the memory size you configured, so right-sizing memory_size_in_mb is the main cost lever. Reviewing the CloudWatch metrics above helps you spot over-provisioned memory or unexpectedly high traffic. By implementing these monitoring and management techniques, you can ensure that your serverless inferences in Amazon SageMaker run efficiently, cost-effectively, and remain reliable over time.

Clean Up

As we reach the conclusion of our journey with serverless inference in Amazon SageMaker, it's crucial to discuss the clean-up process. Efficiently managing your AWS resources is not only a best practice but also helps in reducing unnecessary costs. Here's how you can clean up the resources you've used.

Steps for Cleaning Up

Delete the Model

Start by deleting the model you created in SageMaker. This removes the model resource (the artifacts themselves remain in S3).

# The SDK named the model during deploy(); substitute your own name if needed
model_name = model.name
client.delete_model(ModelName=model_name)

Delete the Endpoint Configuration

Next, delete the endpoint configuration. This action removes the configuration settings, freeing up the resources.

# Look up the configuration attached to the endpoint rather than guessing its name
endpoint_config_name = client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

Delete the Endpoint

Finally, delete the serverless endpoint. This is an important step, as leftover endpoints can incur charges.

client.delete_endpoint(EndpointName=endpoint_name)
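
As a final sanity check, you can list endpoints matching the name to confirm the deletion went through (a quick sketch; the endpoint may briefly show up in a Deleting state):

# An empty list confirms the endpoint is gone
remaining = client.list_endpoints(NameContains=endpoint_name)["Endpoints"]
print("Matching endpoints:", remaining)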

Final Remarks

In this comprehensive guide, we journeyed through the process of deploying serverless inferences using Amazon SageMaker. We began by understanding the landscape of serverless inference and the pivotal role of SageMaker in ML model deployment. Setting up the environment and ensuring all prerequisites were met laid the groundwork for our project. We then delved into the heart of machine learning: training our model, where we discussed various options in SageMaker and adhered to best practices for optimal results. The deployment phase brought our trained model to life, illustrating the ease and efficiency of creating models and setting up serverless endpoints in SageMaker. Monitoring and managing our serverless deployment was our next focus, ensuring performance, cost-effectiveness, and up-to-date model management. Finally, we emphasized the importance of cleaning up AWS resources to maintain a cost-effective and optimized cloud environment.

Throughout this journey, we highlighted the seamless integration, scalability, and cost benefits of using Amazon SageMaker for serverless inferences. Whether you're a seasoned data scientist or new to machine learning, the insights and steps provided in this guide aim to equip you with the knowledge to successfully deploy your own serverless ML models.

As we conclude, remember that the field of machine learning and cloud computing is ever-evolving. Continuous learning and experimentation are key to staying ahead. We hope this guide has been a valuable resource in your ML endeavors with Amazon SageMaker. Happy modeling!

Resources:

https://docs.aws.amazon.com/sagemaker/
https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html