Deploying a Serverless Inference Endpoint with Amazon SageMaker

Introduction

Welcome to our deep dive into serverless machine learning (ML) inference with Amazon SageMaker. In this blog post, we will explore how to deploy ML models without managing servers, an approach known as serverless inference.

What is Serverless Inference?

Serverless inference applies the serverless execution model to model hosting: the cloud provider dynamically manages the allocation of machine resources behind your endpoint. The key advantage is that it abstracts the underlying infrastructure, allowing developers and data scientists to focus solely on their application logic. This approach offers several benefits:

  • Cost-Effectiveness: You pay only for the resources your application consumes, eliminating the need for costly, idle compute resources.
  • Scalability: Serverless infrastructures can automatically scale to meet the demands of your application, ensuring efficient handling of varying loads without manual intervention.
  • Simplified Operations: With serverless, there’s no need to worry about server management, maintenance, or patching, reducing the operational overhead.

Amazon SageMaker: A Brief Overview

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. SageMaker takes away much of the heavy lifting and complexity involved in machine learning. It stands out in its ability to:

  • Offer a broad set of purpose-built tools for every step of the machine learning lifecycle.
  • Provide a robust and secure environment for deploying and managing models.
  • Integrate seamlessly with other AWS services, enhancing its functionality and flexibility.

Our Journey Today

In this post, we guide you through the entire process of deploying a serverless inference endpoint using Amazon SageMaker. From setting up your environment and training your model to deploying it via a serverless endpoint, we will cover all the necessary steps. We will also discuss best practices, monitoring techniques, and efficient resource cleanup methods. Whether you’re new to SageMaker or looking to refine your skills, this post will provide valuable insights and practical knowledge.

Prerequisites and Setup

Before we dive into the exciting world of serverless inference with Amazon SageMaker, there are a few prerequisites and setup steps that we need to take care of. This preparation will ensure a smooth and efficient journey through the rest of the blog post.

Prerequisites

To begin, you’ll need:

  1. An AWS Account: Ensure you have access to an AWS account with the necessary permissions.
  2. Familiarity with Python: Basic knowledge of Python is essential, as our examples will use Python.
  3. Understanding of Machine Learning Concepts: A basic understanding of ML concepts will be beneficial.

Configuration of the Development Environment

Setting up your development environment is a crucial step. Start by installing the necessary Python libraries:

!pip install sagemaker botocore boto3 awscli --upgrade

Then, set up your SageMaker and AWS clients:

import boto3

client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

SageMaker Setup

Now, let’s set up SageMaker:

import boto3
import sagemaker
from sagemaker.estimator import Estimator

boto_session = boto3.session.Session()
region = boto_session.region_name
print(region)

sagemaker_session = sagemaker.Session()
base_job_prefix = "xgboost-example"
role = sagemaker.get_execution_role()
print(role)

default_bucket = sagemaker_session.default_bucket()
s3_prefix = base_job_prefix

training_instance_type = "ml.m5.xlarge"

Next, retrieve and upload the data to an S3 bucket:

s3 = boto3.client("s3")
s3.download_file(
    f"sagemaker-example-files-prod-{region}",
    "datasets/tabular/uci_abalone/train_csv/abalone_dataset1_train.csv",
    "abalone_dataset1_train.csv",
)

# upload data to S3
!aws s3 cp abalone_dataset1_train.csv s3://{default_bucket}/xgboost-regression/train.csv
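
The `!aws s3 cp` line relies on notebook shell magic. Outside a notebook, the same upload could be sketched with the boto3 S3 client's `upload_file` method. The helper name `upload_training_data` and its default key are assumptions made for illustration; the key simply mirrors the path used above.

```python
def upload_training_data(s3_client, local_path, bucket, key="xgboost-regression/train.csv"):
    """Upload a local file to S3 and return its s3:// URI.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    The helper name and the default key are illustrative, not part of any AWS API.
    """
    # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

With the variables from the setup code, this would be called as `upload_training_data(s3, "abalone_dataset1_train.csv", default_bucket)`.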

With these steps, your environment is now ready for SageMaker model training and deployment.

In the next section, we will delve into model training using Amazon SageMaker.

Model Training

Training a machine learning model is a crucial part of any ML project. In this section, we’ll cover the steps to train a model using Amazon SageMaker, explore various training options, and share some best practices for optimal results.

Training a Machine Learning Model in SageMaker

Amazon SageMaker simplifies the process of training ML models. It provides a powerful environment that can handle different types and sizes of data, along with a broad array of machine learning algorithms.

  1. Choose Your Algorithm: SageMaker offers a variety of built-in algorithms, or you can bring your own.
  2. Prepare Your Data: Ensure your data is clean and in a format that is compatible with your chosen algorithm.
  3. Configure the Training Job: Set up the compute resources needed for training and specify the locations of your data.
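
As a rough illustration of the "Prepare Your Data" step: for CSV input, the built-in XGBoost algorithm expects numeric values only, no header row, and the label in the first column. The checker below is a minimal sketch of such a sanity check using only the standard library; the function name `check_xgboost_csv` is hypothetical, and it cannot verify that the first column really is the label.

```python
import csv
import io


def check_xgboost_csv(text):
    """Return a list of problems found in CSV training data (empty if none).

    Checks only what can be verified mechanically: every row has the same
    number of columns and every value parses as a number (so no header row).
    """
    problems = []
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ["file is empty"]
    width = len(rows[0])
    for line_no, row in enumerate(rows, start=1):
        if len(row) != width:
            problems.append(f"line {line_no}: expected {width} columns, found {len(row)}")
        for value in row:
            try:
                float(value)
            except ValueError:
                # Report the first bad value on the line, then move on
                problems.append(f"line {line_no}: non-numeric value {value!r}")
                break
    return problems
```

Running it on the downloaded file before uploading, for example `check_xgboost_csv(open("abalone_dataset1_train.csv").read())`, would surface a stray header or a ragged row early.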

Different Training Options in SageMaker

SageMaker provides multiple training options to suit different needs:

  • Built-in Algorithms: Use SageMaker’s pre-built algorithms for common tasks like classification, regression, etc.
  • Custom Algorithms: You can bring your custom algorithms packaged in Docker containers.
  • Managed Spot Training: Reduce training costs by using EC2 Spot Instances.
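
To make the Managed Spot Training option concrete: the SageMaker Python SDK `Estimator` accepts `use_spot_instances`, `max_run`, and `max_wait` arguments. The helper below is a small sketch that builds those keyword arguments and enforces the rule that the total wait time must be at least the run time; the helper name `spot_training_kwargs` is an assumption for illustration.

```python
def spot_training_kwargs(max_run_seconds, max_wait_seconds):
    """Build Estimator keyword arguments that enable Managed Spot Training.

    max_run  : maximum training time in seconds
    max_wait : maximum total time in seconds, including waiting for Spot capacity
    """
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait must be greater than or equal to max_run")
    return {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        "max_wait": max_wait_seconds,
    }
```

These could then be unpacked into an estimator, for example `Estimator(image_uri=..., role=..., **spot_training_kwargs(3600, 7200))`.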

Best Practices for Model Training

  1. Data Preprocessing: Properly preprocess your data for effective training.
  2. Hyperparameter Tuning: Experiment with different hyperparameters for optimal model performance.
  3. Resource Management: Choose the right instance type for your training job to balance cost and speed.

Model Training Code

Let’s look at some code snippets for training a model in SageMaker:

from sagemaker.inputs import TrainingInput

# Define training data path
training_path = f"s3://{default_bucket}/xgboost-regression/train.csv"
train_input = TrainingInput(training_path, content_type="text/csv")

# Define model output path
model_path = f"s3://{default_bucket}/{s3_prefix}/xgb_model"

# Retrieve XGBoost image
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=training_instance_type,
)

# Configure Training Estimator
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=training_instance_type,
    instance_count=1,
    output_path=model_path,
    sagemaker_session=sagemaker_session,
    role=role,
)

# Set Hyperparameters
xgb_train.set_hyperparameters(
    objective="reg:squarederror",  # "reg:linear" is a deprecated alias for this objective
    num_round=50,
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
    silent=0,
)

# Start the Training Job
xgb_train.fit({"train": train_input})

This code demonstrates setting up a training job with the XGBoost algorithm, configuring the estimator, and fitting the model with the training data.

If you are following along with the code, you can track the training job from the log output in your IDE of choice or in SageMaker Studio.

Alternatively, you can track the training job in the SageMaker console.

Deployment

Deploying a machine learning model is a critical step in putting your trained model into production. Amazon SageMaker simplifies this process, offering robust options for deploying models. In this section, we’ll explore how to create a model in SageMaker, set up a serverless endpoint, and compare serverless with traditional deployment options.

Model Creation in SageMaker

The first step in deploying your model is to create a model resource in SageMaker. This involves specifying the location of the model artifacts and the runtime configuration.

  1. Selecting Model Parameters: Choose the right parameters, including the framework version, instance type, and IAM role.
  2. Configuring the Model: Define the computational resources needed for your model, such as CPU, memory, and any custom configurations.

XGBoost Model Creation Code

from sagemaker.xgboost.model import XGBoostModel

# Create an XGBoost model in SageMaker
model = XGBoostModel(
    model_data=xgb_train.model_data,  # S3 path of the artifacts produced by the training job
    role=role,  # IAM execution role retrieved during setup
    framework_version="1.0-1",  # Match the XGBoost version used for training
)

Serverless Endpoint Configuration

Deploying your model to a serverless endpoint allows for a flexible, cost-effective, and scalable solution.

  1. Serverless Configuration: Configure your serverless endpoint with parameters like memory size and maximum concurrency.
  2. Benefits of Serverless Deployment: This approach is cost-effective (pay-as-you-go), automatically scalable, and requires no infrastructure management.
  3. Comparing with Traditional Deployment: Unlike traditional deployments, serverless endpoints do not require provisioning or managing servers, offering a more flexible and scalable approach.
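
The two serverless parameters mentioned above can be validated before any AWS call is made. The sketch below assumes the limits documented by AWS at the time of writing (memory from 1024 MB to 6144 MB in 1 GB steps, and a per-endpoint concurrency ceiling of 200); verify them against the current documentation. The helper name `build_serverless_config` is hypothetical. The returned dictionary has the shape of the `ServerlessConfig` entry used in a boto3 `create_endpoint_config` production variant.

```python
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)
MAX_CONCURRENCY_LIMIT = 200  # assumption: per-endpoint ceiling, check current AWS docs


def build_serverless_config(memory_size_in_mb=2048, max_concurrency=5):
    """Validate serverless settings and return them as a ServerlessConfig dict."""
    if memory_size_in_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_size_in_mb must be one of {ALLOWED_MEMORY_MB}")
    if not 1 <= max_concurrency <= MAX_CONCURRENCY_LIMIT:
        raise ValueError(f"max_concurrency must be between 1 and {MAX_CONCURRENCY_LIMIT}")
    return {"MemorySizeInMB": memory_size_in_mb, "MaxConcurrency": max_concurrency}
```

A larger memory size also gives the endpoint more compute, so it is a reasonable first knob to turn when latency is too high.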

Serverless Configuration Code

from sagemaker.serverless import ServerlessInferenceConfig

# Configure serverless inference
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,
    max_concurrency=5,
)

Deploying the Model

With the model and serverless configuration ready, you can deploy the model to a serverless endpoint.

from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

# Deploy the model as a serverless endpoint.
# No instance type or instance count is needed: the serverless
# configuration replaces them. CSV (de)serializers match the
# text/csv payload used when invoking the endpoint.
predictor = model.deploy(
    serializer=CSVSerializer(),
    deserializer=CSVDeserializer(),
    serverless_inference_config=serverless_config,
)

Deploying a model can take a few minutes. While the endpoint is being created, the call keeps running in your IDE and prints progress output; that is normal.

Once deployment finishes, the call returns a predictor object bound to the new endpoint.
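
By default `deploy()` blocks until the endpoint is ready. If you deploy with `wait=False`, or want to check on an endpoint from another process, the status can be polled through the SageMaker client's `describe_endpoint` call. The following is a minimal sketch; the function name `wait_for_endpoint` and its default timings are assumptions, and `client` is expected to be a boto3 SageMaker client such as the `client` created during setup.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=15, timeout_seconds=900, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService.

    Raises RuntimeError if creation fails and TimeoutError if the
    endpoint is still not ready after roughly `timeout_seconds`.
    """
    waited = 0
    while True:
        status = client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"endpoint {endpoint_name} failed to deploy")
        if waited >= timeout_seconds:
            raise TimeoutError(f"endpoint {endpoint_name} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so the loop can be exercised without real delays; in normal use it can be left at its default.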

Endpoint Invocation

You can invoke the endpoint by sending a request to it.

# Name of the endpoint created by model.deploy()
endpoint_name = predictor.endpoint_name

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
    ContentType="text/csv",
)

print(response["Body"].read())

The response body is returned as bytes and contains the model's prediction for the input row.
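
Hand-writing the CSV payload and reading raw bytes back is error-prone, so two small helpers can take care of both directions. This is a sketch under the assumption that the endpoint returns its predictions as comma- or newline-separated numbers in a `text/csv` body; the function names are hypothetical.

```python
def to_csv_payload(features):
    """Encode one row of numeric features as a text/csv request body."""
    return ",".join(str(float(x)) for x in features).encode("utf-8")


def parse_prediction(body_bytes):
    """Decode a text/csv response body into a list of floats."""
    text = body_bytes.decode("utf-8").strip()
    # Treat newlines and commas the same way so single- and multi-row
    # responses are both handled
    return [float(value) for value in text.replace("\n", ",").split(",") if value]
```

With the invocation code above, the body could be built with `to_csv_payload([...])` and the result read with `parse_prediction(response["Body"].read())`.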

Monitoring and Management

Effective monitoring and management of serverless inference endpoints are essential for maintaining performance, managing costs, and ensuring your models remain up-to-date and efficient. Amazon SageMaker provides tools and techniques to help you in these areas.

Monitoring Serverless Inference Performance

  1. CloudWatch Metrics: Utilize Amazon CloudWatch to monitor metrics like invocation counts, errors, and latency. This gives you real-time insights into the performance of your serverless endpoint.
  2. Logging: SageMaker sends the endpoint's container logs to Amazon CloudWatch Logs, under the log group /aws/sagemaker/Endpoints/<endpoint-name>. These logs are crucial for debugging and understanding model behavior in production.
  3. Alerts and Notifications: Set up alerts in CloudWatch for abnormal patterns or thresholds, ensuring proactive issue resolution.
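
The metrics and alerts described above can be sketched as plain request dictionaries for the CloudWatch client. SageMaker publishes endpoint metrics such as `Invocations` and `ModelLatency` in the `AWS/SageMaker` namespace, keyed by `EndpointName` and `VariantName`. The function names, the alarm naming scheme, and the 5-minute period are assumptions for illustration; `AllTraffic` is the variant name the SageMaker Python SDK uses by default.

```python
from datetime import datetime, timedelta, timezone


def _dimensions(endpoint_name, variant_name):
    return [
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": variant_name},
    ]


def metric_query(endpoint_name, metric_name="Invocations", variant_name="AllTraffic", hours=1, now=None):
    """Build keyword arguments for cloudwatch.get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": _dimensions(endpoint_name, variant_name),
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # 5-minute buckets
        "Statistics": ["Sum"],
    }


def latency_alarm(endpoint_name, threshold_microseconds, variant_name="AllTraffic"):
    """Build keyword arguments for cloudwatch.put_metric_alarm on ModelLatency.

    ModelLatency is reported in microseconds; the threshold is yours to choose.
    """
    return {
        "AlarmName": f"{endpoint_name}-model-latency",  # hypothetical naming scheme
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": _dimensions(endpoint_name, variant_name),
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_microseconds),
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

Given `cloudwatch = boto3.client("cloudwatch")`, these would be used as `cloudwatch.get_metric_statistics(**metric_query(endpoint_name))` and `cloudwatch.put_metric_alarm(**latency_alarm(endpoint_name, 500000))`.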

Managing and Updating the Serverless Endpoint

  1. Endpoint Management: Regularly review and manage your endpoints in the SageMaker console. For a serverless endpoint, this means the memory size and maximum concurrency settings rather than instance types or scaling policies.
  2. Model Updates: To update your model, deploy a new version of the model to the endpoint. SageMaker allows seamless updates with minimal downtime.
  3. Versioning: Keep track of different model versions and configurations. This aids in rollback and performance comparison.
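
One way to roll out a model update, sketched with the low-level SageMaker client: create a new endpoint configuration that points at the new model, then switch the existing endpoint to it with `update_endpoint`. The function name and the default settings are assumptions; `client` is expected to be a boto3 SageMaker client, and the new model resource must already exist.

```python
def update_serverless_endpoint(client, endpoint_name, model_name, config_name,
                               memory_size_in_mb=4096, max_concurrency=5):
    """Point an existing endpoint at a new model via a new endpoint config.

    Keeping the old endpoint configuration around makes a rollback a
    single update_endpoint call back to the previous config name.
    """
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_size_in_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    )
    client.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    return config_name
```

Using a version suffix in `config_name` (for example a date or model version) is one simple way to support the versioning practice listed above.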

Cost Management and Optimization

  1. Cost Monitoring: Regularly monitor your AWS billing dashboard to keep track of the costs associated with your serverless endpoints.
  2. Optimize Concurrency: Adjust the maximum concurrency settings to balance cost and performance. This ensures you’re not over-provisioning resources.
  3. Use AWS Cost Explorer: Leverage AWS Cost Explorer to analyze and identify cost-saving opportunities. For instance, identifying underutilized resources.
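
For a back-of-the-envelope view of the compute portion of the bill: serverless inference is charged for the time spent processing requests, at a rate that depends on the configured memory size. The sketch below deliberately takes the price as an argument rather than hard-coding one, because rates vary by region and memory size; the function name is hypothetical, and data processing charges are not included.

```python
def estimate_compute_cost(invocations, avg_duration_ms, price_per_second):
    """Estimate serverless compute cost for a number of invocations.

    price_per_second must be looked up on the SageMaker pricing page for
    your region and memory size; no real price is assumed here.
    """
    billed_seconds = invocations * avg_duration_ms / 1000.0
    return billed_seconds * price_per_second
```

For example, 100,000 invocations averaging 120 ms each amount to 12,000 billed seconds, which is then multiplied by whatever per-second rate applies to your configuration.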

By implementing these monitoring and management techniques, you can ensure that your serverless inference endpoints in Amazon SageMaker run efficiently and cost-effectively and remain reliable over time.

Clean Up

As we reach the conclusion of our journey with serverless inference in Amazon SageMaker, it’s crucial to discuss the clean-up process. Efficiently managing your AWS resources is not only a best practice but also helps in reducing unnecessary costs. Here’s how you can clean up the resources you’ve used.

Steps for Cleaning Up

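Delete the Model

Start by deleting the model resource you created in SageMaker. This removes the model definition from SageMaker only; the model artifacts stored in S3 are not deleted and must be removed separately if you no longer need them.

model_name = model.name  # name of the model resource created at deploy time
client.delete_model(ModelName=model_name)

Delete the Endpoint Configuration

Next, delete the endpoint configuration. Its name is looked up from the endpoint, so do this before the endpoint itself is deleted.

endpoint_config_name = client.describe_endpoint(EndpointName=predictor.endpoint_name)["EndpointConfigName"]
client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

Delete the Endpoint

Finally, delete the serverless endpoint. This is an important step as endpoints can incur ongoing charges.

client.delete_endpoint(EndpointName=predictor.endpoint_name)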
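
The three deletions can also be wrapped in a single helper. This sketch deletes the endpoint first, then its configuration, then the model, which is a common ordering but not the order listed in the steps above; either works for these API calls. The function name is hypothetical, and `client` is expected to be a boto3 SageMaker client.

```python
def delete_endpoint_resources(client, endpoint_name, model_name):
    """Delete an endpoint, its endpoint configuration, and its model.

    Returns the names of the deleted resources in deletion order.
    Model artifacts in S3 are not touched.
    """
    # Look up the config name while the endpoint still exists
    config_name = client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    client.delete_model(ModelName=model_name)
    return [endpoint_name, config_name, model_name]
```

With the objects from this post, it could be called as `delete_endpoint_resources(client, predictor.endpoint_name, model.name)`.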

Final Remarks

In this comprehensive guide, we journeyed through the exciting process of deploying serverless inferences using Amazon SageMaker. We began by understanding the landscape of serverless inference and the pivotal role of SageMaker in ML model deployment. Setting up the environment and ensuring all prerequisites were met laid the groundwork for our project.

We then delved into the heart of machine learning – training our model, where we discussed various options in SageMaker and adhered to best practices for optimal results. The deployment phase brought our trained model to life, illustrating the ease and efficiency of creating models and setting up serverless endpoints in SageMaker.

Monitoring and managing our serverless deployment was our next focus, ensuring performance, cost-effectiveness, and up-to-date model management. Finally, we emphasized the importance of cleaning up AWS resources to maintain a cost-effective and optimized cloud environment.

Throughout this journey, we highlighted the seamless integration, scalability, and cost benefits of using Amazon SageMaker for serverless inferences. Whether you’re a seasoned data scientist or new to machine learning, the insights and steps provided in this guide aim to equip you with the knowledge to successfully deploy your own serverless ML models.

As we conclude, remember that the field of machine learning and cloud computing is ever-evolving. Continuous learning and experimentation are key to staying ahead. We hope this guide has been a valuable resource in your ML endeavors with Amazon SageMaker. Happy modeling!

Resources:

https://docs.aws.amazon.com/sagemaker/

https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html

https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS/blob/main/chapter07/03%20-%20Deploying%20a%20serverless%20inference%20endpoint.ipynb

Written by: John Patrick Laurel

Pats is the Head of Data Science at a European short-stay real estate business group. He boasts a diverse skill set in the realm of data and AI, encompassing Machine Learning Engineering, Data Engineering, and Analytics. Additionally, he serves as a Data Science Mentor at Eskwelabs. Outside of work, he enjoys taking long walks and reading.
