Last updated on April 27, 2023
Every year, professionals all around the world attend the most transformative tech event — AWS re:Invent.
Here, a LOT of new AWS services and capabilities are announced and discussed. In this post, we will focus on the major announcements relevant to data scientists and ML engineers!
A Gentle Introduction to Amazon SageMaker
The major announcements discussed in this post focus on SageMaker, so we will spend a paragraph or two quickly talking about the service.
Amazon SageMaker is the machine learning platform of AWS that helps solve a variety of requirements of data scientists, developers, and machine learning practitioners in the cloud. SageMaker has features and capabilities that support the different stages of the machine learning process, and it makes machine learning experiments and deployments in the cloud very practical and straightforward.
SageMaker allows a data scientist or a machine learning engineer to train and deploy a machine learning model in just a couple of minutes. In addition to this, only a few lines of code are needed to get started. If a data scientist wants to enable model monitoring in production (to check if the deployed machine learning model has degraded), adding a few lines of code would do the trick!
New Features and Capabilities Announced
Now that we have a good idea of what SageMaker is, let’s proceed to some of its newly announced features and capabilities. We won’t cover every announcement here. Instead, we’ll focus on several key ones discussed in the sessions.
Let’s start with SageMaker Role Manager. SageMaker Role Manager plays a crucial role in ML Governance. It helps set up permissions and roles properly to limit the access of users to only what is needed on SageMaker resources. Existing role templates are available to speed up the process, but custom roles can be prepared as well to implement the principle of least privilege. With this new capability in place, ML engineers and AWS administrators are able to improve the security configuration setup quickly and easily.
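To make the principle of least privilege a bit more concrete, here is a minimal sketch of a narrowly scoped IAM policy document built in plain Python. This is an illustration only and is not the output of SageMaker Role Manager. The function name, the account ID, and the "team-a-" job prefix are hypothetical, and a real role would likely need additional permissions (for example, S3 access and iam:PassRole) before training jobs could actually run.

```python
import json


def training_only_policy(region: str, account_id: str, job_prefix: str) -> dict:
    """Build an IAM policy document that allows a few training-job actions,
    and only on training jobs whose names start with job_prefix."""
    resource = (
        f"arn:aws:sagemaker:{region}:{account_id}:training-job/{job_prefix}*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ScopedTrainingJobs",
                "Effect": "Allow",
                # Only the actions this persona needs, nothing broader
                # such as "sagemaker:*".
                "Action": [
                    "sagemaker:CreateTrainingJob",
                    "sagemaker:DescribeTrainingJob",
                    "sagemaker:StopTrainingJob",
                ],
                "Resource": resource,
            }
        ],
    }


# Hypothetical values, for illustration only.
policy = training_only_policy("us-east-1", "123456789012", "team-a-")
print(json.dumps(policy, indent=2))
```

The point of the sketch is the shape of the policy: a short, explicit list of actions and a resource pattern limited to one team's jobs, rather than a wildcard over all SageMaker resources.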
Another capability relevant to ML Governance is SageMaker Model Cards. This capability helps gather model information to speed up the documentation and reporting processes. In some companies, models need to be audited and analyzed before being deployed to production. This is where SageMaker Model Cards shines, as it allows data scientists to quickly generate reports by extracting the much-needed details of the generated models.
The 3rd one on our list would be the SageMaker Model Dashboard. Using this capability, we can easily list our registered models in a central interface and keep track of the Risk Rating, Model Quality, Data Quality, Bias Drift, and Feature Attribution Drift details. If the deployed machine learning model starts to experience some sort of drift or issue, the SageMaker Model Dashboard can be used to help detect and troubleshoot the issue.
The 4th one on our list would be the SageMaker Pipelines Local Mode capability. Before the announcement of this capability, building MLOps pipelines using SageMaker Pipelines took a bit of time since it involved running real infrastructure resources even while the pipeline was still being developed and tested. Now that local mode support is available, ML engineers will find it easier and faster to build and test ML pipelines using SageMaker Pipelines.
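As a rough sketch of how switching between local and cloud execution might look, the snippet below picks a pipeline session based on an environment variable. It assumes the SageMaker Python SDK is installed and provides LocalPipelineSession and PipelineSession in sagemaker.workflow.pipeline_context. The PIPELINE_TARGET variable and both helper functions are hypothetical names used for illustration, and local mode typically also needs Docker running on the machine.

```python
import os


def use_local_mode() -> bool:
    """Hypothetical switch: run locally unless PIPELINE_TARGET=cloud is set."""
    return os.environ.get("PIPELINE_TARGET", "local").lower() != "cloud"


def make_pipeline_session(local: bool):
    """Return a session object for local or cloud pipeline execution."""
    # Imported lazily so this module can be loaded (and the switch tested)
    # on machines where the SageMaker SDK is not installed.
    from sagemaker.workflow.pipeline_context import (
        LocalPipelineSession,
        PipelineSession,
    )

    return LocalPipelineSession() if local else PipelineSession()
```

The returned session would then be passed as the sagemaker_session argument when defining the pipeline and its steps, so the same pipeline definition can be tested locally first and run on real infrastructure later.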
Another great addition to SageMaker Pipelines is the support for the cross-account sharing of the ML pipelines. This allows pipeline resources and entities to be shared across multiple AWS accounts, which helps make multi-account management more convenient and practical.
The next one is the AutoML support of SageMaker Pipelines. AutoML allows data scientists and ML engineers to utilize automation strategies to build multiple models at the same time (from a variety of model families) and choose the best model from the resulting list of models. This new support gives SageMaker Pipelines the needed boost through the addition of an AutoML step in the MLOps pipeline.
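The core idea behind AutoML (train several candidate models, then keep the one that scores best on held-out data) can be sketched in a few lines of plain Python. This is a toy illustration and not the SageMaker AutoML step itself. The two "model families", the tiny dataset, and all function names are made up for the example.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline family: always predict the mean of the training targets."""
    mu = mean(ys)
    return lambda x: mu


def fit_linear(xs, ys):
    """Linear family: least-squares fit of y = a * x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / var
    b = y_bar - a * x_bar
    return lambda x: a * x + b


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def select_best(candidates, train, valid):
    """Fit every candidate on the training split and return the
    (name, model, score) tuple with the lowest validation error."""
    results = []
    for name, fit in candidates.items():
        model = fit(*train)
        results.append((name, model, mse(model, *valid)))
    return min(results, key=lambda r: r[2])


# Made-up data that is roughly linear (y is close to 2x + 1).
train = ([0, 1, 2, 3], [1.0, 3.1, 4.9, 7.0])
valid = ([4, 5], [9.1, 10.9])

best_name, best_model, best_score = select_best(
    {"mean": fit_mean, "linear": fit_linear}, train, valid
)
print(best_name, round(best_score, 4))
```

A managed AutoML step does far more than this (feature preprocessing, hyperparameter tuning, many more model families), but the final selection step follows the same pattern: compare candidates on a validation metric and promote the winner.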
The last capability we’ll include in this list is the ability to convert SageMaker Studio notebook code to production-ready ML jobs. This means that instead of having to spend time converting manually run Jupyter notebook code to a containerized job, a data scientist or an ML engineer can easily convert the notebook into an automated job inside SageMaker Studio. This is definitely an amazing time saver: instead of spending hours (or days) automating a task, we can use this capability and get the job done in a matter of minutes!
Note that there are more announcements than what’s listed here, but this list should already give you a good understanding of the new features available to help you in your ML projects and requirements.
What’s next?
If you want to dig deeper into what Amazon SageMaker can do, feel free to check the 762-page book I’ve written here:
Machine Learning with Amazon SageMaker Cookbook: 80 proven recipes for data scientists and developers to perform machine learning experiments and deployments. Working on the hands-on solutions in this book will make you an advanced ML practitioner using SageMaker in no time. You should find all the other features and capabilities of SageMaker, such as SageMaker Clarify, SageMaker Model Monitor, and SageMaker Debugger, here as well.
That’s all for now and stay tuned for more!