Last updated on November 30, 2023
Dev Environments, Empowered with Amazon SageMaker

Machine learning and artificial intelligence power many of the technologies we use daily, often so seamlessly that we never actively notice them. If we do look for ML/AI, we can find it everywhere: natural language processing in our AI assistants, recommender engines in e-commerce, social media, and music, and fraud detection in finance, among many other technologies. And although the most powerful models are the ones running the digital world we live in, we can replicate their functionality for our own uses, such as personal projects, with industry-level performance. To drive these capabilities even further, we can utilize the suite of services offered by Amazon Web Services (AWS), which lets us tackle complex tasks with ease.

Personal machine learning projects generally follow a standard flow that encompasses the majority of the considerations, decisions, and steps made during the project's lifetime. What follows is one such flow, centered on the ethical, technical, and mathematical concepts of machine learning.

Machine learning projects usually start in the local developer environment, running on the user's device and relying on its hardware capabilities and operating system. However, this approach may introduce a bottleneck in computing capability that affects several pipeline stages: given an extensive dataset with enough hyperparameters to be tested, training times can easily reach hours, drastically limiting efficiency and productivity. One of the fastest and most effective ways to circumvent this is Amazon SageMaker, the fully managed machine learning service from AWS.

Amazon SageMaker, in a nutshell

Amazon SageMaker is a fully managed service where users can build, train, and deploy machine learning models quickly, with tools that aid almost every step of the process. Its most notable features for our context include the following.

Using Studio Notebooks for Data Science

Part of SageMaker Studio is Studio Notebooks, an easy-to-launch, collaborative version of the traditional notebooks used in machine learning, equipped with persistent storage, an Amazon EC2 instance type (to provide the compute power), and a SageMaker image for containerization and ready-to-use environments. To access Studio Notebooks, navigate to: Amazon SageMaker > Domains (create one if there are none yet) > select your domain > Launch > Studio. Once the interface loads, we are redirected to the Amazon SageMaker Studio dashboard. Studio has countless other features that can be accessed here, well beyond the scope of this article; still, for a starting point with a sense of familiarity, we can:

1. Boot up the launcher.
2. Create a notebook.
3. Work on any data science project in notebook style, with a configurable EC2 instance that can be changed in the top-right configuration area.

These examples only scratch the surface of what Studio Notebooks can do, but in essence, SageMaker is a highly flexible and innovative approach to machine learning. We will discuss the other features of Amazon SageMaker in future articles.
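Studio can also be launched programmatically. Here is a minimal sketch, assuming an existing Studio domain and user profile (the IDs below are placeholders), that generates a temporary presigned Studio URL with boto3:

import boto3

# Placeholder identifiers -- replace with your own domain ID and user profile
DOMAIN_ID = "d-xxxxxxxxxxxx"
USER_PROFILE = "my-user-profile"

sagemaker_client = boto3.client('sagemaker')

# Generate a temporary login URL for SageMaker Studio
response = sagemaker_client.create_presigned_domain_url(
    DomainId=DOMAIN_ID,
    UserProfileName=USER_PROFILE,
    SessionExpirationDurationInSeconds=1800,  # how long the Studio session lasts
    ExpiresInSeconds=300,                     # how long the URL itself stays valid
)

print(response['AuthorizedUrl'])

Opening the printed URL in a browser drops us straight into the Studio dashboard.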
Apart from versatile, multi-use services like SageMaker, Amazon also provides services that cater to specific use cases through powerful, high-level pre-trained models.

Natural Language Processing with Amazon Comprehend

One of these services is Amazon Comprehend, which uses natural language processing (NLP) to extract valuable insights from documents.

Comprehend Capabilities

Common use cases of Amazon Comprehend include sentiment analysis, entity recognition, and language detection. It also offers document categorization and key phrase extraction, making it a versatile tool for a wide range of NLP tasks.
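Each of these capabilities maps to its own API operation. As a small sketch (the sample sentence is made up for illustration), entity recognition and language detection look like this in boto3:

import boto3

comprehend = boto3.client('comprehend')

text = "Amazon SageMaker was launched by AWS in 2017."

# Detect named entities (organizations, dates, and so on)
entities = comprehend.detect_entities(Text=text, LanguageCode='en')
for entity in entities['Entities']:
    print(entity['Text'], entity['Type'], round(entity['Score'], 3))

# Detect the dominant language of the text
languages = comprehend.detect_dominant_language(Text=text)
print(languages['Languages'][0]['LanguageCode'])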
Using the SDK

Amazon Comprehend can be accessed programmatically using the AWS SDK, which is available in multiple programming languages; in our case, we will be using Python (boto3). By initializing a client, we can use Comprehend just as we would in the console, calling methods on `comprehend` to invoke its services. For instance, to detect sentiment, we can call `detect_sentiment` and pass in the text we want to analyze, along with a few parameters:
import boto3
# Initialize the comprehend client
comprehend = boto3.client('comprehend')
# Provide the text to be analyzed
text = "I love using AWS for my projects!"
# Call the detect_sentiment method
response = comprehend.detect_sentiment(Text=text, LanguageCode='en')

This returns a JSON response containing the sentiment scores of the text, as well as some metadata:
{
    'Sentiment': 'POSITIVE',
    'SentimentScore': {
        'Positive': 0.994814932346344,
        'Negative': 0.0002758224436547607,
        'Neutral': 0.004847857169806957,
        'Mixed': 6.13411029917188e-05
    },
    'ResponseMetadata': {
        'RequestId': '8878fd79-fb6d-4076-8f3f-7843dd0a55f7',
        'HTTPStatusCode': 200,
        'HTTPHeaders': {
            'x-amzn-requestid': '8878fd79-fb6d-4076-8f3f-7843dd0a55f7',
            'content-type': 'application/x-amz-json-1.1',
            'content-length': '163',
            'date': 'Wed, 22 Nov 2023 09:05:22 GMT'
        },
        'RetryAttempts': 0
    }
}
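The fields we usually care about are `Sentiment` and `SentimentScore`. Continuing from the call above, pulling them out of the response dictionary is a one-liner each:

# Extract the overall label and the score of the winning class
sentiment = response['Sentiment']
confidence = response['SentimentScore'][sentiment.capitalize()]
print(f"{sentiment} ({confidence:.2%})")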
Building up from this simple test, we can develop purpose-built projects or functionalities that generate useful insights or drive decisions. An example project would be a product review study system, where product reviews are analyzed to identify the main selling points of a product and to surface its users' pain points. One iteration of this can be attained with the following code:

import boto3

# Initialize the Comprehend client
comprehend = boto3.client(service_name='comprehend')

# Input the reviews as a list of sentences
reviews = [...]  # elided; see the example below

# Loop over every review
for review in reviews:
    # Detect the key phrases of the review
    key_phrases_response = comprehend.detect_key_phrases(Text=review, LanguageCode='en')
    key_phrases = [phrase['Text'] for phrase in key_phrases_response['KeyPhrases']]

    print(f"Review: {review}")
    print("Key Phrases and their Sentiments:")

    # Analyze the sentiment of each key phrase
    for phrase in key_phrases:
        sentiment_response = comprehend.detect_sentiment(Text=phrase, LanguageCode='en')
        sentiment = sentiment_response['Sentiment']
        print(f" - {phrase}: {sentiment}")
    print("\n")
In essence, this snippet extracts the parts of a review that comment directly on aspects of the product and attaches a sentiment to each of those parts. If we take the following example:

reviews = ["Great battery life, but I don't like that we have a blurry screen."]
We can get an output like the following:

Review: Great battery life, but I don't like that we have a blurry screen.
Key Phrases and their Sentiments:
 - Great battery life: POSITIVE
 - a blurry screen: NEGATIVE

We can run this code on all the reviews, parse the results, and generate insights on whatever product or service we are analyzing.
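One way to do that parsing, sketched below, is to count how often each key phrase appears with each sentiment across the whole review set; the aggregation scheme here is my own illustration, not part of the Comprehend API. Reusing the `reviews` list from above:

from collections import Counter

import boto3

comprehend = boto3.client('comprehend')

def aggregate_phrase_sentiments(reviews):
    # Count how often each key phrase appears with each sentiment
    counts = Counter()
    for review in reviews:
        phrases = comprehend.detect_key_phrases(Text=review, LanguageCode='en')
        for phrase in phrases['KeyPhrases']:
            sentiment = comprehend.detect_sentiment(
                Text=phrase['Text'], LanguageCode='en'
            )['Sentiment']
            counts[(phrase['Text'].lower(), sentiment)] += 1
    return counts

# The most frequent pairs point at the selling points and the pain points
for (phrase, sentiment), count in aggregate_phrase_sentiments(reviews).most_common(10):
    print(f"{phrase}: {sentiment} x{count}")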
Time Series Forecasting with Amazon Forecast

Another specialized service from Amazon is Amazon Forecast, which uses machine learning to generate precise time-series predictions or, as the name implies, forecasts. The idea behind it is that, given a dataset containing time-series data, we can observe what happens to the prediction variable over time and, once we recognize a pattern, use it to infer future values. Forecast simplifies the process of repeatedly applying cutting-edge algorithms over multiple datasets to generate the most accurate predictions. Common use cases of Amazon Forecast include inventory planning and operational planning, among a few others.
Forecast Capabilities

Upon choosing the initial settings for our dataset group, Forecast allows us to upload our data and include more information regarding it. A valuable feature is that it accepts multiple data sources for a single prediction, such as item metadata and other related time series, on top of the initial dataset, which allows for significantly improved predictions. This also includes AWS-managed data, like national holidays and weather, which would be rather complicated to source ourselves, simplifying the process even further.

Amazon Forecast also automates much of the data preprocessing and feature engineering that is crucial in time-series forecasting, efficiently handling complexities like missing values, outliers, and variable transformations, which can lead to more accurate models. Another powerful feature is AutoML, which can automatically select the best algorithm and tune its hyperparameters, simplifying model development by reducing the need for manual model selection and optimization. Forecast already uses state-of-the-art models that are significantly more sophisticated than traditional ones, so this feature makes the training process considerably less difficult.

On top of these major improvements to data preparation and model training, Forecast also makes extracting predictions simpler and easier. The insights from the predictions can then be used for our use case, which here is predicting the demand for a store in the upcoming months.
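Although we have been driving Forecast from the console, it is also reachable through boto3. The sketch below assumes a dataset group that already exists and uses placeholder names and ARNs throughout; predictor training and forecast generation are asynchronous, so each step must finish before the next can run:

import boto3

forecast = boto3.client('forecast')
forecast_query = boto3.client('forecastquery')

# Train a predictor with AutoML on an existing dataset group (ARN is a placeholder)
predictor = forecast.create_predictor(
    PredictorName='store_demand_predictor',
    ForecastHorizon=3,  # predict three periods ahead
    PerformAutoML=True,
    InputDataConfig={'DatasetGroupArn': 'arn:aws:forecast:...:dataset-group/store-demand'},
    FeaturizationConfig={'ForecastFrequency': 'M'},  # monthly data
)

# Once the predictor is active, generate a forecast from it
created = forecast.create_forecast(
    ForecastName='store_demand_forecast',
    PredictorArn=predictor['PredictorArn'],
)

# Once the forecast is active, query the predicted demand for one store
result = forecast_query.query_forecast(
    ForecastArn=created['ForecastArn'],
    Filters={'item_id': 'store_1'},
)
for point in result['Forecast']['Predictions']['p50']:
    print(point['Timestamp'], point['Value'])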
Final Remarks

ML and AI are the driving forces of our world right now, and fortunately, the ability to harness these forces is well within our reach as individuals, professionals, and enthusiasts. There are endless ways to practice ML, but expanding toward cloud computing is becoming more and more popular, and after seeing what we can do with the cloud, I would say the popularity is well deserved! There are endless projects where the services and capabilities I mentioned can be integrated, from small-scale practice projects to company-level features shipped to clients, and the list of things we can do still grows as more and more features are rolled out to these services.

Although the features I have shown are very powerful, it is also important to be cautious when using them: these are paid services, and usage entails a cost. In my experience, Forecast tends to be the most expensive due to the computing power it necessitates, but cautiousness should be exercised at ALL times.

And lastly, I want to thank you for taking the time to read this article. Happy learning!
Resources:

https://aws.amazon.com/pm/sagemaker/