Last updated on November 30, 2023
Dev Environments, Empowered with Amazon SageMaker

Machine learning and artificial intelligence power many of the technologies we use daily, often so seamlessly that we never actively notice them. If we do look for ML/AI, we can find it everywhere: natural language processing in our AI assistants, recommender engines in e-commerce, social media, and music, and fraud detection in finance, among many other technologies. And although the most powerful models are the ones running the digital world we live in, we can replicate their functionality for our own uses, such as personal projects, with industry-level performance. To drive these capabilities even further, we can utilize the suite of services offered by Amazon Web Services (AWS), which lets us tackle complex tasks with ease.

Personal machine learning projects generally follow a standard flow that encompasses the majority of the considerations, decisions, and steps made during the project's lifetime. What follows is one such flow, centered on the ethical, technical, and mathematical concepts of machine learning.

Machine learning projects usually start in the local developer environment, running on the user's device and relying on its hardware capabilities and operating system. However, this approach may introduce a bottleneck in computing capability that affects several pipeline stages: given an extensive dataset with enough hyperparameters to be tested, training times can easily reach hours, drastically limiting efficiency and productivity. One of the fastest and most effective ways to circumvent this is Amazon SageMaker, the fully managed machine learning service from AWS.

Amazon SageMaker, in a nutshell

Amazon SageMaker is a fully managed service where users can build, train, and deploy machine learning models quickly, with tools that aid almost every step of the process. Its most notable features for our context include the following.

Using Studio Notebooks for Data Science

Part of SageMaker Studio is Studio Notebooks, an easy-to-launch, collaborative version of the traditional notebooks used in machine learning, equipped with persistent storage, an Amazon EC2 instance type (to provide the compute power), and a SageMaker image for containerization and ready-to-use environments. To access Studio Notebooks, navigate to: Amazon SageMaker > Domains (create one if there are none yet) > select your domain > Launch > Studio. Once the interface loads, we are redirected to the Amazon SageMaker Studio dashboard. Studio has countless other features that can be accessed here, well beyond the scope of this article; still, for a starting point with a sense of familiarity, we can:

1. Boot up the launcher.
2. Create a notebook.
3. Work on any data science project in notebook style, with a configurable EC2 instance that can be changed in the top-right configuration area.

These examples only scratch the surface of what Studio Notebooks can do, but in essence, SageMaker is a highly flexible and innovative approach to machine learning. We will discuss the other features of Amazon SageMaker in future articles.
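Studio can also be launched programmatically. Here is a minimal sketch, assuming an existing Studio domain and user profile (the IDs below are placeholders), that generates a temporary presigned Studio URL with boto3:

import boto3

# Placeholder identifiers -- replace with your own domain ID and user profile
DOMAIN_ID = "d-xxxxxxxxxxxx"
USER_PROFILE = "my-user-profile"

sagemaker_client = boto3.client('sagemaker')

# Generate a temporary login URL for SageMaker Studio
response = sagemaker_client.create_presigned_domain_url(
    DomainId=DOMAIN_ID,
    UserProfileName=USER_PROFILE,
    SessionExpirationDurationInSeconds=1800,  # how long the Studio session lasts
    ExpiresInSeconds=300,                     # how long the URL itself stays valid
)

print(response['AuthorizedUrl'])

Opening the printed URL in a browser drops us straight into the Studio dashboard.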
Apart from versatile, multi-use services like SageMaker, Amazon also provides services that cater to specific use cases through powerful, high-level pre-trained models.

Natural Language Processing with Amazon Comprehend

One of these services is Amazon Comprehend, which uses natural language processing (NLP) to extract valuable insights from documents.

Comprehend Capabilities

Common use cases of Amazon Comprehend include sentiment analysis, entity recognition, and language detection. It also offers document categorization and key phrase extraction, making it a versatile tool for a wide range of NLP tasks.
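Each of these capabilities maps to its own API operation. As a small sketch (the sample sentence is made up for illustration), entity recognition and language detection look like this in boto3:

import boto3

comprehend = boto3.client('comprehend')

text = "Amazon SageMaker was launched by AWS in 2017."

# Detect named entities (organizations, dates, and so on)
entities = comprehend.detect_entities(Text=text, LanguageCode='en')
for entity in entities['Entities']:
    print(entity['Text'], entity['Type'], round(entity['Score'], 3))

# Detect the dominant language of the text
languages = comprehend.detect_dominant_language(Text=text)
print(languages['Languages'][0]['LanguageCode'])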
Using the SDK

Amazon Comprehend can be accessed programmatically using the AWS SDK, which is available in multiple programming languages; in our case, we will be using Python (boto3). By initializing a client, we can use Comprehend just as we would in the console, calling methods on `comprehend` to invoke its services. For instance, to detect sentiment, we can call `detect_sentiment` and pass in the text we want to analyze, along with a few parameters:
import boto3
# Initialize the comprehend client
comprehend = boto3.client('comprehend')
# Provide the text to be analyzed
text = "I love using AWS for my projects!"
# Call the detect_sentiment method
response = comprehend.detect_sentiment(Text=text, LanguageCode='en')

This returns a JSON response containing the sentiment scores of the text, as well as some metadata:
{
    'Sentiment': 'POSITIVE',
    'SentimentScore': {
        'Positive': 0.994814932346344,
        'Negative': 0.0002758224436547607,
        'Neutral': 0.004847857169806957,
        'Mixed': 6.13411029917188e-05
    },
    'ResponseMetadata': {
        'RequestId': '8878fd79-fb6d-4076-8f3f-7843dd0a55f7',
        'HTTPStatusCode': 200,
        'HTTPHeaders': {
            'x-amzn-requestid': '8878fd79-fb6d-4076-8f3f-7843dd0a55f7',
            'content-type': 'application/x-amz-json-1.1',
            'content-length': '163',
            'date': 'Wed, 22 Nov 2023 09:05:22 GMT'
        },
        'RetryAttempts': 0
    }
}
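The fields we usually care about are `Sentiment` and `SentimentScore`. Continuing from the call above, pulling them out of the response dictionary is a one-liner each:

# Extract the overall label and the score of the winning class
sentiment = response['Sentiment']
confidence = response['SentimentScore'][sentiment.capitalize()]
print(f"{sentiment} ({confidence:.2%})")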
Building up from this simple test, we can develop purpose-built projects or functionalities that generate useful insights or drive decisions. An example project would be a product review study system, where product reviews are analyzed to identify the main selling points of a product and to surface its users' pain points. One iteration of this can be attained with the following code:

import boto3

# Initialize the Comprehend client
comprehend = boto3.client(service_name='comprehend')

# Input the reviews as a list of sentences
reviews = [...]  # elided; see the example below

# Loop over every review
for review in reviews:
    # Detect the key phrases of the review
    key_phrases_response = comprehend.detect_key_phrases(Text=review, LanguageCode='en')
    key_phrases = [phrase['Text'] for phrase in key_phrases_response['KeyPhrases']]

    print(f"Review: {review}")
    print("Key Phrases and their Sentiments:")

    # Analyze the sentiment of each key phrase
    for phrase in key_phrases:
        sentiment_response = comprehend.detect_sentiment(Text=phrase, LanguageCode='en')
        sentiment = sentiment_response['Sentiment']
        print(f" - {phrase}: {sentiment}")
    print("\n")
In essence, this snippet extracts the parts of a review that comment directly on aspects of the product and attaches a sentiment to each of those parts. If we take the following example:

reviews = ["Great battery life, but I don't like that we have a blurry screen."]
We can get an output like the following:

Review: Great battery life, but I don't like that we have a blurry screen.
Key Phrases and their Sentiments:
 - Great battery life: POSITIVE
 - a blurry screen: NEGATIVE

We can run this code on all the reviews, parse the results, and generate insights on whatever product or service we are analyzing.
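One way to do that parsing, sketched below, is to count how often each key phrase appears with each sentiment across the whole review set; the aggregation scheme here is my own illustration, not part of the Comprehend API. Reusing the `reviews` list from above:

from collections import Counter

import boto3

comprehend = boto3.client('comprehend')

def aggregate_phrase_sentiments(reviews):
    # Count how often each key phrase appears with each sentiment
    counts = Counter()
    for review in reviews:
        phrases = comprehend.detect_key_phrases(Text=review, LanguageCode='en')
        for phrase in phrases['KeyPhrases']:
            sentiment = comprehend.detect_sentiment(
                Text=phrase['Text'], LanguageCode='en'
            )['Sentiment']
            counts[(phrase['Text'].lower(), sentiment)] += 1
    return counts

# The most frequent pairs point at the selling points and the pain points
for (phrase, sentiment), count in aggregate_phrase_sentiments(reviews).most_common(10):
    print(f"{phrase}: {sentiment} x{count}")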
Time Series Forecasting with Amazon Forecast

Another specialized service from Amazon is Amazon Forecast, which uses machine learning to generate precise time-series predictions or, as the name implies, forecasts. The idea behind it is that, given a dataset containing time-series data, we can observe what happens to the prediction variable over time and, once we recognize a pattern, use it to infer future values. Forecast simplifies the process of repeatedly applying cutting-edge algorithms over multiple datasets to generate the most accurate predictions. Common use cases of Amazon Forecast include inventory planning and operational planning, among a few others.
Forecast Capabilities

Upon choosing the initial settings for our dataset group, Forecast allows us to upload our data and include more information regarding it. A valuable feature is that it accepts multiple data sources for a single prediction, such as item metadata and other related time series, on top of the initial dataset, which allows for significantly improved predictions. This also includes AWS-managed data, like national holidays and weather, which would be rather complicated to source ourselves, simplifying the process even further.

Amazon Forecast also automates much of the data preprocessing and feature engineering that is crucial in time-series forecasting, efficiently handling complexities like missing values, outliers, and variable transformations, which can lead to more accurate models. Another powerful feature is AutoML, which can automatically select the best algorithm and tune its hyperparameters, simplifying model development by reducing the need for manual model selection and optimization. Forecast already uses state-of-the-art models that are significantly more sophisticated than traditional ones, so this feature makes the training process considerably less difficult.

On top of these major improvements to data preparation and model training, Forecast also makes extracting predictions simpler and easier. The insights from the predictions can then be used for our use case, which here is predicting the demand for a store in the upcoming months.
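Although we have been driving Forecast from the console, it is also reachable through boto3. The sketch below assumes a dataset group that already exists and uses placeholder names and ARNs throughout; predictor training and forecast generation are asynchronous, so each step must finish before the next can run:

import boto3

forecast = boto3.client('forecast')
forecast_query = boto3.client('forecastquery')

# Train a predictor with AutoML on an existing dataset group (ARN is a placeholder)
predictor = forecast.create_predictor(
    PredictorName='store_demand_predictor',
    ForecastHorizon=3,  # predict three periods ahead
    PerformAutoML=True,
    InputDataConfig={'DatasetGroupArn': 'arn:aws:forecast:...:dataset-group/store-demand'},
    FeaturizationConfig={'ForecastFrequency': 'M'},  # monthly data
)

# Once the predictor is active, generate a forecast from it
created = forecast.create_forecast(
    ForecastName='store_demand_forecast',
    PredictorArn=predictor['PredictorArn'],
)

# Once the forecast is active, query the predicted demand for one store
result = forecast_query.query_forecast(
    ForecastArn=created['ForecastArn'],
    Filters={'item_id': 'store_1'},
)
for point in result['Forecast']['Predictions']['p50']:
    print(point['Timestamp'], point['Value'])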
Final Remarks

ML and AI are the driving forces of our world right now, and fortunately, the ability to harness these forces is well within our reach as individuals, professionals, and enthusiasts. There are endless ways to practice ML, but expanding toward cloud computing is becoming more and more popular, and after seeing what we can do with the cloud, I would say the popularity is well deserved! There are endless projects where the services and capabilities I mentioned can be integrated, from small-scale practice projects to company-level features shipped to clients, and the list of things we can do still grows as more and more features are rolled out to these services.

Although the features I have shown are very powerful, it is also important to be cautious when using them: these are paid services, and usage entails a cost. In my experience, Forecast tends to be the most expensive due to the computing power it necessitates, but cautiousness should be exercised at ALL times.

And lastly, I want to thank you for taking the time to read this article. Happy learning!
Resources:

https://aws.amazon.com/pm/sagemaker/