Last updated on November 30, 2023
Welcome back to our series on model deployment in AWS! In the fast-paced world of machine learning and data science, the ability to deploy models efficiently and reliably is crucial. This is where AWS services, with their vast array of tools and capabilities, come into play. In this second installment, we will delve into the potent combination of AWS Lambda and Docker, coupled with the convenience of storing models in S3. This trio offers a scalable, cost-effective, and streamlined solution for deploying machine learning models in a production environment.
If you recall, in the first part of our series, we explored deploying a generative model for tabular data to an EC2 instance using Docker. It provided us with a robust foundation and a flexible environment to serve our models. But what if we could make it even more efficient? What if we could respond to data in real-time without provisioning or managing servers? This is where AWS Lambda shines, and integrating it with S3 elevates its capabilities further.
So, buckle up as we embark on this journey of understanding the relevance, intricacies, and advantages of deploying models with Lambda and Docker while leveraging the robustness of S3 for storage. Whether you’re a seasoned AWS user or just dipping your toes into the world of cloud-based model deployment, there’s something in here for everyone.
The Context and Motivation
In the ever-evolving realm of data science and machine learning, the use of synthetic data has emerged as a powerful tool. In the first part of our series, we created a model that generated synthetic tabular data, but why? Let’s delve into it.
Why Synthetic Tabular Data?
Generating synthetic tabular data serves multiple purposes. Firstly, it provides a safe harbor for experimentation without the risks associated with real-world data, particularly when dealing with sensitive information. This ensures that privacy concerns are addressed while still allowing robust testing and development. Secondly, synthetic data can help simulate various scenarios, enabling us to stress-test our models under diverse conditions. This becomes invaluable when real data is scarce, expensive, or limited in variability.
The Imperative of Monitoring
Once a model is deployed, the journey doesn’t end there. Models, no matter how meticulously crafted, can drift over time as the real-world data they encounter begins to change. This drift can lead to decreased performance, potentially resulting in inaccurate or biased predictions. That’s where monitoring steps in. Keeping a vigilant eye on our model’s performance allows us to catch these shifts early. This not only ensures the model remains accurate but also maintains the trust of those relying on its predictions. Moreover, with the inclusion of actual values in our synthetic data, we can emulate and monitor the model’s behavior, making the process even more insightful.
Lambda and S3: A Seamless Duo
So, where do AWS Lambda and S3 fit into all this? AWS Lambda allows us to execute our models without the hassle of server management, responding in real-time as new data enters our system. This serverless compute service can automatically run code in response to multiple events, like changes to data within an Amazon S3 bucket. Speaking of S3, it’s not just a storage solution. It’s a highly durable and available storage platform, ideal for hosting our models and ensuring they’re readily accessible when Lambda needs them. The synergy between Lambda’s event-driven architecture and S3’s reliable storage offers a seamless, efficient, and scalable solution to our model deployment and monitoring needs.
In essence, with the right tools and a clear understanding, we can transform the daunting task of model deployment and monitoring into a smooth and efficient process.
Architectural Overview
When venturing into the realm of machine learning deployment, especially within AWS, visualizing the architecture is pivotal. It provides a clear road map of how different components interact, ensuring a smooth flow of operations. So, let’s embark on a journey from the creation of synthetic data to its eventual prediction and storage.
At a high level, the architecture works as follows:
- EC2 Instance (from Part 1): An EC2 instance, with a deployed model, generates synthetic tabular data.
- Amazon S3: This synthetic data is then stored in a designated S3 bucket.
- Event Notification: The moment new synthetic data enters the S3 bucket, it acts as a trigger, sending an event notification.
- Lambda Function: On receiving the notification, our AWS Lambda function springs into action. It fetches the synthetic data, processes it, and runs a prediction using our model (a sample of the triggering event payload is shown after this list).
- Results in S3: Post-prediction, Lambda saves the results, along with the predictions, back into another folder within the same S3 bucket.
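To make the trigger concrete, here is roughly the shape of the S3 event notification that Lambda receives when an object is uploaded. The bucket name and key below are illustrative, and the full payload contains additional fields:

```json
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "mlops-python" },
        "object": { "key": "synthetic_data/batch_0001.csv" }
      }
    }
  ]
}
```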
Breaking Down the Components
- EC2 Instance: In our previous post (Part 1), we discussed the deployment of our machine learning model on an EC2 instance. This virtual server in AWS’s cloud is responsible for generating our synthetic tabular data, which emulates real-world scenarios without using actual sensitive data.
- Amazon S3: Amazon’s Simple Storage Service, or S3, is more than just storage. It’s a versatile tool that stores our synthetic data, hosts our machine-learning model, and saves prediction results. The durability and scalability of S3 ensure our data remains intact and our model accessible.
- Lambda Function: The star of this article, AWS Lambda, is a serverless computing service. Upon an S3 event trigger, Lambda effortlessly scales, processes the incoming data, leverages the model stored in S3 for predictions, and writes the results back. All this without the user ever managing a server!
By marrying the strengths of EC2, S3, and Lambda, we’ve crafted an architecture that not only efficiently handles model predictions but does so in a manner that’s scalable, cost-effective, and agile. This powerful trio ensures that as our synthetic data grows and evolves, our system adapts seamlessly, consistently delivering accurate predictions.
Lambda Function Breakdown
Deploying models using AWS services requires an in-depth understanding of various components. The centerpiece of this architecture is the AWS Lambda function. Let’s break down the vital steps in the Lambda function and the role they play.
Preprocessing
In the world of machine learning, preprocessing is a crucial step that determines the quality of predictions. In the context of our Lambda function:
- Binary Columns: Columns like “gender” or “PaperlessBilling” have binary values. Converting these values to 0s and 1s is essential for the model to interpret them. The function preprocess_binary_columns handles this task.
- Dummies Columns: For categorical columns, such as “Contract” or “PaymentMethod,” one-hot encoding is applied, turning them into ‘dummy’ columns. The function preprocess_dummies_columns is responsible for this.
- Numeric Columns: Numeric data, like “tenure” or “TotalCharges,” needs to be scaled to ensure uniformity and improve prediction accuracy. The preprocess_numeric_columns function standardizes these columns.
```python
def preprocess(data: pd.DataFrame) -> pd.DataFrame:
    df = data.copy()
    df = preprocess_binary_columns(df)
    df = preprocess_dummies_columns(df)
    df = preprocess_numeric_columns(df)
    return df
```
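The three helper functions themselves are not reproduced in the post. As a rough sketch, they might look like the following; the column lists, value mappings, and the freshly fit StandardScaler are illustrative assumptions (in practice you would reuse the encoder and scaler fitted at training time):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative column groups, based on the churn-style fields mentioned above
BINARY_COLUMNS = ["gender", "PaperlessBilling"]
DUMMY_COLUMNS = ["Contract", "PaymentMethod"]
NUMERIC_COLUMNS = ["tenure", "TotalCharges"]

def preprocess_binary_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Map yes/no-style values to 1/0 so the model can consume them
    mapping = {"Yes": 1, "No": 0, "Male": 1, "Female": 0}
    for col in BINARY_COLUMNS:
        df[col] = df[col].map(mapping)
    return df

def preprocess_dummies_columns(df: pd.DataFrame) -> pd.DataFrame:
    # One-hot encode categorical columns into dummy columns
    return pd.get_dummies(df, columns=DUMMY_COLUMNS)

def preprocess_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Standardize numeric columns; ideally load the scaler fitted during training instead
    scaler = StandardScaler()
    df[NUMERIC_COLUMNS] = scaler.fit_transform(df[NUMERIC_COLUMNS])
    return df
```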
Model Retrieval
Machine learning models can be large and complex. Storing them in S3 makes them easily accessible, ensuring they remain consistent and unchanged. With the get_model function, our Lambda retrieves the pre-trained model from the specified S3 bucket using the pickle library, reading it into memory.
```python
import io
import pickle

import boto3

def get_model():
    # Download the pickled model from S3 and load it into memory
    s3 = boto3.client('s3')
    model_obj = s3.get_object(Bucket="mlops-python", Key="models/gradient_boosting/gb_model.pkl")
    model_bytes = model_obj['Body'].read()
    model_stream = io.BytesIO(model_bytes)
    model = pickle.load(model_stream)
    return model
```
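One practical refinement, not part of the original function but a common pattern: because Lambda reuses its execution environment between warm invocations, the loaded model can be cached at module level so it is only downloaded and unpickled on cold starts. A minimal sketch:

```python
_model = None  # module-level cache, survives across warm invocations

def get_model_cached():
    global _model
    if _model is None:
        _model = get_model()
    return _model
```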
Prediction and Storage
Once the data is preprocessed and the model retrieved:
- Predictions: The processed data is fed into our machine learning model for prediction using the model.predict(df_preprocessed) function.
- Data Event: Rather than directly reading from S3, our function is triggered by an S3 event notification. This event occurs when new data (in our case, synthetic data) is placed into the S3 bucket, providing the data’s location (bucket and key).
- Storage: After predictions are made, they are appended to the original DataFrame and then written back to a different location in the S3 bucket using the s3.put_object method.
```python
def predict(event, context):
    ...
    data = event['Records'][0]['s3']
    ...
    s3.get_object(Bucket=bucket, Key=key)
    ...
    model = get_model()
    ...
    y_pred = model.predict(df_preprocessed)
    ...
    s3.put_object(Bucket=bucket, Key="predicted_data/" + key, Body=csv_buffer.getvalue())
    ...
```
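Filling in the elided steps, a self-contained version of the handler might look roughly like the sketch below. The `prediction` column name and the CSV input format are assumptions, and `preprocess` and `get_model` are the functions shown earlier:

```python
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")

def predict(event, context):
    # Location of the newly uploaded synthetic data, taken from the S3 event
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Read the uploaded CSV into a DataFrame
    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Preprocess the features, load the model, and score the batch
    df_preprocessed = preprocess(df)
    model = get_model()
    df["prediction"] = model.predict(df_preprocessed)  # assumed output column name

    # Append predictions to the original rows and write them to a separate prefix
    csv_buffer = io.StringIO()
    df.to_csv(csv_buffer, index=False)
    s3.put_object(Bucket=bucket, Key="predicted_data/" + key, Body=csv_buffer.getvalue())

    return {"statusCode": 200, "rows_scored": len(df)}
```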
By leveraging the power of AWS Lambda, this architecture provides a streamlined process, right from the moment data enters S3, through preprocessing and predictions, up to the point where results are saved back. The serverless nature of Lambda ensures efficient scaling, making this approach both robust and cost-effective.
Lambda Deployment using Docker and ECR
Deploying serverless functions in the cloud presents a unique set of challenges, especially when dealing with machine learning models that might depend on specific libraries or environments. Docker, combined with Amazon Elastic Container Registry (ECR), provides a seamless solution.
Benefits of using Docker for deployment:
- Environment Isolation: Docker ensures that the environment your function runs in remains consistent, mitigating issues stemming from discrepancies between local and production environments.
- Library Management: Docker makes it easy to handle different library versions, ensuring your function has all dependencies at the right versions.
- Scalability: Dockerized functions scale efficiently, with each concurrent invocation running in its own container-based execution environment.
- Portability: Docker containers can run anywhere, which means you can move your function across AWS accounts or even cloud providers with minimal friction.
Before we proceed with the deployment steps, let’s check the contents of our Dockerfile:
```dockerfile
FROM public.ecr.aws/lambda/python:3.11

# Copy requirements.txt
COPY requirements.txt ${LAMBDA_TASK_ROOT}

# Install the specified packages
RUN pip install -r requirements.txt

# Copy function code
COPY lambda_function.py ${LAMBDA_TASK_ROOT}

# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "lambda_function.predict" ]
```
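The requirements.txt itself is not shown in the post. For a pickled scikit-learn model handled with pandas, it would plausibly contain something like the list below; pin scikit-learn to the exact version used at training time, otherwise the pickle may fail to load, and note that boto3 already ships with the Lambda base image, so listing it is optional but makes the dependency explicit:

```text
pandas
scikit-learn
boto3
```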
Step-by-step Walkthrough:
Build the docker image:
docker build --platform linux/amd64 -t model-inference .
This command builds a Docker image from your Dockerfile. The tag (-t) is a human-readable name you assign to the image. Here, it’s named “model-inference”. Specifying the platform ensures compatibility with AWS Lambda’s architecture.
Login to ECR:
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin <account_id>.dkr.ecr.us-west-2.amazonaws.com
This command logs your Docker client into Amazon ECR. The login token is valid for 12 hours.
Create ECR repository:
aws ecr create-repository --repository-name model-inference --region us-west-2 --image-scanning-configuration scanOnPush=true --image-tag-mutability MUTABLE
If you haven’t set up a repository for your image, this command does it for you. It also ensures images are scanned for vulnerabilities on push and allows mutable image tags.
Get the repository URI:
aws ecr describe-repositories --repository-names model-inference --region us-west-2
Knowing the URI of your ECR repository is essential for tagging and pushing your Docker image. This command retrieves it.
Tag the image:
docker tag model-inference:latest <repository_uri>:latest
Here, you’re essentially labeling the Docker image with the repository’s URI so you can push it to that location.
Push the image to ECR:
docker push <repository_uri>:latest
This uploads your Docker image to ECR, making it accessible to AWS services, including Lambda.
Potential pitfalls:
- Size Limit: Ensure the Docker image is within the AWS Lambda’s size limits. Bloating can occur due to unnecessary files or libraries.
- Timeout: If the image is too large, Lambda may time out before it can even start the container.
- Permissions: Ensure your AWS account and Lambda function have the necessary permissions to pull images from ECR.
Deploying the Lambda Function:
Now that the Docker image is available in ECR, we can proceed to create the Lambda function:
1. Navigate to the AWS Lambda Console.
2. Click on "Create Function".
3. Choose "Container image" as the deployment package.
4. Enter a suitable name for your Lambda function.
5. In the "Container image URI" section, provide the URI of the Docker image you pushed to ECR.
6. Configure the necessary execution role, VPC, and other settings as needed for your application.
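If you prefer to script this step instead of using the console, an equivalent AWS CLI call might look like this; the function name, role ARN, repository URI, timeout, and memory size are placeholders to adapt to your own setup:

```bash
aws lambda create-function \
  --function-name model-inference \
  --package-type Image \
  --code ImageUri=<repository_uri>:latest \
  --role arn:aws:iam::<account_id>:role/<lambda_execution_role> \
  --timeout 60 \
  --memory-size 1024 \
  --region us-west-2
```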
Once the function is created, its configuration page in the console should show that it is deployed from the container image URI you provided.
Event Notifications with S3:
With our Lambda function in place, the next crucial step is to ensure it gets triggered automatically whenever new synthetic data is pushed to our S3 bucket. This is achieved using S3’s Event Notifications.
1. Navigate to your S3 bucket in the AWS Management Console.
2. Under the "Properties" tab, scroll down to the "Event Notifications" section.
3. Click on “Create event notification”.
4. Give your event a name, something descriptive like “TriggerLambdaOnDataUpload”.
5. Add the prefix of the folder to listen on and, optionally, a suffix for the file type, so the Lambda function is triggered only for the intended datasets or files within that designated directory.
6. In the “Event types” section, select “All object create events”. This ensures the Lambda function is invoked whenever new data is uploaded.
7. In the “Send to” dropdown, choose “Lambda function”.
8. For the Lambda function, choose the one you’ve just deployed.
9. Click on “Save changes”.
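The same notification can also be configured without the console, for example with the AWS CLI. In the sketch below, the bucket name, prefix, suffix, and function ARN are placeholders matching the setup described above:

```bash
aws s3api put-bucket-notification-configuration \
  --bucket mlops-python \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [
      {
        "Id": "TriggerLambdaOnDataUpload",
        "LambdaFunctionArn": "arn:aws:lambda:us-west-2:<account_id>:function:model-inference",
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {
          "Key": {
            "FilterRules": [
              {"Name": "prefix", "Value": "synthetic_data/"},
              {"Name": "suffix", "Value": ".csv"}
            ]
          }
        }
      }
    ]
  }'
```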
After setting this up, if we navigate back to our Lambda function in the console, the S3 bucket should now appear as a configured trigger for the function.
Remember, for the S3 bucket to trigger a Lambda function, the appropriate permissions need to be set. This often involves adding a new policy to your Lambda function that allows S3 to invoke it. Without this, you might run into permission errors.
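As a hedged example, granting that invoke permission with the AWS CLI could look like this; the function name, bucket, and account ID are placeholders:

```bash
aws lambda add-permission \
  --function-name model-inference \
  --statement-id s3-invoke \
  --action "lambda:InvokeFunction" \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::mlops-python \
  --source-account <account_id> \
  --region us-west-2
```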
After setting up the event notification, it’s a good practice to test the workflow. Upload a sample synthetic dataset to your S3 bucket or invoke the API that we deployed in our EC2 instance. If everything is set up correctly, the Lambda function should be invoked, and the processed data should appear in the designated output directory in S3.
Best Practices and Considerations
Building and deploying ML applications in the cloud require meticulous planning to ensure optimal performance, security, and scalability. Here’s a deep dive into some critical best practices and considerations:
1. Security: Ensuring Secure Access to S3 and ECR
a. IAM Policies: Use AWS Identity and Access Management (IAM) to control access to your resources. Grant least privilege access, meaning users and services only have permissions essential for their roles.
b. Encryption: Enable server-side encryption in S3. For ECR, ensure images are stored securely using AWS managed keys or customer-managed keys in AWS Key Management Service (KMS).
c. VPC Endpoints: Use Virtual Private Cloud (VPC) endpoints for S3 and ECR. This ensures that the traffic between your VPC and the services does not traverse the public internet, enhancing security.
d. Logging: Enable AWS CloudTrail to monitor API calls made on your S3 and ECR. It will help you keep an audit trail and respond quickly to any malicious activity.
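As an illustration of least privilege, the Lambda execution role for this pipeline might carry a policy along these lines. The `synthetic_data/` prefix is an assumption; adjust the bucket, prefixes, and actions to your actual layout:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadModelArtifacts",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::mlops-python/models/*"
    },
    {
      "Sid": "ReadInputWritePredictions",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::mlops-python/synthetic_data/*",
        "arn:aws:s3:::mlops-python/predicted_data/*"
      ]
    }
  ]
}
```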
2. Scalability: Handling Increasing Amounts of Data
a. Lambda Configuration: Adjust Lambda’s concurrency settings to handle multiple invocations simultaneously. It ensures your application scales with the data influx.
b. S3 Event Notification: Ensure S3 event notifications (like the ‘put’ event) efficiently trigger your Lambda functions without delays.
c. Batch Processing: If the data inflow increases, consider shifting from real-time processing to batch processing. You can accumulate data over a specific timeframe or size and then process it.
d. Docker Optimization: Regularly update your Docker containers to use optimized, lightweight base images. It will speed up the launch time, enhancing scalability.
3. Monitoring: Keeping Track of Model Predictions and Performance
a. Logging: Use AWS Lambda’s built-in logging capability to log predictions and other vital details. Using the provided Lambda function code, the actual values and predictions are stored together, allowing easy comparison.
b. CloudWatch Metrics: Use Amazon CloudWatch to monitor Lambda function metrics like invocation count, duration, error count, and concurrency. Setting up alarms for anomalous behavior can be beneficial.
c. Dashboarding: Create CloudWatch dashboards that give an at-a-glance view of your function’s performance, prediction outcomes, and the actual values in the synthetic data.
d. Feedback Loop: If possible, create a feedback loop where the prediction outcomes are compared with actual values. Discrepancies can be fed back into the training pipeline to continuously improve the model.
e. Versioning: Consider versioning your model in S3. If a newer model doesn’t perform as expected, it’s easier to roll back to a previous, better-performing version.
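One way to close that feedback loop from inside the Lambda function is to publish a custom CloudWatch metric comparing predictions with the actual values included in the synthetic data. The namespace and metric name below are hypothetical:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_batch_accuracy(y_true, y_pred):
    # Fraction of rows where the prediction matches the actual value in this batch
    accuracy = float((y_true == y_pred).mean())
    cloudwatch.put_metric_data(
        Namespace="ModelInference",         # hypothetical namespace
        MetricData=[{
            "MetricName": "BatchAccuracy",  # hypothetical metric name
            "Value": accuracy,
            "Unit": "None",
        }],
    )
```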
In summary, when deploying ML applications in the cloud, security, scalability, and monitoring are paramount. Regularly review and update configurations, be proactive in monitoring, and always prioritize security. This trio ensures optimal performance and a seamless user experience.
Wrapping Up
In our journey through the intricacies of setting up an ML application on AWS, we’ve touched on numerous facets, from synthetic data generation to the deployment of Lambda functions using Docker and ECR. Let’s distill our discussion into the primary takeaways:
- The Power of AWS: The seamless integration of AWS services like EC2, S3, Lambda, and ECR allows for a streamlined and robust machine learning pipeline. From data generation to inference, AWS offers a one-stop solution.
- Scalability & Flexibility: With AWS Lambda and Docker, we can effortlessly scale our applications to cater to varying data loads, ensuring both efficiency and cost-effectiveness.
- Security & Monitoring: The importance of maintaining a secure environment can't be overstated. AWS offers robust security features, from encryption to VPC endpoints, ensuring our data remains protected. Coupled with effective monitoring, we can ensure our application runs smoothly while maintaining a high standard of performance.
- Hands-on Deployment: The step-by-step guide underscores the practicality of deploying a machine learning model in real-world scenarios. It’s not just theoretical; it’s actionable.
Looking Ahead:
The possibilities with AWS and machine learning are vast. Potential future expansions could include:
- Integration with other AWS services, like SageMaker, to facilitate end-to-end machine learning workflows.
- Exploring the possibilities of multi-model deployments, where multiple models can be called based on the requirements.
- Establishing a continuous integration and deployment (CI/CD) pipeline for the model, ensuring updates are seamlessly integrated.
Lastly, the world of tech thrives on continuous evolution and feedback. If you decide to try out this setup or have already implemented a similar one, we’d love to hear from you. Your insights, challenges faced, or even a simple acknowledgment can provide immense value to the community. After all, innovation is often a collective endeavor. Happy coding!
Resources:
https://tutorialsdojo.com/deploying-a-trained-ctgan-model-on-an-ec2-instance-a-step-by-step-guide/
https://docs.aws.amazon.com/lambda/latest/dg/python-image.html
https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html