
Redacting PII Using S3 Object Lambda

The Challenge

Data privacy is a top priority for businesses, especially as data protection regulations multiply worldwide. One common challenge is keeping sensitive data, such as personally identifiable information (PII), protected when data is accessed or transferred.

Imagine you have a bunch of employee profiles stored as CSV files in an S3 bucket. These profiles include sensitive information such as real names, social security numbers, and email addresses, along with non-sensitive data like job titles and office locations. Various teams within your company often need to access these files for their work. For instance, HR may need to understand the distribution of roles, while the operations team may need the names and locations to plan logistics. However, PII, like social security numbers and personal email addresses, should remain confidential and not be accessible to these teams.

Before S3 Object Lambda, you would have to create and maintain two separate versions of your data – one with the sensitive information for authorized personnel and another redacted version for others. This approach not only doubles your storage requirements but also increases the risk of accidental data leakage or mishandling.

Enter S3 Object Lambda

With S3 Object Lambda, you can process data as it is being retrieved from S3 without altering the original stored data. You can modify the data returned by a standard S3 GET request, for example, to redact sensitive information, convert the data format, or compress the data on the fly.

How it works

S3 Object Lambda works by invoking your custom Lambda function when a GET request is made to an S3 Object Lambda Access Point. The function receives details about the request, including a pre-signed URL for reading the original object. It processes the data and hands the result back to S3 Object Lambda by calling the WriteGetObjectResponse API; nothing is written to the bucket. The requester receives this processed data as the response to their GET request, while the original data in S3 remains intact.
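The flow described above can be sketched roughly as follows. This is a minimal illustration, not the article's actual function: `transform` is a hypothetical placeholder (it only uppercases the bytes), while the `getObjectContext` fields and the `write_get_object_response` call are the ones S3 Object Lambda and boto3 provide.

```python
import urllib.request


def transform(original: bytes) -> bytes:
    """Hypothetical placeholder; a real function would redact or convert here."""
    return original.upper()


def lambda_handler(event, context):
    # boto3 is imported lazily so transform() can be tried without the AWS SDK.
    import boto3

    ctx = event["getObjectContext"]

    # Pre-signed URL that returns the original, unmodified object.
    with urllib.request.urlopen(ctx["inputS3Url"]) as response:
        original = response.read()

    # Send the transformed bytes back to whoever called GetObject.
    boto3.client("s3").write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=transform(original),
    )
    return {"status_code": 200}
```

The route and token tie the response to the specific GET request that triggered the function, which is why they are passed straight through from the event.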

Demo

Let’s see this in action. We’ll use Python in our Lambda function to redact the social_security_number and email columns from the following CSV file:

The transformed data will contain sensitive fields replaced with the word ‘REDACTED’:
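As a rough sketch of the redaction step itself, the function below masks the two columns named above using Python's standard `csv` module. The function name `redact_csv` and the sample row are assumptions made for illustration; they are not the article's actual code or data.

```python
import csv
import io

# Column names taken from the article; everything else here is illustrative.
SENSITIVE_COLUMNS = frozenset({"social_security_number", "email"})


def redact_csv(text: str, sensitive=SENSITIVE_COLUMNS, mask: str = "REDACTED") -> str:
    """Return the CSV text with every value in a sensitive column replaced by mask."""
    reader = csv.DictReader(io.StringIO(text))
    if not reader.fieldnames:
        return text  # empty input: nothing to redact

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in sensitive:
            if column in row:
                row[column] = mask
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data, not the article's actual file.
sample = (
    "name,job_title,social_security_number,email\n"
    "Jane Doe,Engineer,123-45-6789,jane@example.com\n"
)
print(redact_csv(sample))
```

Parsing with `csv` rather than splitting on commas keeps quoted fields that contain commas intact.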

Steps

  1. Creating the Lambda Functions
  2. Creating an S3 Bucket
  3. Setting up an S3 Access Point
  4. Creating an S3 Object Lambda Access Point

Creating the Lambda Functions

First, create two AWS Lambda functions.

  • Redact function: This function is responsible for redacting sensitive information from the original CSV file. Make sure to attach the AmazonS3ObjectLambdaExecutionRolePolicy to the function’s execution role. Set the timeout to 30 seconds so the function has enough time to fetch and transform the object.

  • Reader function: We’ll use this function to simulate an end user or application retrieving the CSV file from the S3 bucket. Set its timeout to 30 seconds as well.

Attach the policy below to the Reader function’s execution role. For demo purposes, the policy uses a wildcard to allow its actions on all resources.
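The policy document itself is not reproduced here. As an assumption about what such a demo policy might contain, the sketch below builds one in Python and prints it as JSON: it allows reading through the Object Lambda Access Point, reading through the supporting access point, and invoking the Redact function, all on a wildcard resource. Treat the action list as a starting point to verify against the AWS documentation rather than as the article's exact policy.

```python
import json

# Assumed demo policy; the wildcard resource mirrors the article's description.
reader_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3-object-lambda:GetObject",  # read via the Object Lambda Access Point
                "s3:GetObject",                # read via the supporting access point
                "lambda:InvokeFunction",       # let the request invoke the Redact function
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(reader_policy, indent=2))
```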

Creating an S3 Bucket

  1. Create a new S3 bucket or use an existing one.
  2. Upload the CSV file to your bucket.

Setting up an S3 Access Point

  1. Navigate to the Amazon S3 console and select your bucket.
  2. Move to the Access points tab and click on Create access point.
  3. Provide a name for your access point.
  4. Select Internet as the Network origin and leave the other settings at their default values.
  5. Click Create access point.

Creating an S3 Object Lambda Access Point

  1. Go to the Object Lambda Access Points window and select the region where your bucket is located.
  2. Provide a name for your Object Lambda Access Point and choose your S3 bucket.
  3. Select the Access Point that you created in the previous section.
  4. In the Transformation Configuration section, select GetObject from the S3 APIs and pick your Redact Lambda function.
  5. Leave the remaining settings at their defaults and click Create Object Lambda Access Point.
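If you prefer scripting over the console, the same configuration can be approximated with boto3's `s3control` client. The helper names `build_olap_configuration` and `create_olap` are hypothetical, and the ARNs, account ID, and region are values you would supply yourself; only the `create_access_point_for_object_lambda` call and the shape of its `Configuration` argument come from the AWS SDK.

```python
def build_olap_configuration(supporting_access_point_arn: str, function_arn: str) -> dict:
    """Build the configuration that routes GetObject through the Redact function."""
    return {
        "SupportingAccessPoint": supporting_access_point_arn,
        "TransformationConfigurations": [
            {
                "Actions": ["GetObject"],
                "ContentTransformation": {"AwsLambda": {"FunctionArn": function_arn}},
            }
        ],
    }


def create_olap(account_id: str, name: str, configuration: dict, region: str) -> dict:
    """Create the Object Lambda Access Point; requires AWS credentials."""
    import boto3  # imported lazily so the builder above works without the SDK

    s3control = boto3.client("s3control", region_name=region)
    return s3control.create_access_point_for_object_lambda(
        AccountId=account_id, Name=name, Configuration=configuration
    )
```

The response from `create_olap` includes the ARN of the new Object Lambda Access Point, which is the value the Reader function needs.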

Copy the ARN of your Object Lambda Access Point and update the value in the Reader function.
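A Reader function along these lines might look like the sketch below. It assumes the ARN and object key are supplied through environment variables named `OLAP_ARN` and `OBJECT_KEY` (hypothetical names, with placeholder defaults), and it relies on the fact that boto3 accepts an Object Lambda Access Point ARN in place of a bucket name.

```python
import os

# Hypothetical placeholders; replace with the ARN copied from the console and your file name.
OLAP_ARN = os.environ.get(
    "OLAP_ARN",
    "arn:aws:s3-object-lambda:us-east-1:111122223333:accesspoint/example-olap",
)
OBJECT_KEY = os.environ.get("OBJECT_KEY", "employees.csv")


def read_redacted(s3_client, olap_arn: str, key: str) -> str:
    """Fetch the object through the Object Lambda Access Point and decode it."""
    response = s3_client.get_object(Bucket=olap_arn, Key=key)
    return response["Body"].read().decode("utf-8")


def lambda_handler(event, context):
    import boto3  # imported lazily so read_redacted() can be tested with a stub client

    body = read_redacted(boto3.client("s3"), OLAP_ARN, OBJECT_KEY)
    print(body)
    return {"statusCode": 200, "body": body}
```

Because the request goes to the Object Lambda Access Point rather than the bucket, the returned body is whatever the Redact function produced, not the stored file.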

Testing

Now that everything is set up, you can test the system by running the Reader function. You should be able to see the redacted version of the original CSV.


Written by: Carlo Acebedo

Carlo is a cloud engineer and a content creator at Tutorials Dojo. He's also a member of the AWS Community Builders program and holds 5 AWS Certifications. Carlo specializes in building and automating solutions in the Amazon Web Services Cloud.
