Because of the ubiquity and accessibility of digital cameras – and I’ll bet you’re holding one in your hand right now, baked into your mobile phone – we as a generation take more pictures than ever before in the history of the human race. According to Photutorial, humanity now takes an estimated 3.7 million photos per minute [1]. That translates to an impressive 162 billion images monthly [2], if your mind has not been boggled enough.
The convenience of capturing digital images introduced a modern problem for us. We now have to wade through heaps and tons of images to search for photos we took a year ago. Prior solutions, such as categorizing via folders or manually tagging each photo with custom labels, are proving too time-consuming and tedious, especially as we generate more and more pictures at an ever-increasing rate. We don’t foresee this rate slowing down anytime soon.
Luckily, Artificial Intelligence has advanced to the point where it can significantly help us search our images. Thanks to technologies and infrastructure such as Amazon Bedrock, these advances in AI are available to us normal developers, not just to generously funded government research facilities. We can use AI to label images based on the objects they contain, color patterns, and other vital metadata. We can also use AI tools to help us filter which photos are “similar” to what we are looking for.
Let’s Build an Image Search Engine
In this guide, I will show you how to create an image search engine that can look through a massive pile of images without us tired and weary human beings having to manually label them in any way. All we need to do is shoot and upload the pictures, and our system will take care of the labeling, comparing, and filtering.
Our image search engine must satisfy the following requirements:
- The image search engine must accept file uploads from users.
- Given a text search query from the user, the image search engine must return relevant uploaded images.
- The image search engine must not incur costs when not in use. We want to keep our costs as low as possible and do not wish to incur monthly bills for something we are not actively using.
- Even with the previous constraint, the image search engine must be ready immediately when needed. We do not want to wait (and have no patience) for instances to boot up.
See the video below for a better idea of the image search we want to build.
Our demo is live at https://image-ai.demo.klaudsol.com/; feel free to play and tinker with it.
If you prefer to spoil yourself and dive directly into the source code, you may proceed to https://github.com/ardeearam/tutorial-intelligent-image-search-bedrock.
Prerequisite Knowledge
Before proceeding further, I assume the reader has a basic to intermediate understanding of the following topics:
- AWS Cloud Development Kit (CDK)
- Python programming language
- Django web framework
- Django deployment to AWS Lambda via Zappa
You should first brush up on these topics to get the maximum benefit from this guide. Below are some recommended readings if any of these topics are new to you:
- https://aws.amazon.com/cdk/
- https://www.codecademy.com/learn/python-for-programmers
- https://docs.djangoproject.com/en/5.2/intro/tutorial01/
- https://github.com/zappa/Zappa
With all that out of the way, let’s get started!
Architecture Brief
We use AWS CDK to instantiate all of our backend-related resources. We do this so that all of our infrastructure is represented as Python code (aka “Infrastructure as Code” or “IaC”) that we can commit to the repository, share, reason about, and further improve on.
For the frontend, we deploy a Django application using Zappa. This open-source tool automatically bundles our application and deploys it on AWS Lambda, Amazon API Gateway, and Amazon CloudFront. Zappa is specifically built to deploy Python-based applications and is Python-aware. We could have done our frontend in CDK, but that would be reinventing the wheel. Zappa is a great shortcut that makes our lives easier.
For the rest of the guide, we will refer to the source code, which can be found here: https://github.com/ardeearam/tutorial-intelligent-image-search-bedrock.
We can divide our users’ journey into two phases: upload and search.
Upload Architecture
S3 Bucket
Our user journey starts with our S3 bucket receiving all the files to be processed. We create our S3 bucket in CDK like so:
```python
# intelligent_image_search/intelligent_image_search_stack.py
bucket = s3.Bucket(
    self,
    "IntelligentImageSearchBucket",
    versioned=True,
    bucket_name=env_vars['S3_BUCKET_NAME'],
    cors=[
        s3.CorsRule(
            allowed_methods=[
                s3.HttpMethods.POST,
            ],
            allowed_origins=["*"],
            allowed_headers=["*"],
            exposed_headers=["ETag"]
        )
    ],
    block_public_access=s3.BlockPublicAccess.BLOCK_ALL
)

bucket.add_event_notification(
    s3.EventType.OBJECT_CREATED,
    s3n.LambdaDestination(my_lambda)
)
bucket.grant_read(my_lambda)
```
Here are some interesting bits about this bucket:
- We need a Cross-Origin Resource Sharing (CORS) rule allowing POST from our frontend. We will upload files directly to this bucket; the files no longer pass through our Lambda-hosted Django application, which simplifies the architecture and bypasses Lambda limitations such as the processing time limit and the upload file size limit. (A sketch of how such a direct upload can be presigned follows this list.)
- We block all public access to the S3 bucket for increased security and rely solely on IAM permissions for access control.
- Whenever an object is created in this bucket (i.e., `s3.EventType.OBJECT_CREATED`), we trigger a Lambda function that processes the incoming image. We will discuss this Lambda Event Handler in detail in a bit.
- We grant the Lambda Event Handler read access to this S3 bucket so that it can read the image content for processing.
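The guide does not walk through the upload form itself, but for context, a direct-to-S3 browser upload typically starts with the backend generating a presigned POST. Here is a minimal sketch of that idea; the helper name and expiry are illustrative and not taken from the repository:

```python
import boto3

def create_presigned_post(bucket_name, object_key, expires_in=3600):
    """Illustrative helper: returns a URL and form fields the browser can
    POST a file to, so the bytes never pass through our Django Lambda."""
    s3 = boto3.client("s3")
    return s3.generate_presigned_post(
        Bucket=bucket_name,
        Key=object_key,
        ExpiresIn=expires_in,  # the signed form expires after an hour
    )

# The returned dict contains 'url' and 'fields'; the frontend submits a
# multipart/form-data POST to 'url' with 'fields' plus the file itself.
```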
Lambda Event Handler
Most of the magic happens in the Lambda Event Handler, our powerhouse listening to uploads on our S3 bucket. We create our Lambda function like so:
```python
# intelligent-image-search/intelligent_image_search_stack.py
my_lambda = _lambda.Function(
    self,
    "OnUploadLambdaFunction",
    runtime=_lambda.Runtime.PYTHON_3_11,
    handler="handler.main",
    code=_lambda.Code.from_asset("lambda"),
    timeout=Duration.seconds(900),
    environment=env_vars
)

my_lambda.role.add_managed_policy(
    iam.ManagedPolicy.from_aws_managed_policy_name("AmazonBedrockFullAccess")
)
```
Notice the following properties of our Lambda function:
- The code of this function is found in `lambda/handler.py`, in the function `main`. We dedicate the Upload Implementation section of this guide to discussing the critical parts of this code.
- We max out our Lambda timeout at 15 minutes (900 seconds). By default, CDK creates Lambda functions with a 3-second timeout, which is not enough time to process an image. Most of the time goes to waiting for Amazon Bedrock for the image description and the embeddings; each call can take longer than 3 seconds.
- We grant this Lambda function full access to Amazon Bedrock to retrieve the needed inferences and embeddings. More on this in the `describe_image()` and `get_embeddings()` sections.
- We load setup-specific parameters from environment variables using the `dotenv` Python package instead of hard-coding them, and pass them as the `environment` parameter during Lambda creation. Using environment variables this way is best practice from a security and maintainability standpoint. We do not want to commit sensitive information such as API keys and passwords to our public GitHub repositories and unnecessarily share them with the world. We also want our code to be configurable so that other people can point to their own resources without changing the code base. (A minimal sketch of this pattern follows this list.)
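To make the environment-variable pattern concrete, here is a minimal sketch of how a `.env` file can be loaded and handed to the Lambda construct. The variable names other than `S3_BUCKET_NAME`, `PINECONE_KEY`, and `PINECONE_INDEX` (which appear in the code shown in this guide) are illustrative; check the repository for the actual setup.

```python
# A sketch of the dotenv pattern described above (not verbatim from the repo).
from dotenv import dotenv_values

# .env sits next to the CDK app and is listed in .gitignore
env_vars = dotenv_values(".env")
# e.g. env_vars == {
#     "S3_BUCKET_NAME": "my-intelligent-image-search-bucket",  # illustrative value
#     "PINECONE_KEY": "...",
#     "PINECONE_INDEX": "...",
# }

# The same dict is then passed straight through as the Lambda's environment:
#     _lambda.Function(..., environment=env_vars)
```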
Returning to our requirements list, we are only charged for the seconds our Lambda function actually runs. If the Lambda function is never invoked because nobody is using our application, we pay nothing; the combination of S3 plus the Lambda Event Handler works perfectly for our cost constraints.
Amazon Bedrock
Amazon Bedrock is a serverless solution and, as such, does not need provisioning the way an EC2 instance does. It is readily available via API calls, and charging occurs on a per-invocation basis. However, only Amazon-provided models, such as Titan Text and Titan Embeddings, are enabled by default. This guide will use Claude Sonnet for our multimodal inference and Cohere Embed for our embeddings. These are NOT activated by default, so we must enable them before we can invoke them. Our choice of these two models is based on hours of experimentation; they simply generate the best responses for the task at hand.
You may check out this guide to ensure that these two models are available for our usage: https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html.
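Besides the console, you can also sanity-check model availability from code. A quick sketch using the Bedrock control-plane client (this snippet is not part of the repository):

```python
import boto3

# The "bedrock" control-plane client lists foundation models; the
# "bedrock-runtime" client used elsewhere in this guide runs the inference.
bedrock = boto3.client("bedrock", region_name="us-east-1")

model_ids = [m["modelId"] for m in bedrock.list_foundation_models()["modelSummaries"]]

# These are the two models this guide relies on. Listing shows what exists in
# the region; access itself is still granted on the Model access page.
print(any("claude-3-5-sonnet" in model_id for model_id in model_ids))
print("cohere.embed-english-v3" in model_ids)
```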
Serverless Django via Zappa
We chose Django as our frontend for two reasons: 1) it is a popular and familiar Python-based web framework that many developers already know and love, and 2) a fully serverless deployment path exists via Zappa. Reason number two is directly tied to our cost constraints: we do not want to pay for an EC2 instance for a rarely visited website.
By default, Zappa creates an execution role that will be used to run our Django frontend. However, in this guide, we need to add more permissions to the execution role (most importantly, access to Amazon Bedrock for searching). Thus, we opted to override the default behavior and create the execution role in CDK to be used by Zappa. Using CDK in this scenario allows us to document all our changes to the Zappa execution role.
```python
# intelligent-image-search/intelligent_image_search_stack.py

# Execution Role for our Zappa frontend
zappa_frontend_execution_role = iam.Role(
    self,
    "ZappaFrontendExecutionRole",
    role_name="ZappaFrontendExecutionRole",
    assumed_by=iam.ServicePrincipal("lambda.amazonaws.com")
)

# Attach AmazonBedrockFullAccess
zappa_frontend_execution_role.add_managed_policy(
    iam.ManagedPolicy.from_aws_managed_policy_name("AmazonBedrockFullAccess")
)

bucket.grant_read_write(zappa_frontend_execution_role)

# Zappa boilerplate permissions to ensure proper deployment and execution
zappa_boilerplate_permissions_policy = iam.Policy(
    # ...
```
```
# django/zappa_settings.json
# Note: you won't see this in the repository, as this is setup-specific.
# Zappa creates this file for you at initialization time.
# This file is part of .gitignore
{
    "production": {
        # ...
        "s3_bucket": "...",
        "role_name": "ZappaFrontendExecutionRole"  # ADD THIS
    }
}
```
Here are the highlights we need to focus on:
- Our Zappa execution role, `ZappaFrontendExecutionRole`, needs access to Amazon Bedrock, just like our Lambda Event Handler. This access is not necessary for the upload phase, where our Django frontend has minimal participation, but as we shall see later, it is critical for the search phase.
- We also give our Django frontend read and write access to our S3 bucket. The read access is used by the frontend to display our images via presigned URLs. The write access is not strictly necessary for this guide, but may be used in future developments such as the ability to delete images via the Django frontend.
Once `ZappaFrontendExecutionRole` has been created, we will use it in our `django/zappa_settings.json` as our role name. You will not find `django/zappa_settings.json` in our repository, as it contains setup-specific information. Zappa will create this file upon initialization, and you need to add the `role_name` parameter to point to the execution role that CDK created.
Pinecone Vector Database
In our architecture diagram, we see the need for a vector database. What is a vector database, why do we need one, and what are its strengths and limitations?
A vector database is a specialized system for storing, indexing, and searching vector embeddings. Vector embeddings are numerical representations of data (in our case, text) that enable semantic similarity search. Unlike keyword-based methods, this search relies on meaning.
Although AWS has Amazon OpenSearch Service, which offers a “serverless” vector database feature, it charges a minimum of 2 OpenSearch Compute Units (OCU) monthly, or ~$340 [3], which is not insignificant. This minimum monthly charge violates our cost constraint directly, so we venture outside the AWS ecosystem and choose Pinecone. This vector database is free if you stay below 2 GB of storage [4] – plenty of space for our demo project.
Neon Serverless PostgreSQL
While Pinecone excels at semantic similarity search, it’s inefficient for retrieving all uploaded images. Listing entries is an O(n) operation for most vector databases, incurring a read cost per vector. A traditional relational database remains the more solid choice for non-semantic use cases.
We first considered relational databases within the AWS ecosystem. Until recently (November 2024, to be exact [5]), Amazon’s serverless relational database offering, Amazon Aurora Serverless V2, charged a minimum of 0.5 Aurora Capacity Units (ACU) even when not in active use [6]. Similar to OpenSearch, this violates our cost requirements.
This limitation pushed us to look for an alternative such as Neon. Neon offers access to PostgreSQL for free as long as you keep your storage below 0.5 GB and your compute hours below 190 [7].
Upload Implementation
Every time an image or a set of photos is uploaded to our S3 bucket, the `main` function of the file `lambda/handler.py` is invoked. It’s time to dive into the code implementation details and see what is happening behind the scenes.
```python
# lambda/handler.py
def main(event, context):
    s3 = boto3.client("s3")
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    pc = Pinecone(api_key=os.environ['PINECONE_KEY'])
    vector_db = pc.Index(os.environ['PINECONE_INDEX'])

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        image_descriptions = describe_image(s3=s3, bedrock=bedrock, bucket=bucket, key=key)

        insert_to_db(
            s3_file_path=s3_file_path(bucket, key),
            description="\n".join(image_descriptions)
        )

        image_description_embeddings = get_embeddings(
            bedrock=bedrock,
            texts=image_descriptions,
            input_type="search_document"
        )

        upsert_embeddings(
            vector_db=vector_db,
            s3_file_path=s3_file_path(bucket, key),
            embeddings=image_description_embeddings,
            texts=image_descriptions
        )
```
Our Lambda handler can process one or more images per upload event. When multiple images are uploaded simultaneously, we process each image one at a time, in a loop.
For every image, we do the following operations:
- `describe_image()` – We generate a text description given the uploaded image.
- `insert_to_db()` – We insert the image path and generated description into our relational database.
- `get_embeddings()` – We generate the corresponding vector embedding given the text description.
- `upsert_embeddings()` – We store the generated vector embedding in our vector database for similarity search later.
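The handler above also calls a small `s3_file_path()` helper that we do not show separately. A plausible minimal version (an assumption on our part; see `lambda/handler.py` for the real one) simply joins the bucket and key into a stable identifier:

```python
# Assumed implementation: a stable, unique identifier for an uploaded object.
def s3_file_path(bucket, key):
    return f"s3://{bucket}/{key}"
```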
describe_image()
This method’s raison d’être is to send a base64-encoded image to Amazon Bedrock and receive a detailed text description in return. This is the step that gives us an “understanding” of the image’s content: a text representation of the image that we can later process and compare.
```python
#lambda/handler.py
payload = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "system": """
        You are part of a system that does intelligent image search.
        Intelligent image search is a system that allows you to search for images,
        based on items in the image, without the need for users explicitly tagging the images.
        Your role in this subsystem is to describe in detail the incoming image,
        and all the items in the image.
    """,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please describe this image in detail. Pay close attention to the items and the colors."
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_base64
                    }
                }
            ]
        }
    ]
})

response = bedrock.invoke_model(
    modelId='us.anthropic.claude-3-5-sonnet-20241022-v2:0',
    contentType='application/json',
    accept='application/json',
    body=payload
)
```
Here are some things that we need to focus on in this method:
- As previously mentioned, the model we use is Anthropic Claude 3.5 Sonnet.
- We give Sonnet a system prompt describing the context, or the bigger picture. This system prompt gives Sonnet additional guidance on how to generate its response, including its style and the perspective from which it responds.
- Our primary prompt asks for a detailed description of the uploaded image, with particular attention to the items and their colors. You have plenty of freedom to modify this prompt depending on the nature of the problem being solved. For example, we can ask for more focus on clothing if we are solving problems in the fashion domain, or on animals if we expect mostly animal pictures.
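The payload above assumes `image_base64` is already in hand. Here is a minimal sketch of how `describe_image()` might obtain it from S3; the helper name is ours, and the exact code in `lambda/handler.py` may differ:

```python
import base64

def load_image_base64(s3, bucket, key):
    # Read the uploaded object from S3 and base64-encode its bytes, which is
    # the format the Bedrock multimodal message above expects.
    obj = s3.get_object(Bucket=bucket, Key=key)
    return base64.b64encode(obj["Body"].read()).decode("utf-8")

# After invoke_model() returns, the generated description can be pulled out of
# the Anthropic-style response body (one or more "text" content blocks), e.g.:
#   response_body = json.loads(response["body"].read())
#   descriptions = [block["text"] for block in response_body["content"]
#                   if block["type"] == "text"]
```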
insert_to_db()
This method is straightforward – it saves the S3 file path of the incoming image and the description generated by Bedrock into our relational database. Our Django frontend will query this database later for image display and search.
```python
#lambda/handler.py
def insert_to_db(s3_file_path, description):
    DATABASE_USER = os.environ['DATABASE_USER']
    DATABASE_PASSWORD = os.environ['DATABASE_PASSWORD']
    DATABASE_HOST = os.environ['DATABASE_HOST']
    DATABASE_NAME = os.environ['DATABASE_NAME']

    conn = pg8000.connect(
        user=DATABASE_USER,
        password=DATABASE_PASSWORD,
        host=DATABASE_HOST,
        database=DATABASE_NAME
    )
    cursor = conn.cursor()

    data = (s3_file_path, description)
    cursor.execute(
        """
        INSERT INTO app_image(s3_file_path, description)
        VALUES(%s, %s)
        ON CONFLICT(s3_file_path)
        DO UPDATE SET description = excluded.description
        """,
        data
    )

    conn.commit()
    cursor.close()
    conn.close()
```
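For reference, the `app_image` table targeted above maps to a Django model along these lines. This is a sketch inferred from the queries in this guide; the repository’s `django/app/models.py` is the authoritative version.

```python
# django/app/models.py (sketch; field options inferred, not copied from the repo)
from django.db import models

class Image(models.Model):
    # The ON CONFLICT(s3_file_path) upsert above implies a unique constraint here.
    s3_file_path = models.CharField(max_length=1024, unique=True)
    description = models.TextField()

    class Meta:
        db_table = "app_image"  # matches the table name used by the Lambda handler
```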
A minor note, though: the most common Postgres driver for Python is `psycopg2`. We are using `pg8000` in this guide because I cannot, for the love of heaven, make `psycopg2` work through CDK. I highly suspect that it’s because Lambda requires `psycopg2-binary` (not `psycopg2`), and there is a binary mismatch between the Lambda bundle I generated on my Mac and the one needed by Amazon Linux at Lambda runtime. The `pg8000` module worked perfectly because it has no binary dependencies.
get_embeddings()
In a nutshell, this method generates the corresponding vector embedding of an input text.
A vector embedding is a list of values, commonly of floating-point types. A possible example of a vector embedding could be: `[0.23, -0.17, ..., 0.94]`. Each number in an embedding doesn’t mean much by itself. But when you combine all the numbers, they act like coordinates that place an idea, like a word or sentence, into a “meaning space.” In that space, things with similar meanings end up close to each other.
```python
#lambda/handler.py
def get_embeddings(bedrock, texts, input_type):
    MODEL_ID = "cohere.embed-english-v3"

    payload = {
        "texts": [texts] if isinstance(texts, str) else texts,
        "input_type": input_type,
        "embedding_types": ["float"]
    }
    body = json.dumps(payload)

    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json"
    )
    response_body = json.loads(response['body'].read())

    return response_body
```
Notice the following:
- As mentioned, we use the Cohere Embed model in Amazon Bedrock to generate vector embeddings.
- The standard embedding type is `float`. Other available embedding types, such as `int8`, `uint8`, `binary`, and `ubinary`, are particularly beneficial for large-scale applications where storage efficiency and retrieval speed are critical. If storage size and scale are not issues for our application, we can safely stick with `float`.
- The input type for our embedding is `search_document`. This type is used when the embeddings are intended for storage. Later, during the search phase, we will use the `search_query` input type to generate embeddings from user queries.
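To make the response shape concrete, here is how the rest of this guide consumes what `get_embeddings()` returns; the description text is illustrative, and the indexing mirrors what you will see in `upsert_embeddings()` and in the Django view later.

```python
import boto3

# Sketch: calling the function shown above and unpacking its result.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response_body = get_embeddings(
    bedrock=bedrock,
    texts=["A grey tabby cat sleeping on a blue sofa."],  # illustrative description
    input_type="search_document",
)

# Cohere Embed returns one embedding per input text, keyed by embedding type:
embeddings = response_body["embeddings"]["float"]  # list of 1024-dimensional vectors
first_vector = embeddings[0]                       # e.g. [0.23, -0.17, ..., 0.94]
```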
upsert_embeddings()
As the name suggests, this method “upserts” a vector embedding generated by `get_embeddings()`. For those unfamiliar with the term “upsert”: it inserts an entry when it doesn’t exist, but updates the entry when it already does, preventing duplicate entries.
```python
#lambda/handler.py
def upsert_embeddings(vector_db, s3_file_path, embeddings, texts):
    vectors = [(
        f"{s3_file_path}-{index}",
        embedding,
        {
            'text': texts[index],
            's3_file_path': s3_file_path
        }
    ) for index, embedding in enumerate(embeddings['embeddings']['float'])]

    vector_db.upsert(vectors)
```
The upsert method receives an array of tuples as an input parameter. For each tuple, we have the following:
- The first element is the vector ID, which needs to be unique across all vector embeddings in our system. We rely on the uniqueness of the S3 file path as the identity of any image in our system.
- The second element is the actual embedding (i.e., a list of floating-point numbers).
- The third element is the vector metadata. We store the S3 file path with the vector so that we can refer back to the image path of the resulting vectors of our search similarity operation. We also store the text for debugging purposes. We don’t use the text metadata other than giving us humans an idea of what the vector embedding is all about.
Upload processing ends when we have successfully stored vector embeddings of the uploaded images for future comparison. You can view the resulting embeddings in the Pinecone dashboard to double-check that the operation succeeded.
Clicking on any of the embeddings will reveal its actual numeric values and associated metadata.
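One assumption throughout is that the Pinecone index already exists (you can create it from the dashboard). If you prefer to create it in code, the dimension must match Cohere Embed’s 1024-dimensional output, and cosine is a typical metric choice. A sketch, not part of the repository:

```python
import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_KEY"])

# Dimension must match the embedding model (Cohere Embed v3 produces 1024 floats).
pc.create_index(
    name=os.environ["PINECONE_INDEX"],
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```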
Search Architecture
Once we have a database of vector embeddings for all the uploaded images, we can accept queries: convert the user query into a vector embedding, find the stored vectors most semantically similar to it, and then look up which images correspond to those vectors in our relational database.
Search Implementation
The algorithm for our search would be as follows:
- `get_embeddings()` – Given the user’s search query from our Django frontend search form, we convert that query into a vector embedding to allow us to do a semantic similarity search.
- `search_in_vector_db()` – Given the corresponding query vector embedding, look for the most similar vector embeddings in our vector database.
- `Image.objects.filter()` – Given the indices of the most similar vector embeddings, query for the corresponding images’ metadata for proper presentation to the user.
We implement our algorithm in our Django view as such:
```python
#django/app/views.py
q = request.POST.get('q')

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
pc = Pinecone(api_key=settings.PINECONE_KEY)
vector_db = pc.Index(settings.PINECONE_INDEX)

query_embedding = get_embeddings(
    bedrock=bedrock,
    texts=q,
    input_type="search_query"
)['embeddings']['float'][0]

matching_images = search_in_vector_db(vector_db=vector_db, query_embedding=query_embedding)

images = list(Image.objects.filter(s3_file_path__in=matching_images))
presigned_urls = {
    image.s3_file_path: generate_presigned_url(image.s3_file_path)
    for image in images
}

return render(request, 'app/index.html', {'images': images, 'presigned_urls': presigned_urls})
```
get_embeddings()
Our `get_embeddings()` implementation is precisely the same as the upload phase’s `get_embeddings()` method. The significant difference worth noting here is the `input_type` of `search_query`. We must get the input type right to obtain a vector embedding suitable for semantic similarity search.
search_in_vector_db()
In this guide, we are working with vectors that are lists of 1024 values (or “dimensions”), as per the embeddings generated by Cohere Embed. To get a better feel for how vector similarity search works, let’s flatten our 1024-dimensional vectors into two dimensions, as shown in the image:
As you can visually confirm, the vector representations of “apartment” and “condominium” have similar magnitudes and only a slight angle between them. Compare this with “Lord Voldemort”, which has a larger angle with either “apartment” or “condominium”. The tiny difference in angle and magnitude makes sense, as an apartment and a condominium are both structures where humans can live. The larger angle between “apartment” and “Lord Voldemort” is something we can accept intuitively.
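To make that intuition concrete, the closeness of two vectors is usually measured with cosine similarity, i.e. the cosine of the angle between them. A tiny illustrative sketch with made-up three-dimensional vectors (real embeddings have 1024 dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); close to 1.0 means "same direction".
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors, purely for illustration:
apartment   = [0.9, 0.8, 0.1]
condominium = [0.85, 0.75, 0.15]
voldemort   = [0.1, 0.2, 0.95]

print(cosine_similarity(apartment, condominium))  # close to 1.0 (small angle)
print(cosine_similarity(apartment, voldemort))    # much lower (large angle)
```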
Pinecone handles all the technicalities of vector comparison under the hood and provides us developers with a simple interface with which we can easily work.
```python
#django/app/views.py
def search_in_vector_db(vector_db, query_embedding):
    results = vector_db.query(vector=query_embedding, top_k=3, include_metadata=True)

    match_dict = [{
        's3_file_path': match['metadata']['s3_file_path'],
        'score': match['score']
    } for match in results['matches']]

    df = pd.DataFrame(match_dict)
    df.sort_values('score', ascending=False, inplace=True)
    df.drop_duplicates(subset=['s3_file_path'], inplace=True)

    return df['s3_file_path'].to_list()
```
Looking at the source code above, we can note the following:
- The Pinecone `query` method has a `top_k` parameter set to 3. This parameter indicates that we want to retrieve at most three vector embeddings closest in similarity to our query vector embedding. The ideal value of this parameter is hugely dependent on the problem domain, so you have enormous leeway to experiment with what works for you here.
- We use the `pandas` library to post-process the retrieved vector embeddings. Note that Pinecone stores metadata with the vector embeddings so that we can recover, manipulate, and filter with them. In this case, we sort the vector embeddings by the similarity score set by Pinecone and drop any duplicates (which probably result from uploading the same image file multiple times).
We return the S3 file path metadata we previously stored during the `upsert_embeddings()` operation.
Image.objects.filter()
The final step in our search phase is to query the relevant data stored in our relational database, based on the S3 file paths returned by the `search_in_vector_db()` operation. This step is straightforward, as we use built-in Django model methods for the query.
Here’s a recap of the relational database query code:
```python
# django/app/views.py
images = list(Image.objects.filter(s3_file_path__in=matching_images))
presigned_urls = {
    image.s3_file_path: generate_presigned_url(image.s3_file_path)
    for image in images
}

return render(request, 'app/index.html', {'images': images, 'presigned_urls': presigned_urls})
```
Raw S3 paths can’t be used in the Django frontend, and making the bucket public poses huge security risks. Instead, we generate presigned URLs that grant temporary access to each image.
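The `generate_presigned_url()` helper used in the view can be a thin wrapper over boto3. A minimal sketch, assuming the `s3://bucket/key` path format sketched earlier; the repository’s version may handle the bucket name and expiry differently:

```python
import boto3

def generate_presigned_url(s3_file_path, expires_in=3600):
    # The stored path looks like "s3://bucket/key"; split it back apart.
    bucket, key = s3_file_path.replace("s3://", "").split("/", 1)

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # temporary read access only
    )
```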
Our journey ends with delivering all relevant images based on the search query (e.g., “super cute cat”). We accomplish this without spending any time labeling, tagging, or categorizing our images.
Try it out for yourself and visit the demo image search engine here: https://image-ai.demo.klaudsol.com
Conclusion
Artificial intelligence has advanced to the point where we can solve problems that seemed unsolvable a decade ago. What’s more, these advancements are available not only to highly funded research facilities, but also to ordinary individuals who want to solve their own niche problems, thanks to cloud-based AI technologies such as Amazon Bedrock and Pinecone.
This guide was intended to be as short as possible while remaining technically accurate. Surprisingly, it still turned out longer than expected. Only the most essential concepts are included, while secondary topics were intentionally left out for readers to explore on their own. For implementation details not discussed here, refer to the source code, the definitive reference: https://github.com/ardeearam/tutorial-intelligent-image-search-bedrock
Happy coding!
References
[1], [2] https://photutorial.com/photos-statistics/#per-year
[3] https://aws.amazon.com/opensearch-service/pricing/#Amazon_OpenSearch_Serverless
[4] https://www.pinecone.io/pricing/#plan-comparison-table
[6] https://neon.tech/blog/aurora-serverless-v2-scales-to-zero-now-what
[8] Credits to Edgar on Unsplash for the photo of the cat.
[9] Photo of the tiger by Saumil Joshi on Unsplash.
[10] Photo of the robot by Owen Beard on Unsplash.