Image Similarity Search System

The aim of this project is to develop an image search engine that finds images similar to a query image and hosts them.

Problem Statement

Design an image search engine that takes an image as input, searches for similar images, and hosts them.

Data

The Caltech101 dataset contains images from 101 object categories (e.g., “helicopter”, “elephant” and “chair”) and a background category for images that do not belong to any of the 101 object categories. Each object category has between about 40 and 800 images, with most categories having about 50. Image resolution is roughly 300×200 pixels. Provisions have been made to collect new labels and data through APIs.

Architecture

Data Store Component

Creation of an AWS S3 folder to store the image data and provide hosted S3 links.

Creation of a MongoDB collection to store metadata and label information.

Creation of API endpoints for data collectors and annotators to upload images and labels.

Creation of a Dockerfile for the pipeline.

Deploying the pipeline using AWS and GitHub Actions.
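
The storage steps above can be sketched as follows. This is a minimal illustration only, not the project's actual code: the bucket name, region, key layout (`images/<label>/<filename>`), and the `ImageRecord` fields are all assumptions made for the example. It shows how an uploaded image could be mapped to an S3 key, a hosted link, and a metadata document suitable for a MongoDB collection.

```python
from dataclasses import dataclass, asdict
from pathlib import PurePosixPath

# Hypothetical values, chosen only for this illustration.
BUCKET = "image-search-data"
REGION = "us-east-1"


@dataclass
class ImageRecord:
    """Metadata stored alongside each uploaded image (assumed schema)."""
    label: str
    filename: str
    s3_key: str
    s3_url: str


def build_record(label, filename, bucket=BUCKET, region=REGION):
    """Derive the S3 key and hosted URL for one uploaded image."""
    # Keep only the base name so a client cannot inject extra path segments.
    safe_name = PurePosixPath(filename).name
    key = f"images/{label}/{safe_name}"
    # Virtual-hosted-style S3 URL; keys with special characters would
    # additionally need URL encoding, which is omitted here.
    url = f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
    return ImageRecord(label, safe_name, key, url)


record = build_record("helicopter", "uploads/image_0001.jpg")
document = asdict(record)  # plain dict, ready to insert into a collection
```

In a real deployment the upload itself and the database insert would go through the AWS and MongoDB client libraries; they are left out here so the sketch stays self-contained.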

Model Trainer Component

Ingestion of Data from S3.

Preprocessing the images for Model Training.

Employing a pre-trained ResNet-34 architecture to generate embeddings.

Using Approximate Nearest Neighbors to build the embedding tree.

Uploading the model and artifacts to S3.

Creation of a Dockerfile for the pipeline.

Deploying the pipeline using AWS and GitHub Actions.
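
To make the "embedding tree" step concrete, here is a small pure-Python sketch of one common approximate nearest neighbor technique: a forest of random-projection trees. It is an illustration under assumptions, not the project's implementation. The source does not name the ANN library used, and a real pipeline would normally rely on one rather than hand-written trees. The random vectors stand in for ResNet-34 embeddings, and all function names are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(ids, vectors, rng, leaf_size=4):
    """Recursively split item ids with random hyperplanes."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    # Pick two items; the splitting hyperplane is equidistant from both.
    a, b = rng.sample(ids, 2)
    normal = [x - y for x, y in zip(vectors[a], vectors[b])]
    midpoint = [(x + y) / 2 for x, y in zip(vectors[a], vectors[b])]
    offset = dot(normal, midpoint)
    left = [i for i in ids if dot(normal, vectors[i]) < offset]
    right = [i for i in ids if dot(normal, vectors[i]) >= offset]
    if not left or not right:  # degenerate split, e.g. duplicate vectors
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(left, vectors, rng, leaf_size),
        "right": build_tree(right, vectors, rng, leaf_size),
    }


def query_tree(node, query):
    """Walk down one tree to the leaf whose region contains the query."""
    while "leaf" not in node:
        side = "left" if dot(node["normal"], query) < node["offset"] else "right"
        node = node[side]
    return node["leaf"]


def search(forest, vectors, query, k):
    """Pool leaf candidates from every tree, then rank them exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(query_tree(tree, query))
    return sorted(candidates, key=lambda i: sq_dist(vectors[i], query))[:k]


# Stand-in data: 200 random 8-dimensional "embeddings".
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
forest = [build_tree(list(range(len(vectors))), vectors, rng) for _ in range(10)]
neighbors = search(forest, vectors, vectors[7], k=3)
```

Using several trees trades memory and build time for recall: each tree may separate true neighbors with an unlucky split, but the union of leaves across trees is more likely to contain them.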

Model Prediction Component

Downloading the model and artifacts.

Taking an input image from the user through the API.

Preprocessing the input.

Generating an embedding for the input using the previously built model architecture.

Employing Approximate Nearest Neighbors for similarity searching.

Retrieving the publicly hosted S3 links of the similar images.

Providing the links to the user.

Creation of a Dockerfile for the pipeline.

Deploying the pipeline using AWS and GitHub Actions.
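
The last two prediction steps, turning nearest-neighbor results into hosted links for the user, might look like the sketch below. It assumes the similarity search returns parallel lists of item ids and distances, and that an id-to-URL mapping is available from the stored metadata. The function name, the response shape, and the example URLs are hypothetical.

```python
def links_for_neighbors(neighbor_ids, distances, id_to_url, max_results=5):
    """Map nearest-neighbor ids to hosted links, closest first."""
    results = []
    seen = set()
    # Sort by distance so the most similar images come first.
    for idx, dist in sorted(zip(neighbor_ids, distances), key=lambda p: p[1]):
        url = id_to_url.get(idx)
        if url is None or url in seen:
            continue  # skip ids without metadata and duplicate links
        seen.add(url)
        results.append({"url": url, "distance": round(dist, 4)})
        if len(results) == max_results:
            break
    return {"count": len(results), "results": results}


# Hypothetical lookup table and search output for illustration.
id_to_url = {
    3: "https://example-bucket.s3.amazonaws.com/images/chair/image_0003.jpg",
    8: "https://example-bucket.s3.amazonaws.com/images/chair/image_0008.jpg",
}
response = links_for_neighbors([8, 3, 42], [0.31, 0.12, 0.05], id_to_url)
```

Here id 42 has no stored link and is dropped, so the response lists the link for id 3 followed by the link for id 8.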

All code is written in Python, using the technologies and frameworks mentioned above, and follows object-oriented programming (OOP) principles.