metadata

title: Multimodal AI Search Engine
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit

🔍 Multimodal AI Search Engine

An image search engine that supports both text-to-image and image-to-image similarity search, using CLIP embeddings and a FAISS index.

🌟 Features

🔤 Text-to-Image Search: Find images using natural language descriptions
🖼️ Image-to-Image Search: Upload an image to find visually similar ones
⚡ Fast Search: Sub-second query response times using FAISS indexing
🎯 High Accuracy: Powered by OpenAI's CLIP-ViT-B-32 model
🎨 Modern UI: Clean, responsive Gradio interface

🚀 How It Works

First Visit: The app automatically downloads 500 images from the Caltech101 dataset
Embedding Generation: Creates CLIP embeddings for all images using the ViT-B-32 model
Index Building: Builds a FAISS index over the embeddings for fast similarity search
Ready to Search: Use text descriptions or upload images to find similar content
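The steps above can be sketched without any model or index library. The following is a minimal standard-library illustration of the idea, not the app's actual code: the toy 3-dimensional vectors stand in for CLIP embeddings (real ViT-B/32 vectors have 512 dimensions), and the hypothetical helpers `normalize`, `build_index`, and `search` do by brute force what an exact inner-product FAISS index would do over unit-length vectors.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings):
    """Store unit-normalized copies of all embeddings (stand-in for a flat index)."""
    return [normalize(e) for e in embeddings]


def search(index, query, k=5):
    """Return the k best (score, position) pairs, highest cosine similarity first."""
    q = normalize(query)
    scores = (
        (sum(a * b for a, b in zip(q, vec)), pos)
        for pos, vec in enumerate(index)
    )
    return heapq.nlargest(k, scores)


# Toy "image embeddings"; in the app these would come from the CLIP image encoder.
image_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
index = build_index(image_embeddings)

# The query could be a text embedding or an image embedding; both live in the
# same vector space, which is why one index serves both search modes.
results = search(index, [1.0, 0.05, 0.0], k=2)
print([pos for _, pos in results])  # → [0, 1]
```

Because both query types map into the same embedding space, text-to-image and image-to-image search differ only in which encoder produces the query vector.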

🔧 Technology Stack

CLIP-ViT-B-32: OpenAI's vision-language model
FAISS: Facebook's similarity search library
Gradio: Interactive web interface
Caltech101: 500 images sampled from a dataset of 101 object categories

📊 Dataset

Source: Caltech101 via Hugging Face
Size: 500 randomly sampled images
Categories: Drawn from Caltech101's 101 object classes
Auto-Setup: Downloads and processes on first run
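Random sampling of this kind might look like the sketch below. It makes several assumptions not stated above: the fixed seed, the hypothetical helper `sample_indices`, and the placeholder `total_images=9000`, which is not the real dataset size. Note that a uniform sample of 500 images does not guarantee that every class is represented.

```python
import random


def sample_indices(total_images, sample_size=500, seed=42):
    """Pick `sample_size` distinct image positions, reproducibly for a given seed."""
    if sample_size > total_images:
        raise ValueError("sample_size exceeds the number of available images")
    rng = random.Random(seed)  # private generator; leaves global random state alone
    return sorted(rng.sample(range(total_images), sample_size))


# `total_images=9000` is a placeholder, not the actual Caltech101 image count.
chosen = sample_indices(total_images=9000)
print(len(chosen), len(set(chosen)))  # → 500 500
```

Fixing the seed keeps the sampled subset, and therefore the search results, stable across rebuilds.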

💡 Usage Tips

Text Search: Use descriptive phrases like "red car on road" or "cat sitting"
Image Search: Upload any image to find visually similar ones
Results: Adjust the number of results using the slider (1-20)
First Load: Initial dataset setup may take 5-10 minutes

Note: First-time setup may take several minutes as the app downloads and processes the image dataset.
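One common way to make only the first visit slow is to cache the result of setup on disk and reuse it afterwards. The sketch below illustrates that pattern under stated assumptions: `load_or_build`, `fake_setup`, and the `index_cache.json` file name are hypothetical, and whether this app caches this way (or whether the Space keeps files between restarts) is not stated here.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_file, build_fn):
    """Return (data, was_cached): reuse cached data if present, else build and save."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        # Fast path: a previous run already did the expensive work.
        return json.loads(cache_file.read_text()), True
    # Slow path: first run, so do the work once and persist the result.
    data = build_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data, False


def fake_setup():
    """Placeholder for the slow download, embedding, and indexing step."""
    return {"num_images": 500}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "index_cache.json"
    first, first_cached = load_or_build(cache, fake_setup)
    second, second_cached = load_or_build(cache, fake_setup)

print(first_cached, second_cached)  # → False True
```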