title: Multimodal AI Search Engine
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
Multimodal AI Search Engine
An image search engine that supports both text-to-image and image-to-image similarity search, using OpenAI's CLIP model for embeddings and a FAISS index for retrieval.
Features
- Text-to-Image Search: Find images using natural language descriptions
- Image-to-Image Search: Upload an image to find visually similar ones
- Fast Search: Sub-second query response times using FAISS indexing
- High Accuracy: Powered by OpenAI's CLIP-ViT-B-32 model
- Modern UI: Clean, responsive Gradio interface
How It Works
- First Visit: The app automatically downloads 500 images from the Caltech101 dataset
- Embedding Generation: Creates CLIP embeddings for all images using the ViT-B-32 model
- Index Building: Builds a FAISS index over the embeddings for fast similarity search
- Ready to Search: Enter a text description or upload an image to find similar content
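
The index-and-search steps above can be sketched in a few lines. This is a minimal illustration only, not the app's actual code: it swaps FAISS for a brute-force pure-Python search, and the `FlatInnerProductIndex` class, the `normalize` helper, and the toy 3-dimensional vectors are all hypothetical stand-ins. The assumption is that embeddings are normalized to unit length, so that the inner product equals cosine similarity.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

class FlatInnerProductIndex:
    """Hypothetical stand-in for an exact (flat) inner-product index."""

    def __init__(self):
        self.vectors = []

    def add(self, embeddings):
        # Store unit-length copies of every embedding.
        self.vectors.extend(normalize(e) for e in embeddings)

    def search(self, query, k):
        # Score every stored vector against the query, then keep the top k.
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), i)
            for i, v in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(i, score) for score, i in scored[:k]]

# Toy vectors standing in for CLIP image embeddings (assumption: real ones
# would come from the model, not be written by hand).
image_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
index = FlatInnerProductIndex()
index.add(image_embeddings)

# A text query and an image query would both be embedded into the same
# space, so the same search call serves both modes.
query_embedding = [1.0, 0.05, 0.0]
results = index.search(query_embedding, k=2)
for image_id, score in results:
    print(f"image {image_id}: similarity {score:.4f}")
```

Because CLIP maps text and images into one shared embedding space, a single index can answer both kinds of query; only the step that produces `query_embedding` differs.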
Technology Stack
- CLIP-ViT-B-32: OpenAI's vision-language model
- FAISS: Facebook's similarity search library
- Gradio: Interactive web interface
- Caltech101: 500 images sampled from a dataset of 101 object categories
Dataset
- Source: Caltech101 via Hugging Face
- Size: 500 randomly sampled images
- Categories: 101 different object classes
- Auto-Setup: Downloaded and processed automatically on first run
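
One way to draw a 500-image random sample is sketched below. It is an illustration under stated assumptions, not the app's code: the `sample_indices` helper, the fixed seed, and the total of 9000 images are all hypothetical placeholders. Seeding the generator keeps the sample reproducible across runs.

```python
import random

def sample_indices(total_images, sample_size=500, seed=42):
    """Pick `sample_size` distinct image indices, reproducibly."""
    if sample_size > total_images:
        raise ValueError("sample_size cannot exceed total_images")
    rng = random.Random(seed)  # local generator, leaves global state alone
    return sorted(rng.sample(range(total_images), sample_size))

# 9000 is a placeholder for the dataset length, not the real figure.
indices = sample_indices(total_images=9000)
print(len(indices), indices[:5])
```

Note that a uniform random sample of 500 images is not guaranteed to cover every category; a stratified sample per class would be needed for that.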
Usage Tips
- Text Search: Use descriptive phrases like "red car on road" or "cat sitting"
- Image Search: Upload any image to find visually similar ones
- Results: Adjust the number of results using the slider (1-20)
- First Load: Initial setup may take 5-10 minutes while the app downloads and processes the image dataset
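
A common way to make the slow setup happen only on the first load is to check for cached artifacts before rebuilding. The sketch below assumes that approach; the file names `embeddings.npy` and `index.faiss`, and the `needs_setup` and `ensure_ready` helpers, are hypothetical and not taken from the app.

```python
import tempfile
from pathlib import Path

# Hypothetical artifact names; the real app may store things differently.
REQUIRED_FILES = ("embeddings.npy", "index.faiss")

def needs_setup(cache_dir):
    """Return True if any cached artifact is missing."""
    cache = Path(cache_dir)
    return not all((cache / name).is_file() for name in REQUIRED_FILES)

def ensure_ready(cache_dir, build):
    """Run `build` only when the cache is incomplete; report whether it ran."""
    if not needs_setup(cache_dir):
        return False
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    build(Path(cache_dir))
    return True

def fake_build(cache):
    # Stand-in for the download, embedding, and indexing steps.
    for name in REQUIRED_FILES:
        (cache / name).write_bytes(b"placeholder")

with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "cache"
    first_run = ensure_ready(cache_dir, fake_build)   # builds the cache
    second_run = ensure_ready(cache_dir, fake_build)  # finds it and skips

print(first_run, second_run)
```

Whether the cache survives between visits depends on where it is stored; if the storage is cleared when the app restarts, the setup runs again.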