---
title: Multimodal AI Search Engine
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
---

# 🔍 Multimodal AI Search Engine

An image search engine that supports both text-to-image and image-to-image similarity search using state-of-the-art deep learning models.

## 🌟 Features

- **🔤 Text-to-Image Search**: Find images using natural language descriptions
- **🖼️ Image-to-Image Search**: Upload an image to find visually similar ones
- **⚡ Fast Search**: Sub-second query response times using FAISS indexing
- **🎯 High Accuracy**: Powered by OpenAI's CLIP ViT-B/32 model
- **🎨 Modern UI**: Clean, responsive Gradio interface

## 🚀 How It Works

1. **First Visit**: The app automatically downloads 500 images from the Caltech101 dataset
2. **Embedding Generation**: Creates CLIP embeddings for every image with the ViT-B/32 model (see the indexing sketch at the end of this README)
3. **Index Building**: Builds a FAISS index for fast similarity search
4. **Ready to Search**: Use text descriptions or upload images to find similar content

## 🔧 Technology Stack

- **CLIP ViT-B/32**: OpenAI's vision-language model
- **FAISS**: Facebook's similarity search library
- **Gradio**: Interactive web interface
- **Caltech101**: 500 diverse images drawn from 101 categories

## 📊 Dataset

- **Source**: Caltech101 via HuggingFace
- **Size**: 500 randomly sampled images
- **Categories**: 101 object classes
- **Auto-Setup**: Downloaded and processed on first run

## 💡 Usage Tips

- **Text Search**: Use descriptive phrases like "red car on road" or "cat sitting"
- **Image Search**: Upload any image to find visually similar ones
- **Results**: Adjust the number of results with the slider (1-20)
- **First Load**: Initial dataset setup may take 5-10 minutes

*Note: First-time setup may take several minutes while the app downloads and processes the image dataset.*
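
## 🧪 Implementation Sketches

The snippets below illustrate how the pieces fit together. They are minimal sketches, not the actual `app.py`: they assume the sentence-transformers `clip-ViT-B-32` checkpoint and a hypothetical local `images/` folder standing in for the downloaded Caltech101 sample.

```python
import glob

import faiss
from PIL import Image
from sentence_transformers import SentenceTransformer

# CLIP ViT-B/32 exposed through sentence-transformers; it encodes both
# images and text into the same 512-dimensional embedding space.
model = SentenceTransformer("clip-ViT-B-32")

# Hypothetical stand-in for the 500 downloaded Caltech101 images.
paths = sorted(glob.glob("images/*.jpg"))
images = [Image.open(p).convert("RGB") for p in paths]

# L2-normalised embeddings, so inner product equals cosine similarity.
embeddings = model.encode(
    images, batch_size=32, convert_to_numpy=True, normalize_embeddings=True
).astype("float32")

# Exact inner-product index; at 500 vectors a flat index is plenty fast.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "images.index")
```

A flat (brute-force) index keeps the code simple; approximate FAISS indexes only start paying off at much larger collection sizes.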
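
Querying reuses the same encoder: because CLIP embeds text and images into a shared space, a single search function can serve both search modes. This continues from the indexing sketch above and shares the same assumptions.

```python
def search(query, k=5):
    """Search the index with either a text string or a PIL image."""
    # model, index, and paths come from the indexing sketch above.
    q = model.encode(
        [query], convert_to_numpy=True, normalize_embeddings=True
    ).astype("float32")
    scores, ids = index.search(q, k)
    return [(paths[i], float(s)) for i, s in zip(ids[0], scores[0])]

print(search("red car on road"))                       # text-to-image
print(search(Image.open("query.jpg").convert("RGB")))  # image-to-image
```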
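
Finally, a rough idea of how a Gradio front end can wrap that search function, including the 1-20 results slider mentioned under Usage Tips. The real interface in `app.py` is likely richer; this sketch covers only the text-search path.

```python
import gradio as gr

def search_text(description, k):
    # Encode the text query and return the top-k image paths.
    q = model.encode(
        [description], convert_to_numpy=True, normalize_embeddings=True
    ).astype("float32")
    _, ids = index.search(q, int(k))
    return [paths[i] for i in ids[0]]

demo = gr.Interface(
    fn=search_text,
    inputs=[
        gr.Textbox(label="Describe the image"),
        gr.Slider(1, 20, value=5, step=1, label="Number of results"),
    ],
    outputs=gr.Gallery(label="Results"),
    title="Multimodal AI Search Engine",
)
demo.launch()
```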