---
license: apache-2.0
language:
- en
base_model:
- openai/clip-vit-base-patch32
- sentence-transformers/all-MiniLM-L6-v2
- google-t5/t5-small
pipeline_tag: image-to-text
library_name: transformers
tags:
- image-to-text
- clip
- t5
- sentence-transformers
- rag
---
# RAG Image Captioning Model
This is a retrieval-augmented generation (RAG) image captioning model built from CLIP (`openai/clip-vit-base-patch32`), T5 (`google-t5/t5-small`), and SentenceTransformer (`sentence-transformers/all-MiniLM-L6-v2`). Given an image, it retrieves similar captions from a FAISS index, and T5 then generates the final caption.
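
The following is a minimal, dependency-free sketch of the retrieve-then-generate flow described above. It rests on several assumptions. A brute-force cosine-similarity search stands in for the FAISS lookup. The toy 2-D vectors stand in for real CLIP or SentenceTransformer embeddings; the card does not say which encoder embeds the query and which embeds the corpus. The `generate caption:` prompt format and the helper names (`retrieve`, `build_prompt`) are hypothetical and are not taken from `inference.py`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, corpus_vecs, captions, k=2):
    """Return the k captions whose vectors are most similar to query_vec.

    Brute-force stand-in for a FAISS search over faiss_index.idx.
    """
    ranked = sorted(
        range(len(captions)),
        key=lambda i: cosine(query_vec, corpus_vecs[i]),
        reverse=True,
    )
    return [captions[i] for i in ranked[:k]]


def build_prompt(retrieved):
    """Hypothetical prompt format: join retrieved captions as context for T5."""
    return "generate caption: " + " | ".join(retrieved)


# Toy corpus and embeddings (illustrative only).
captions = ["a dog on a beach", "a cat on a sofa", "a dog running in sand"]
corpus_vecs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2]]
query_vec = [1.0, 0.0]  # stand-in for the uploaded image's query embedding

print(build_prompt(retrieve(query_vec, corpus_vecs, captions, k=2)))
# → generate caption: a dog on a beach | a dog running in sand
```

In the real model, the prompt string would be tokenized and passed to T5 to produce the caption. The generator sees the retrieved captions only as context, so the output can differ from all of them.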
## Files

- `inference.py`: custom inference script that exposes a `predict` function.
- `requirements.txt`: Python dependencies.
- `faiss_index.idx`: FAISS index used for caption retrieval.
- `captions.json`: caption corpus that retrieval draws from.
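
This card does not document how `faiss_index.idx` and `captions.json` fit together. This sketch assumes a common layout: `captions.json` holds a JSON array of strings, and row *i* of the index was built from `captions[i]`. Under that layout, a FAISS search returns integer ids that are mapped back to caption text. The helper names are hypothetical.

```python
import json
from pathlib import Path


def parse_captions(text):
    """Parse the corpus, assuming captions.json is a JSON array of strings."""
    captions = json.loads(text)
    if not isinstance(captions, list) or not all(isinstance(c, str) for c in captions):
        raise ValueError("expected a JSON array of caption strings")
    return captions


def load_captions(path="captions.json"):
    """Read and parse the caption corpus from disk."""
    return parse_captions(Path(path).read_text(encoding="utf-8"))


def ids_to_captions(ids, captions):
    """Map row ids from a FAISS search back to caption text.

    Assumes row i of the index was built from captions[i]. FAISS fills
    missing neighbours with -1, so out-of-range ids are skipped.
    """
    return [captions[i] for i in ids if 0 <= i < len(captions)]
```

With a real index, the ids would come from the second array returned by `index.search(query, k)`, for example `ids_to_captions(I[0], load_captions())`.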

## Usage

Upload an image to generate a caption. The model is designed for API integration, either through Hugging Face Spaces or a custom deployment.
## Setup

Install the dependencies listed in `requirements.txt`, then download the spaCy English pipeline `en_core_web_sm`:

    pip install -r requirements.txt
    python -m spacy download en_core_web_sm
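
After the setup steps, a quick way to confirm they succeeded is to check that each library can be found on the import path. This sketch uses only the standard library. Because `requirements.txt` is not reproduced in this card, the module list is an assumption and covers only the libraries named above. The FAISS pip packages import as `faiss`, and `spacy download` installs `en_core_web_sm` as an importable package.

```python
from importlib.util import find_spec

# Assumed import names, limited to the libraries mentioned in this card.
REQUIRED_MODULES = (
    "faiss",
    "transformers",
    "sentence_transformers",
    "spacy",
    "en_core_web_sm",
)


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules found.")
```

`find_spec` only locates modules without importing them, so this check is fast and does not load any model weights.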