---
license: apache-2.0
language:
- en
base_model:
- openai/clip-vit-base-patch32
- sentence-transformers/all-MiniLM-L6-v2
- google-t5/t5-small
tags:
- image-to-text
- clip
- t5
- sentence-transformers
- rag
pipeline_tag: image-to-text
library_name: transformers
---

# RAG Image Captioning Model

This is a retrieval-augmented generation (RAG) image captioning model built on CLIP (openai/clip-vit-base-patch32), T5 (t5-small), and SentenceTransformer (all-MiniLM-L6-v2). It retrieves captions similar to the input image from a FAISS index and then generates the final caption with T5.
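
The actual retrieval and generation logic lives in `inference.py`; the sketch below is only a rough, hypothetical illustration of the flow described above. It assumes that `faiss_index.idx` stores normalized CLIP text embeddings of the entries in `captions.json` and that `captions.json` is a plain list of strings; the shipped index may instead be built from all-MiniLM-L6-v2 embeddings, in which case the query path would differ.

```python
# Minimal, hypothetical sketch of a CLIP -> FAISS -> T5 captioning loop.
# Assumes faiss_index.idx holds CLIP text embeddings of captions.json (see note above).
import json

import faiss
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor, T5ForConditionalGeneration, T5Tokenizer

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
t5_model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
t5_tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")

index = faiss.read_index("faiss_index.idx")   # retrieval index over the caption corpus
with open("captions.json") as f:
    captions = json.load(f)                   # assumed: a plain list of caption strings

def generate_caption(image_path: str, k: int = 5) -> str:
    # 1. Embed the query image with CLIP and L2-normalize it.
    image = Image.open(image_path).convert("RGB")
    pixels = clip_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        image_emb = clip_model.get_image_features(**pixels)
    image_emb = torch.nn.functional.normalize(image_emb, dim=-1).numpy()

    # 2. Retrieve the k most similar captions from the FAISS index.
    _, neighbor_ids = index.search(image_emb, k)
    retrieved = [captions[i] for i in neighbor_ids[0]]

    # 3. Fuse the retrieved captions into a single caption with T5.
    prompt = "summarize: " + " ".join(retrieved)
    input_ids = t5_tokenizer(prompt, return_tensors="pt", truncation=True).input_ids
    output_ids = t5_model.generate(input_ids, max_new_tokens=40, num_beams=4)
    return t5_tokenizer.decode(output_ids[0], skip_special_tokens=True)
```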

## Files

- `inference.py`: custom inference script exposing a `predict` function.
- `requirements.txt`: Python dependencies.
- `faiss_index.idx`: FAISS index used for caption retrieval.
- `captions.json`: caption corpus.

## Usage

Upload an image to generate a caption. The model is designed for API integration via Hugging Face Spaces or a custom deployment.
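
For local testing, the `predict` function from `inference.py` can also be called directly. Its exact signature is not documented in this card, so the snippet below is a hypothetical example that assumes it takes a path to an image file and returns the caption string:

```python
# Hypothetical local call into inference.py; the real predict() signature may differ.
from inference import predict

caption = predict("example.jpg")  # path to a local image
print(caption)
```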

## Setup

Install the dependencies from requirements.txt and make sure the spaCy model en_core_web_sm is available:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```