---
license: apache-2.0
language:
- en
base_model:
- openai/clip-vit-base-patch32
- sentence-transformers/all-MiniLM-L6-v2
- google-t5/t5-small
tags:
- image-to-text
- clip
- t5
- sentence-transformers
- rag
pipeline_tag: image-to-text
library_name: transformers
---

# RAG Image Captioning Model

This is a retrieval-augmented generation (RAG) image captioning model built on CLIP (openai/clip-vit-base-patch32), T5 (t5-small), and SentenceTransformer (all-MiniLM-L6-v2). It retrieves captions similar to the input image from a FAISS index and then generates the final caption with T5.
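
The actual retrieval and generation logic lives in `inference.py`; the sketch below is only a rough, hypothetical illustration of the flow described above. It assumes that `faiss_index.idx` stores normalized CLIP text embeddings of the entries in `captions.json` and that `captions.json` is a plain list of strings; the shipped index may instead be built from all-MiniLM-L6-v2 embeddings, in which case the query path would differ.

```python
# Minimal, hypothetical sketch of a CLIP -> FAISS -> T5 captioning loop.
# Assumes faiss_index.idx holds CLIP text embeddings of captions.json (see note above).
import json

import faiss
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor, T5ForConditionalGeneration, T5Tokenizer

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
t5_model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
t5_tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")

index = faiss.read_index("faiss_index.idx")   # retrieval index over the caption corpus
with open("captions.json") as f:
    captions = json.load(f)                   # assumed: a plain list of caption strings

def generate_caption(image_path: str, k: int = 5) -> str:
    # 1. Embed the query image with CLIP and L2-normalize it.
    image = Image.open(image_path).convert("RGB")
    pixels = clip_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        image_emb = clip_model.get_image_features(**pixels)
    image_emb = torch.nn.functional.normalize(image_emb, dim=-1).numpy()

    # 2. Retrieve the k most similar captions from the FAISS index.
    _, neighbor_ids = index.search(image_emb, k)
    retrieved = [captions[i] for i in neighbor_ids[0]]

    # 3. Fuse the retrieved captions into a single caption with T5.
    prompt = "summarize: " + " ".join(retrieved)
    input_ids = t5_tokenizer(prompt, return_tensors="pt", truncation=True).input_ids
    output_ids = t5_model.generate(input_ids, max_new_tokens=40, num_beams=4)
    return t5_tokenizer.decode(output_ids[0], skip_special_tokens=True)
```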

## Files

- `inference.py`: custom inference script exposing a `predict` function.
- `requirements.txt`: Python dependencies.
- `faiss_index.idx`: FAISS index used for caption retrieval.
- `captions.json`: caption corpus.

## Usage

Upload an image to generate a caption. The model is designed for API integration via Hugging Face Spaces or a custom deployment.
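
For local testing, the `predict` function from `inference.py` can also be called directly. Its exact signature is not documented in this card, so the snippet below is a hypothetical example that assumes it takes a path to an image file and returns the caption string:

```python
# Hypothetical local call into inference.py; the real predict() signature may differ.
from inference import predict

caption = predict("example.jpg")  # path to a local image
print(caption)
```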

## Setup

Install the dependencies from requirements.txt and make sure the spaCy model en_core_web_sm is available:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```