---
title: Rag As A Service
emoji: 📉
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Minimal RAG API with MiniLM embeddings and FAISS
---

# RAG API (Minimal) - MiniLM + FAISS (Gradio)

Minimal Retrieval-Augmented Generation (RAG) service built with:

- **Sentence-Transformers MiniLM** for embeddings  
- **FAISS** for vector search (cosine similarity)  
- **Gradio** for both UI and API exposure  

---

## Features

- Ingest documents (one per line) with configurable chunk size/overlap  
- Query top-K relevant chunks with similarity search  
- Get concise answers composed from retrieved context  
- Reset index at any time  
- Call endpoints via **UI or API** (`/api/ingest`, `/api/answer`, `/api/reset`)  
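The "one document per line" ingestion with a configurable chunk size and overlap could look roughly like the sketch below. It is an illustration only, not the Space's actual code: the function names `chunk_text` and `chunk_documents` are hypothetical, and it assumes sizes are measured in characters (the README does not say whether the unit is characters, words, or tokens).

```python
def chunk_text(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    """Split text into windows of up to chunk_size characters.

    Consecutive windows share `overlap` characters (assumed unit: characters).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def chunk_documents(raw: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    """Treat every non-empty line of `raw` as one document and chunk it."""
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if line:
            chunks.extend(chunk_text(line, chunk_size, overlap))
    return chunks
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of a bigger index.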

---

## Quick Start

1. **Load sample docs → Ingest → Ask a query** using the Gradio UI.  
2. Programmatic access:

```bash
# Ingest
curl -s -X POST https://<your-space>.hf.space/api/ingest \
  -H "content-type: application/json" \
  -d '{"data": ["PySpark scales ETL across clusters.\nFAISS powers fast vector similarity search used in retrieval.", 256, 32]}'

# Answer
curl -s -X POST https://<your-space>.hf.space/api/answer \
  -H "content-type: application/json" \
  -d '{"data": ["What does FAISS do?", 5, 1000]}'
```
## Python Client

```python
from gradio_client import Client

client = Client("https://<your-space>.hf.space")
status, size = client.predict("FAISS powers fast vector search.", 256, 32, api_name="/ingest")
res = client.predict("What does FAISS do?", 5, 1000, api_name="/answer")
print(res["answer"])
```

## Tech Stack

- Embeddings: sentence-transformers/all-MiniLM-L6-v2 (384-dim)

- Vector DB: FAISS (FlatIP index, normalized vectors)

- UI & API: Gradio Blocks
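The pairing of a FlatIP index with normalized vectors works because the inner product of two unit-length vectors equals their cosine similarity. The sketch below shows that idea with a brute-force search in plain Python. It is a stand-in for illustration, not FAISS itself and not the Space's code; `normalize`, `build_index`, and `search` are hypothetical names.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def build_index(vectors: list[list[float]]) -> list[list[float]]:
    """Store unit-length copies, mirroring 'normalize, then add' for a flat index."""
    return [normalize(v) for v in vectors]


def search(index: list[list[float]], query: list[float], k: int) -> list[tuple[float, int]]:
    """Return the k best (score, position) pairs by inner product.

    Because both sides are unit length, each score is a cosine similarity.
    """
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(index)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


index = build_index([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
results = search(index, [2.0, 0.0], k=2)
```

A flat index compares the query against every stored vector, so results are exact; that is a reasonable trade-off for a small demo corpus.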

## Notes

- In-memory index only; resets when Space sleeps.

- For persistence, extend with save/load to ./data/.

- Demo-focused: fast, light, minimal surface.
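One possible shape for the save/load extension mentioned above is sketched here. It is an assumption-laden illustration, not part of the Space: `save_store`, `load_store`, and the `store.json` file name are hypothetical, and it persists chunk texts and raw vectors as JSON using only the standard library. With a real FAISS index, `faiss.write_index` and `faiss.read_index` are the usual way to persist the index itself, with the chunk texts saved alongside it.

```python
import json
from pathlib import Path


def save_store(chunks: list[str], vectors: list[list[float]], data_dir: str = "./data") -> None:
    """Write chunk texts and their vectors to <data_dir>/store.json."""
    path = Path(data_dir)
    path.mkdir(parents=True, exist_ok=True)
    payload = {"chunks": chunks, "vectors": vectors}
    (path / "store.json").write_text(json.dumps(payload), encoding="utf-8")


def load_store(data_dir: str = "./data") -> tuple[list[str], list[list[float]]]:
    """Read the store back; return empty lists if nothing was saved yet."""
    file = Path(data_dir) / "store.json"
    if not file.exists():
        return [], []
    payload = json.loads(file.read_text(encoding="utf-8"))
    return payload["chunks"], payload["vectors"]
```

Calling `load_store()` at startup and `save_store(...)` after each ingest would let the index be rebuilt from disk, provided the directory itself survives a restart.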

## Author

Naga Adithya Kaushik (GenAIDevTOProd)

AI copilot use during development: yes, minimal (debugging, test cases, and experimentation only).



Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference