---
title: RAG Pipeline For LLMs
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
tags:
  - rag
  - question-answering
  - nlp
  - faiss
  - transformers
  - wikipedia
  - semantic-search
  - huggingface
  - sentence-transformers
models:
  - sentence-transformers/all-mpnet-base-v2
  - deepset/roberta-base-squad2
datasets:
  - wikipedia
---

# 🔍 RAG Pipeline For LLMs 🚀

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Mehardeep7/rag-pipeline-llm)
[![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://python.org)

## 📖 Project Overview

An intelligent **Retrieval-Augmented Generation (RAG)** pipeline that combines semantic search with question answering. The system fetches Wikipedia articles, processes them into searchable chunks, and uses state-of-the-art AI models to provide accurate, context-aware answers.

## ✨ Key Features

- 📚 **Dynamic Knowledge Retrieval** from Wikipedia with error handling
- 🧮 **Semantic Search** using sentence transformers (no keyword dependency)
- ⚡ **Fast Vector Similarity** with FAISS indexing (sub-second search)
- 🤖 **Intelligent Answer Generation** using pre-trained QA models
- 📊 **Confidence Scoring** for answer quality assessment
- 🎛️ **Customizable Parameters** (chunk size, retrieval count, overlap)
- ✂️ **Smart Text Chunking** with overlapping segments for context preservation

## 🏗️ Architecture

```
User Query → Embedding → FAISS Search → Retrieve Chunks → QA Model → Answer + Confidence
```

## 🤖 AI Models Used

- **📏 Text Chunking**: `sentence-transformers/all-mpnet-base-v2` tokenizer
- **🧮 Vector Embeddings**: `sentence-transformers/all-mpnet-base-v2` (768-dimensional)
- **❓ Question Answering**: `deepset/roberta-base-squad2` (RoBERTa fine-tuned on SQuAD 2.0)
- **🔍 Vector Search**: FAISS `IndexFlatL2` for L2-distance similarity

## 🚀 How to Use

1. **📖 Process Article**: Enter any Wikipedia topic and configure chunk settings
2. **❓ Ask Questions**: Switch to the Q&A tab and enter your questions
3. **📊 View Results**: Explore answers with confidence scores and similarity metrics
4. **🔍 Analyze**: Check the retrieved context and visualization analytics

## 💡 Example Usage

```
Topic: "Artificial Intelligence"
Question: "What is machine learning?"
Answer: "Machine learning is a subset of artificial intelligence..."
Confidence: 89.7%
```

## 🔧 Configuration Options

- **Chunk Size**: 128-512 tokens (default: 256)
- **Overlap**: 10-50 tokens (default: 20)
- **Retrieval Count**: 1-10 chunks (default: 3)

## 📊 Performance

- **Search Speed**: Sub-second retrieval for 1000+ chunks
- **Accuracy**: High precision with confidence scoring
- **Memory Efficient**: Optimized chunk sizes prevent token overflow

## 🔗 Links

- **📝 Full Project**: [GitHub Repository](https://github.com/Mehardeep79/RAG_Pipeline_LLM)
- **📓 Jupyter Notebook**: Complete implementation with explanations
- **🌐 Streamlit App**: Alternative web interface

## 🤝 Credits

Built with ❤️ using:

- 🤗 **Hugging Face** for transformers and model hosting
- ⚡ **FAISS** for efficient vector search
- 🎨 **Gradio** for the interactive interface
- 📖 **Wikipedia API** for knowledge content

---

**⭐ If you find this useful, please give it a star on GitHub!**
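## 🧪 Appendix: Chunking Sketch

The overlapping-chunk step behind the Chunk Size and Overlap settings can be sketched in plain Python. This is a minimal illustration, not the Space's actual `app.py`: whitespace tokenization stands in for the `all-mpnet-base-v2` tokenizer, and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=256, overlap=20):
    """Split text into overlapping chunks of roughly chunk_size tokens.

    Whitespace splitting stands in for the model tokenizer. Each chunk
    repeats the last `overlap` tokens of its predecessor, so an answer
    that straddles a chunk boundary is still recoverable from one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # final window already covers the tail of the text
    return chunks
```

With the defaults above (256-token chunks, 20-token overlap), a 600-token article yields three chunks, and adjacent chunks share 20 boundary tokens.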
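## 🧪 Appendix: Retrieval Sketch

The "Embedding → FAISS Search" stage of the architecture diagram amounts to nearest-neighbour search under L2 distance. The toy sketch below mimics what `faiss.IndexFlatL2` computes, using a brute-force loop and made-up 3-dimensional vectors in place of the real 768-dimensional mpnet embeddings; `l2_search` and the sample vectors are illustrative assumptions, not the Space's code.

```python
def l2_search(index_vectors, query_vec, k=3):
    """Return indices of the k vectors closest to query_vec by squared
    L2 distance -- a brute-force stand-in for faiss.IndexFlatL2.search."""
    dists = [
        (sum((a - b) ** 2 for a, b in zip(vec, query_vec)), i)
        for i, vec in enumerate(index_vectors)
    ]
    return [i for _, i in sorted(dists)[:k]]

# Toy 3-d "embeddings" for four chunks (real ones would be 768-d).
chunk_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]
print(l2_search(chunk_vecs, query, k=2))  # indices of the 2 nearest chunks
```

The returned chunk indices select the text passages that get concatenated into the context passed to the `deepset/roberta-base-squad2` QA model.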