---
title: Enhanced RISC-V RAG
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.0
app_file: app.py
pinned: false
license: mit
tags:
- rag
- nlp
- risc-v
- technical-documentation
- graph-enhancement
- neural-reranking
- vector-search
- document-processing
- hybrid-search
- cross-encoder
short_description: Advanced RAG system for RISC-V documentation
---
# Enhanced RISC-V RAG
An advanced Retrieval-Augmented Generation (RAG) system for RISC-V technical documentation featuring neural reranking, graph enhancement, hybrid search, and multi-backend support. Demonstrates modern RAG techniques including cross-encoder reranking, document relationship graphs, and score-aware fusion strategies.
## πŸš€ Technical Features Implemented
### **🧠 Neural Reranking**
- Cross-encoder models (ms-marco-MiniLM-L6-v2) for relevance scoring
- HuggingFace API integration for cloud deployment
- Adaptive strategies based on query type detection
- Performance caching for repeated queries
- Score fusion with configurable weights
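A minimal sketch of this rerank-and-fuse flow, assuming the `sentence-transformers` library and the public `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint (the fusion weight and helper names are illustrative, not the demo's exact values):
```python
# Illustrative rerank-and-fuse sketch; helper names are hypothetical,
# not the demo's actual implementation.
from sentence_transformers import CrossEncoder

# Loaded once and reused, so repeated queries pay no model-load cost.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[tuple[str, float]], alpha: float = 0.7):
    """candidates: (text, retrieval_score) pairs; alpha weights the cross-encoder."""
    ce_scores = reranker.predict([(query, text) for text, _ in candidates])
    fused = [
        (text, alpha * float(ce) + (1.0 - alpha) * retrieval)
        for (text, retrieval), ce in zip(candidates, ce_scores)
    ]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)
```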
### **πŸ•ΈοΈ Graph Enhancement**
- Document relationship extraction using spaCy NER
- NetworkX-based graph construction and analysis
- Graph-aware retrieval scoring with connectivity metrics
- Entity-based document linking for knowledge discovery
- Relationship mapping for technical concepts
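As a rough sketch of how entity-based linking can work (assuming spaCy's `en_core_web_sm` model is installed; function and variable names are illustrative):
```python
# Illustrative entity-graph construction; not the repo's actual module.
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def build_entity_graph(docs: dict[str, str]) -> nx.Graph:
    """docs: doc_id -> text. Links documents that share named entities."""
    entities = {
        doc_id: {ent.text for ent in nlp(text).ents}
        for doc_id, text in docs.items()
    }
    graph = nx.Graph()
    graph.add_nodes_from(entities)
    ids = list(entities)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            shared = entities[a] & entities[b]
            if shared:
                graph.add_edge(a, b, weight=len(shared))
    return graph

# A simple connectivity metric that can boost retrieval scores:
# boost = nx.degree_centrality(graph)[doc_id]
```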
### **πŸ” Hybrid Search**
- BM25 sparse retrieval for keyword matching
- Dense vector search with FAISS/Weaviate backends
- Score-aware fusion strategy with configurable weights
- Composite filtering for result quality
- Multi-stage retrieval pipeline
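A minimal sketch of score-aware fusion with min-max normalization (the weight and helper names are illustrative, not the demo's calibrated values):
```python
# Illustrative fusion of BM25 (sparse) and vector (dense) retrieval scores.
def normalize(scores: dict) -> dict:
    """Min-max normalize a {doc_id: score} dict into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(bm25_scores: dict, dense_scores: dict, dense_weight: float = 0.7):
    """Weighted combination of sparse and dense scores; absent docs score 0."""
    bm25, dense = normalize(bm25_scores), normalize(dense_scores)
    fused = {
        doc: dense_weight * dense.get(doc, 0.0)
             + (1.0 - dense_weight) * bm25.get(doc, 0.0)
        for doc in set(bm25) | set(dense)
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)
```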
## πŸ—οΈ Architecture & Components
### **6-Component Modular System**
1. **Document Processor**: PyMuPDF parser with technical content cleaning
2. **Embedder**: SentenceTransformer (multi-qa-MiniLM-L6-cos-v1) with batch optimization
3. **Retriever**: Unified interface supporting FAISS/Weaviate backends
4. **Generator**: HuggingFace Inference API / Ollama integration
5. **Query Processor**: NLP analysis and query enhancement
6. **Platform Orchestrator**: Component lifecycle and health management
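The components plug together behind small, swappable interfaces; a hypothetical sketch of that contract (these `Protocol` names are illustrative, not the repo's actual classes):
```python
# Hypothetical component contracts; names are illustrative.
from typing import List, Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, k: int = 10) -> List[dict]:
        """Return the top-k candidate chunks with scores and metadata."""

class Generator(Protocol):
    def generate(self, query: str, context: List[dict]) -> str:
        """Produce an answer grounded in the retrieved context."""
```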
### **Advanced Capabilities**
- **Multi-Backend Support**: Seamless switching between FAISS and Weaviate
- **Performance Optimization**: Caching, batch processing, lazy loading
- **Cloud Deployment**: HuggingFace Spaces optimized with smart caching
- **Database Persistence**: SQLite storage for processed documents
- **Real-time Analytics**: Query performance tracking and monitoring
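A hedged sketch of what backend switching can look like (a hypothetical factory; assumes the `faiss` package and the v4 `weaviate-client`):
```python
# Hypothetical backend factory; illustrates the switching pattern only.
def make_vector_store(cfg: dict):
    if cfg["backend"] == "faiss":
        import faiss
        # Inner-product index; pairs with normalized embeddings for cosine search.
        return faiss.IndexFlatIP(cfg["dim"])
    if cfg["backend"] == "weaviate":
        import weaviate
        # Assumes a Weaviate instance on the default local ports (v4 client).
        return weaviate.connect_to_local()
    raise ValueError(f"Unknown backend: {cfg['backend']}")
```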
## πŸ“‹ Prerequisites
### Required Dependencies
- Python 3.11+
- PyTorch 2.0+ (with MPS support for Apple Silicon)
- 4GB+ RAM for basic operation
- 8GB+ RAM for advanced features
### Optional Dependencies
- Ollama (for local LLM inference)
- Docker (for containerized deployment)
- CUDA GPU (for accelerated inference)
## πŸ› οΈ Installation
### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/enhanced-rag-demo.git
cd enhanced-rag-demo
```
### 2. Create Virtual Environment
```bash
conda create -n enhanced-rag python=3.11
conda activate enhanced-rag
```
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
### 4. Install Ollama (Optional - for Production LLM)
The system includes a MockLLMAdapter for testing without external dependencies. For production use with real LLM inference, install Ollama:
#### macOS/Linux
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
#### Windows
Download and install from: https://ollama.ai/download/windows
#### Pull Required Model
```bash
ollama pull llama3.2:3b
```
#### Verify Installation
```bash
ollama list
# Should show llama3.2:3b in the list
```
## πŸ§ͺ Testing Without Ollama
The bundled MockLLMAdapter lets you run the test suite without external dependencies:
```bash
# Run tests with mock adapter
python test_mock_adapter.py
# Use mock configuration for testing
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
```
## πŸš€ Quick Start
### 1. Basic Usage (with Mock LLM)
```python
from src.core.platform_orchestrator import PlatformOrchestrator
# Initialize with mock configuration for testing
orchestrator = PlatformOrchestrator("config/test_mock_default.yaml")
# Process a query
result = orchestrator.process_query("What is RISC-V?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
```
### 2. Production Usage (with Ollama)
```python
# Initialize with production configuration
orchestrator = PlatformOrchestrator("config/default.yaml")
# Index documents
orchestrator.index_documents("data/documents/")
# Process queries
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
```
### 3. Advanced Features
```python
# Use advanced configuration with neural reranking and graph enhancement
orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# Process query with advanced features
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
# Advanced features include:
# - Neural reranking: Cross-encoder model for precision improvement
# - Graph enhancement: Document relationship analysis
# - Performance optimization: Caching and batch processing
# - Advanced analytics: Real-time performance monitoring
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Sources: {result.sources}")
```
### 4. Configuration Comparison
```python
# Basic Configuration
basic_orchestrator = PlatformOrchestrator("config/default.yaml")
# - Standard fusion strategy
# - Basic retrieval pipeline
# Advanced Configuration
advanced_orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# - Graph-enhanced fusion
# - Neural reranking
# - Performance optimization
# API Configuration (cloud deployment)
api_orchestrator = PlatformOrchestrator("config/epic2_hf_api.yaml")
# - HuggingFace API integration
# - Memory-optimized for cloud deployment
```
## πŸ“ Configuration
### Configuration Files
- `config/default.yaml` - Basic RAG configuration
- `config/advanced_test.yaml` - Epic 2 features enabled
- `config/epic2_graph_calibrated.yaml` - Neural reranking and graph enhancement (used in the examples above)
- `config/test_mock_default.yaml` - Testing without Ollama
- `config/epic2_hf_api.yaml` - HuggingFace API deployment
### Key Configuration Options
```yaml
# Answer Generator Configuration
answer_generator:
  type: "adaptive_modular"
  config:
    # For Ollama (production)
    llm_client:
      type: "ollama"
      config:
        model_name: "llama3.2:3b"
        base_url: "http://localhost:11434"

    # For testing (no external dependencies), swap in the mock client instead:
    # llm_client:
    #   type: "mock"
    #   config:
    #     response_pattern: "technical"
    #     include_citations: true
```
## 🐳 Docker Deployment
```bash
# Build Docker image
docker-compose build
# Run with Docker
docker-compose up
```
## πŸ“Š System Capabilities
### **Technical Implementation**
- **Document Processing**: Multi-format parsing with metadata extraction
- **Embedding Generation**: Batch optimization with hardware acceleration
- **Retrieval Pipeline**: Multi-stage hybrid search with reranking
- **Answer Generation**: Multiple LLM backend support
- **Architecture**: 6-component modular design
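For example, batch embedding with device selection might look like this sketch (the model name matches the Embedder above; the sample chunks, batch size, and flags are assumptions):
```python
# Illustrative batch embedding with hardware acceleration where available.
import torch
from sentence_transformers import SentenceTransformer

device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1", device=device)

chunks = ["RV32I is the base integer instruction set.", "CSRs control traps."]
embeddings = model.encode(
    chunks,
    batch_size=64,              # larger batches amortize per-call overhead
    normalize_embeddings=True,  # unit vectors -> inner product == cosine
)
```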
### **Supported Features**
- **Query Processing**: Intent detection and enhancement
- **Result Fusion**: Multiple scoring strategies
- **Knowledge Graphs**: Entity extraction and relationship mapping
- **Performance Monitoring**: Real-time analytics and metrics
- **Cloud Deployment**: Optimized for containerized environments
## πŸ§ͺ Running Tests
```bash
# Run all tests (requires Ollama or uses mock)
python tests/run_comprehensive_tests.py
# Run with mock adapter only
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
# Run specific test suites
python tests/diagnostic/run_all_diagnostics.py
python tests/epic2_validation/run_epic2_comprehensive_tests.py
```
## 🌐 Deployment Options
### **πŸš€ HuggingFace Spaces Deployment (Recommended)**
The system is optimized for HuggingFace Spaces with automatic environment detection:
1. **Create New Space**: Create a new Streamlit app on [HuggingFace Spaces](https://huggingface.co/spaces)
2. **Upload Files**: Upload the following files to your space:
```
app.py # Main entry point (HF Spaces optimized)
streamlit_epic2_demo.py # Epic 2 demo application
requirements.txt # HF-optimized dependencies
config/ # Configuration files
src/ # Core system
```
3. **Set Environment Variables** (in Space settings):
```bash
HF_TOKEN=your_huggingface_token_here # For API access
```
4. **Automatic Configuration**: The app automatically detects the HuggingFace Spaces environment, available API tokens, and memory constraints, then recommends an optimal configuration.
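As a rough illustration of that detection step (HF Spaces expose a `SPACE_ID` environment variable; the config file names match those listed above, but the selection logic here is an assumption, not the code in `app.py`):
```python
# Hypothetical sketch of environment-based config selection.
import os

def pick_config() -> str:
    if os.environ.get("SPACE_ID"):                 # running on HuggingFace Spaces
        if os.environ.get("HF_TOKEN"):
            return "config/epic2_hf_api.yaml"      # API-backed, memory-optimized
        return "config/test_mock_default.yaml"     # no token: fall back to the mock LLM
    return "config/default.yaml"                   # local development default
```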
**Features in HF Spaces:**
- πŸš€ Full advanced RAG capabilities (neural reranking, graph enhancement)
- πŸ”§ Automatic environment detection and configuration
- πŸ’Ύ Memory-optimized dependencies for cloud deployment
- 🌐 Global accessibility with zero setup required
### **πŸ’» Local Development**
For full local capabilities with Ollama:
```bash
# Install Ollama and model
brew install ollama
ollama pull llama3.2:3b
# Run Epic 2 demo
streamlit run app.py
```
### **🐳 Docker Deployment**
```bash
# Build and run with Docker
docker-compose up
```
## πŸ”§ Troubleshooting
### "Model 'llama3.2' not found"
- **Cause**: Ollama not installed or model not pulled
- **Solution**: Follow Ollama installation steps above or use mock configuration
### "Connection refused on localhost:11434"
- **Cause**: Ollama service not running
- **Solution**: Start Ollama with `ollama serve`
### High Memory Usage
- **Cause**: Large models loaded in memory
- **Solution**: Use smaller models or increase system RAM
### Tests Failing
- **Cause**: Missing dependencies or Ollama not running
- **Solution**: Use test_mock configurations or install Ollama
## πŸ“š Documentation & Testing
### **System Documentation**
- [Technical Implementation](SCORE_COMPRESSION_FIX_COMPLETE_VALIDATION.md) - Score-fusion fix analysis and validation
- [Architecture Overview](docs/architecture/MASTER-ARCHITECTURE.md) - System design and components
- [Component Documentation](docs/architecture/components/) - Individual component specifications
- [Test Documentation](docs/test/) - Testing framework and validation
### **Key Technical Implementations**
1. **Score Fusion Optimization**: Advanced fusion strategy for multi-stage retrieval
2. **Neural Reranking**: Cross-encoder integration for relevance improvement
3. **System Integration**: Complete modular architecture with health monitoring
4. **Cloud Deployment**: HuggingFace Spaces optimized with automated configuration
## 🀝 Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests to ensure quality
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
## πŸ“„ License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🎯 Technical Highlights
This RAG system demonstrates:
### **Advanced RAG Techniques**
- **Neural Reranking**: Cross-encoder models for relevance scoring
- **Graph Enhancement**: Document relationship analysis with NetworkX
- **Multi-Backend Support**: FAISS and Weaviate vector store integration
- **Performance Optimization**: Caching, batch processing, and lazy loading
### **Modern ML Engineering**
- **Modular Architecture**: 6-component system with clear interfaces
- **Cloud-First Design**: HuggingFace Spaces optimized deployment
- **Comprehensive Testing**: Multiple test configurations and validation
- **Developer Experience**: Easy setup with multiple deployment options
## πŸ™ Acknowledgments
- **Open Source Libraries**: Built on PyTorch, HuggingFace, FAISS, and spaCy
- **Transformer Models**: Leveraging state-of-the-art sentence transformers
- **Cloud Platforms**: Optimized for HuggingFace Spaces deployment
- **RISC-V Community**: Focus on technical documentation use case
---
## πŸš€ Quick Start Summary
- **HuggingFace Spaces (Recommended)**: Upload `app.py`, set `HF_TOKEN`, deploy
- **Local Development**: `pip install -r requirements.txt`, `ollama pull llama3.2:3b`, `streamlit run app.py`
- **Advanced Features**: Neural reranking, graph enhancement, and multi-backend support