---
title: Enhanced RISC-V RAG
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.0
app_file: app.py
pinned: false
license: mit
tags:
- rag
- nlp
- risc-v
- technical-documentation
- graph-enhancement
- neural-reranking
- vector-search
- document-processing
- hybrid-search
- cross-encoder
short_description: Advanced RAG system for RISC-V documentation
---

# Enhanced RISC-V RAG

An advanced Retrieval-Augmented Generation (RAG) system for RISC-V technical documentation featuring neural reranking, graph enhancement, hybrid search, and multi-backend support. It demonstrates modern RAG techniques including cross-encoder reranking, document relationship graphs, and score-aware fusion strategies.

## πŸš€ Technical Features Implemented

### **🧠 Neural Reranking**
- Cross-encoder model (ms-marco-MiniLM-L6-v2) for relevance scoring
- HuggingFace API integration for cloud deployment
- Adaptive strategies based on query type detection
- Performance caching for repeated queries
- Score fusion with configurable weights

### **πŸ•ΈοΈ Graph Enhancement**
- Document relationship extraction using spaCy NER
- NetworkX-based graph construction and analysis
- Graph-aware retrieval scoring with connectivity metrics
- Entity-based document linking for knowledge discovery
- Relationship mapping for technical concepts

### **πŸ” Hybrid Search**
- BM25 sparse retrieval for keyword matching
- Dense vector search with FAISS/Weaviate backends
- Score-aware fusion strategy with configurable weights (see the sketch after this list)
- Composite filtering for result quality
- Multi-stage retrieval pipeline
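To make these stages concrete, here is a minimal, self-contained sketch of hybrid retrieval with score-aware fusion followed by cross-encoder reranking. It illustrates the technique rather than this repository's implementation: the toy corpus, the 0.7/0.3 fusion weights, and the top-k value are assumptions, while the model checkpoints follow the names above.

```python
# Sketch: hybrid retrieval (BM25 + dense) with score-aware fusion,
# then cross-encoder reranking. Illustrative only; the corpus, fusion
# weights, and top-k are assumed, not taken from this repo.
# pip install rank-bm25 sentence-transformers numpy
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

documents = [
    "RISC-V is an open instruction set architecture based on RISC principles.",
    "A classic RISC-V pipeline has five stages: fetch, decode, execute, memory, writeback.",
    "The vector extension adds data-parallel instructions to the base RISC-V ISA.",
]
query = "What is RISC-V?"

# Sparse retrieval: BM25 keyword matching over whitespace tokens
bm25 = BM25Okapi([d.lower().split() for d in documents])
sparse = np.asarray(bm25.get_scores(query.lower().split()))

# Dense retrieval: the embedding model named in the component list
embedder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
doc_vecs = embedder.encode(documents, normalize_embeddings=True)
query_vec = embedder.encode(query, normalize_embeddings=True)
dense = doc_vecs @ query_vec  # cosine similarity (vectors are normalized)

# Score-aware fusion: min-max normalize each signal, then weighted sum
def minmax(x: np.ndarray) -> np.ndarray:
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

dense_weight = 0.7  # assumed; the real system reads its weights from config
fused = dense_weight * minmax(dense) + (1 - dense_weight) * minmax(sparse)

# Neural reranking: re-score the fused top-k with a cross-encoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
top_k = np.argsort(fused)[::-1][:3]
rerank = reranker.predict([(query, documents[i]) for i in top_k])
for pos in np.argsort(rerank)[::-1]:
    print(f"{rerank[pos]:.3f}  {documents[top_k[pos]]}")
```

Min-max normalization before fusion matters because BM25 and cosine scores live on different scales; fusing the raw scores would let one signal dominate regardless of the configured weights.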
## πŸ—οΈ Architecture & Components

### **6-Component Modular System**
1. **Document Processor**: PyMuPDF parser with technical content cleaning
2. **Embedder**: SentenceTransformer (multi-qa-MiniLM-L6-cos-v1) with batch optimization
3. **Retriever**: Unified interface supporting FAISS/Weaviate backends
4. **Generator**: HuggingFace Inference API / Ollama integration
5. **Query Processor**: NLP analysis and query enhancement
6. **Platform Orchestrator**: Component lifecycle and health management

### **Advanced Capabilities**
- **Multi-Backend Support**: Seamless switching between FAISS and Weaviate
- **Performance Optimization**: Caching, batch processing, lazy loading
- **Cloud Deployment**: HuggingFace Spaces optimized with smart caching
- **Database Persistence**: SQLite storage for processed documents
- **Real-time Analytics**: Query performance tracking and monitoring

## πŸ“‹ Prerequisites

### Required Dependencies
- Python 3.11+
- PyTorch 2.0+ (with MPS support for Apple Silicon)
- 4GB+ RAM for basic operation
- 8GB+ RAM for advanced features

### Optional Dependencies
- Ollama (for local LLM inference)
- Docker (for containerized deployment)
- CUDA GPU (for accelerated inference)

## πŸ› οΈ Installation

### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/enhanced-rag-demo.git
cd enhanced-rag-demo
```

### 2. Create Virtual Environment
```bash
conda create -n enhanced-rag python=3.11
conda activate enhanced-rag
```

### 3. Install Dependencies
```bash
pip install -r requirements.txt
```

### 4. Install Ollama (Optional - for Production LLM)
The system includes a MockLLMAdapter for testing without external dependencies. For production use with real LLM inference, install Ollama:

#### macOS/Linux
```bash
curl https://ollama.ai/install.sh | sh
```

#### Windows
Download and install from: https://ollama.ai/download/windows

#### Pull Required Model
```bash
ollama pull llama3.2:3b
```

#### Verify Installation
```bash
ollama list
# Should show llama3.2:3b in the list
```

## πŸ§ͺ Testing Without Ollama

The MockLLMAdapter allows running tests without external dependencies:

```bash
# Run tests with the mock adapter
python test_mock_adapter.py

# Use the mock configuration for testing
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
```

## πŸš€ Quick Start

### 1. Basic Usage (with Mock LLM)
```python
from src.core.platform_orchestrator import PlatformOrchestrator

# Initialize with the mock configuration for testing
orchestrator = PlatformOrchestrator("config/test_mock_default.yaml")

# Process a query
result = orchestrator.process_query("What is RISC-V?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
```

### 2. Production Usage (with Ollama)
```python
# Initialize with the production configuration
orchestrator = PlatformOrchestrator("config/default.yaml")

# Index documents
orchestrator.index_documents("data/documents/")

# Process queries
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
```

### 3. Advanced Features
```python
# Use the advanced configuration with neural reranking and graph enhancement
orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")

# Process a query with advanced features
result = orchestrator.process_query("Explain RISC-V pipeline architecture")

# Advanced features include:
# - Neural reranking: cross-encoder model for precision improvement
# - Graph enhancement: document relationship analysis (see the sketch below)
# - Performance optimization: caching and batch processing
# - Advanced analytics: real-time performance monitoring

print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Sources: {result.sources}")
```
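To build intuition for the graph-enhancement step mentioned above, here is a minimal sketch in the spirit of the feature list: spaCy NER extracts entities, NetworkX links documents that share entities, and degree centrality nudges well-connected documents up the ranking. The corpus, base retrieval scores, and the 0.2 blend weight are assumptions, not values from this repository.

```python
# Minimal sketch of graph enhancement: spaCy NER + NetworkX document graph.
# Illustrative only; the real system's extraction and scoring differ.
# pip install spacy networkx && python -m spacy download en_core_web_sm
import itertools

import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

documents = {
    "doc1": "SiFive ships processor cores implementing the RISC-V ISA.",
    "doc2": "RISC-V International maintains the RISC-V ISA specification.",
    "doc3": "Berkeley researchers designed RISC-V as the fifth Berkeley RISC project.",
}

# Extract the named entities found in each document
entities = {
    doc_id: {ent.text for ent in nlp(text).ents}
    for doc_id, text in documents.items()
}

# Link documents that share at least one entity
graph = nx.Graph()
graph.add_nodes_from(documents)
for a, b in itertools.combinations(documents, 2):
    shared = entities[a] & entities[b]
    if shared:
        graph.add_edge(a, b, shared=sorted(shared))

# Graph-aware scoring: blend base retrieval scores with degree centrality
centrality = nx.degree_centrality(graph)
retrieval_scores = {"doc1": 0.62, "doc2": 0.58, "doc3": 0.55}  # assumed base scores
graph_weight = 0.2  # assumed blend weight
boosted = {
    d: (1 - graph_weight) * s + graph_weight * centrality[d]
    for d, s in retrieval_scores.items()
}
print(sorted(boosted.items(), key=lambda kv: -kv[1]))
```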
### 4. Configuration Comparison
```python
# Basic configuration
basic_orchestrator = PlatformOrchestrator("config/default.yaml")
# - Standard fusion strategy
# - Basic retrieval pipeline

# Advanced configuration
advanced_orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# - Graph-enhanced fusion
# - Neural reranking
# - Performance optimization

# API configuration (cloud deployment)
api_orchestrator = PlatformOrchestrator("config/epic2_hf_api.yaml")
# - HuggingFace API integration
# - Memory-optimized for cloud deployment
```

## πŸ“ Configuration

### Configuration Files
- `config/default.yaml` - Basic RAG configuration
- `config/advanced_test.yaml` - Epic 2 features enabled
- `config/test_mock_default.yaml` - Testing without Ollama
- `config/epic2_hf_api.yaml` - HuggingFace API deployment

### Key Configuration Options
```yaml
# Answer Generator Configuration
answer_generator:
  type: "adaptive_modular"
  config:
    # For Ollama (production)
    llm_client:
      type: "ollama"
      config:
        model_name: "llama3.2:3b"
        base_url: "http://localhost:11434"

    # For testing (no external dependencies), swap in the mock client:
    # llm_client:
    #   type: "mock"
    #   config:
    #     response_pattern: "technical"
    #     include_citations: true
```
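As a quick sanity check that a configuration file matches the shape documented above, it can be loaded with PyYAML and the active LLM client inspected. This is a minimal sketch assuming only the structure shown here; the orchestrator's actual loader may validate much more.

```python
# Minimal sketch: load a config and inspect the active LLM client.
# Assumes only the YAML structure documented above.
import yaml

with open("config/default.yaml") as f:
    cfg = yaml.safe_load(f)

llm_client = cfg["answer_generator"]["config"]["llm_client"]
print(f"LLM client type: {llm_client['type']}")
if llm_client["type"] == "ollama":
    print(f"Model:    {llm_client['config']['model_name']}")
    print(f"Endpoint: {llm_client['config']['base_url']}")
```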
## 🐳 Docker Deployment

```bash
# Build the Docker image
docker-compose build

# Run with Docker
docker-compose up
```

## πŸ“Š System Capabilities

### **Technical Implementation**
- **Document Processing**: Multi-format parsing with metadata extraction
- **Embedding Generation**: Batch optimization with hardware acceleration
- **Retrieval Pipeline**: Multi-stage hybrid search with reranking
- **Answer Generation**: Multiple LLM backend support
- **Architecture**: 6-component modular design

### **Supported Features**
- **Query Processing**: Intent detection and enhancement
- **Result Fusion**: Multiple scoring strategies
- **Knowledge Graphs**: Entity extraction and relationship mapping
- **Performance Monitoring**: Real-time analytics and metrics
- **Cloud Deployment**: Optimized for containerized environments

## πŸ§ͺ Running Tests

```bash
# Run all tests (requires Ollama, or falls back to the mock adapter)
python tests/run_comprehensive_tests.py

# Run with the mock adapter only
python tests/run_comprehensive_tests.py config/test_mock_default.yaml

# Run specific test suites
python tests/diagnostic/run_all_diagnostics.py
python tests/epic2_validation/run_epic2_comprehensive_tests.py
```

## 🌐 Deployment Options

### **πŸš€ HuggingFace Spaces Deployment (Recommended)**

The system is optimized for HuggingFace Spaces with automatic environment detection:

1. **Create New Space**: Create a new Streamlit app on [HuggingFace Spaces](https://huggingface.co/spaces)
2. **Upload Files**: Upload the following files to your Space:
   ```
   app.py                    # Main entry point (HF Spaces optimized)
   streamlit_epic2_demo.py   # Epic 2 demo application
   requirements.txt          # HF-optimized dependencies
   config/                   # Configuration files
   src/                      # Core system
   ```
3. **Set Environment Variables** (in the Space settings):
   ```bash
   HF_TOKEN=your_huggingface_token_here  # For API access
   ```
4. **Automatic Configuration**: The app automatically detects the HuggingFace Spaces environment, available API tokens, and memory constraints, then recommends an optimal configuration.

**Features in HF Spaces:**
- πŸš€ Full advanced RAG capabilities (neural reranking, graph enhancement)
- πŸ”§ Automatic environment detection and configuration
- πŸ’Ύ Memory-optimized dependencies for cloud deployment
- 🌐 Global accessibility with zero setup required

### **πŸ’» Local Development**

For full local capabilities with Ollama:

```bash
# Install Ollama and the model
brew install ollama
ollama pull llama3.2:3b

# Run the Epic 2 demo
streamlit run app.py
```

### **🐳 Docker Deployment**

```bash
# Build and run with Docker
docker-compose up
```

## πŸ”§ Troubleshooting

### "Model 'llama3.2' not found"
- **Cause**: Ollama not installed or model not pulled
- **Solution**: Follow the Ollama installation steps above, or use the mock configuration

### "Connection refused on localhost:11434"
- **Cause**: Ollama service not running
- **Solution**: Start Ollama with `ollama serve`; the snippet below checks connectivity

### High Memory Usage
- **Cause**: Large models loaded in memory
- **Solution**: Use smaller models or increase system RAM

### Tests Failing
- **Cause**: Missing dependencies or Ollama not running
- **Solution**: Use the test_mock configurations or install Ollama
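For the connection-refused case, a quick probe of Ollama's local HTTP API confirms whether the server is up and the model has been pulled. This hedged snippet uses Ollama's public `/api/tags` endpoint on the default port; it is a diagnostic aid, not part of this repository.

```python
# Quick connectivity check for a local Ollama server (default port 11434).
# Ollama's /api/tags endpoint lists the models that have been pulled.
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    models = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is running. Models:", models)
    if not any(name.startswith("llama3.2") for name in models):
        print("llama3.2 not found -- run: ollama pull llama3.2:3b")
except requests.ConnectionError:
    print("Connection refused -- start the server with: ollama serve")
```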
## πŸ“š Documentation & Testing

### **System Documentation**
- [Technical Implementation](SCORE_COMPRESSION_FIX_COMPLETE_VALIDATION.md) - Technical analysis and testing
- [Architecture Overview](docs/architecture/MASTER-ARCHITECTURE.md) - System design and components
- [Component Documentation](docs/architecture/components/) - Individual component specifications
- [Test Documentation](docs/test/) - Testing framework and validation

### **Key Technical Implementations**
1. **Score Fusion Optimization**: Advanced fusion strategy for multi-stage retrieval
2. **Neural Reranking**: Cross-encoder integration for relevance improvement
3. **System Integration**: Complete modular architecture with health monitoring
4. **Cloud Deployment**: HuggingFace Spaces optimized with automated configuration

## 🀝 Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run the tests to ensure quality
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request

## πŸ“„ License

This project is licensed under the MIT License; see the LICENSE file for details.

## 🎯 Technical Highlights

This RAG system demonstrates:

### **Advanced RAG Techniques**
- **Neural Reranking**: Cross-encoder models for relevance scoring
- **Graph Enhancement**: Document relationship analysis with NetworkX
- **Multi-Backend Support**: FAISS and Weaviate vector store integration
- **Performance Optimization**: Caching, batch processing, and lazy loading

### **Modern ML Engineering**
- **Modular Architecture**: 6-component system with clear interfaces
- **Cloud-First Design**: HuggingFace Spaces optimized deployment
- **Comprehensive Testing**: Multiple test configurations and validation
- **Developer Experience**: Easy setup with multiple deployment options

## πŸ™ Acknowledgments

- **Open Source Libraries**: Built on PyTorch, HuggingFace, FAISS, and spaCy
- **Transformer Models**: Leveraging state-of-the-art sentence transformers
- **Cloud Platforms**: Optimized for HuggingFace Spaces deployment
- **RISC-V Community**: Focus on the technical documentation use case

---

## πŸš€ Quick Start Summary

- **HuggingFace Spaces (Recommended)**: Upload `app.py`, set `HF_TOKEN`, deploy
- **Local Development**: `pip install -r requirements.txt`, `ollama pull llama3.2:3b`, `streamlit run app.py`
- **Advanced Features**: Neural reranking, graph enhancement, and multi-backend support