---
title: Enhanced RISC-V RAG
emoji: π
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.0
app_file: app.py
pinned: false
license: mit
tags:
  - rag
  - nlp
  - risc-v
  - technical-documentation
  - graph-enhancement
  - neural-reranking
  - vector-search
  - document-processing
  - hybrid-search
  - cross-encoder
short_description: Advanced RAG system for RISC-V documentation
---

# Enhanced RISC-V RAG
An advanced Retrieval-Augmented Generation (RAG) system for RISC-V technical documentation featuring neural reranking, graph enhancement, hybrid search, and multi-backend support. Demonstrates modern RAG techniques including cross-encoder reranking, document relationship graphs, and score-aware fusion strategies.
## Technical Features Implemented

### Neural Reranking
- Cross-encoder models (ms-marco-MiniLM-L6-v2) for relevance scoring
- HuggingFace API integration for cloud deployment
- Adaptive strategies based on query type detection
- Performance caching for repeated queries
- Score fusion with configurable weights
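The reranking step described above can be sketched with the sentence-transformers CrossEncoder API. This is a minimal illustration, not the project's actual reranker component; the Hub model id corresponds to the cross-encoder named above, and the fusion weight `alpha` is an example value rather than the system's calibrated setting.

```python
from sentence_transformers import CrossEncoder

# Cross-encoder referenced above; downloaded from the HF Hub on first use.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, passages, retrieval_scores, alpha=0.7):
    """Score (query, passage) pairs and fuse with first-stage retrieval scores."""
    ce_scores = reranker.predict([(query, p) for p in passages])  # raw relevance logits
    # A real fusion would normalize the two score ranges first; kept simple here.
    fused = [alpha * float(ce) + (1 - alpha) * rs
             for ce, rs in zip(ce_scores, retrieval_scores)]
    return sorted(zip(passages, fused), key=lambda x: x[1], reverse=True)

ranked = rerank("What is RISC-V?",
                ["RISC-V is an open instruction set architecture.",
                 "The classic RISC pipeline has five stages."],
                retrieval_scores=[0.62, 0.41])
print(ranked[0])
```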
### Graph Enhancement
- Document relationship extraction using spaCy NER
- NetworkX-based graph construction and analysis
- Graph-aware retrieval scoring with connectivity metrics
- Entity-based document linking for knowledge discovery
- Relationship mapping for technical concepts
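The entity-graph idea can be illustrated with spaCy and NetworkX. This is a rough sketch of the approach, not the project's graph-enhancement component; it links documents that mention the same named entity and derives a simple connectivity metric that could be folded into retrieval scores.

```python
import networkx as nx
import spacy

# Requires a spaCy model with NER, e.g. `python -m spacy download en_core_web_sm`.
nlp = spacy.load("en_core_web_sm")

def build_entity_graph(docs):
    """Link documents that share at least one named entity."""
    graph = nx.Graph()
    entity_to_docs = {}
    for doc_id, text in docs.items():
        graph.add_node(doc_id)
        for ent in nlp(text).ents:
            entity_to_docs.setdefault(ent.text.lower(), set()).add(doc_id)
    for entity, doc_ids in entity_to_docs.items():
        for a in doc_ids:
            for b in doc_ids:
                if a < b:
                    graph.add_edge(a, b, entity=entity)
    return graph

docs = {"doc1": "RISC-V defines the base integer ISAs RV32I and RV64I.",
        "doc2": "RV32I is the 32-bit base integer instruction set."}
graph = build_entity_graph(docs)

# Connectivity metric (degree centrality) usable as a graph-aware score component.
print(nx.degree_centrality(graph))
```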
### Hybrid Search
- BM25 sparse retrieval for keyword matching
- Dense vector search with FAISS/Weaviate backends
- Score-aware fusion strategy with configurable weights
- Composite filtering for result quality
- Multi-stage retrieval pipeline
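The core of BM25 + dense fusion with a configurable weight looks roughly like the sketch below, using rank_bm25, sentence-transformers, and FAISS. It is a simplified illustration, not the system's score-aware fusion strategy; the embedding model matches the one listed under the Embedder component, and the weight is an example value.

```python
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = ["RISC-V is an open instruction set architecture.",
          "The classic RISC pipeline has five stages.",
          "Vector extensions add data-parallel instructions."]

# Sparse side: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense side: embed the corpus and index it with FAISS (inner product == cosine
# because the embeddings are normalized).
embedder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
embeddings = embedder.encode(corpus, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

def hybrid_search(query, k=3, dense_weight=0.7):
    """Fuse normalized BM25 and dense scores with a configurable weight."""
    sparse = bm25.get_scores(query.lower().split())
    sparse = sparse / (sparse.max() or 1.0)  # scale to [0, 1]
    q = embedder.encode([query], normalize_embeddings=True)
    dense, ids = index.search(np.asarray(q, dtype="float32"), len(corpus))
    dense_scores = np.zeros(len(corpus))
    dense_scores[ids[0]] = dense[0]
    fused = dense_weight * dense_scores + (1 - dense_weight) * sparse
    return sorted(zip(corpus, fused), key=lambda x: x[1], reverse=True)[:k]

print(hybrid_search("What is RISC-V?"))
```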
## Architecture & Components

### 6-Component Modular System
- Document Processor: PyMuPDF parser with technical content cleaning
- Embedder: SentenceTransformer (multi-qa-MiniLM-L6-cos-v1) with batch optimization
- Retriever: Unified interface supporting FAISS/Weaviate backends
- Generator: HuggingFace Inference API / Ollama integration
- Query Processor: NLP analysis and query enhancement
- Platform Orchestrator: Component lifecycle and health management
### Advanced Capabilities
- Multi-Backend Support: Seamless switching between FAISS and Weaviate
- Performance Optimization: Caching, batch processing, lazy loading
- Cloud Deployment: HuggingFace Spaces optimized with smart caching
- Database Persistence: SQLite storage for processed documents
- Real-time Analytics: Query performance tracking and monitoring
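As a rough illustration of the caching and SQLite persistence points above, the snippet below stores processed chunks with Python's built-in sqlite3 and fingerprints source files so unchanged documents can be skipped on re-indexing. The table layout and file name are assumptions for the sketch, not the system's actual schema.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect("processed_documents.db")  # assumed database file name
conn.execute("""CREATE TABLE IF NOT EXISTS chunks (
                    doc_id    TEXT,
                    chunk_id  INTEGER,
                    text      TEXT,
                    embedding TEXT,
                    PRIMARY KEY (doc_id, chunk_id))""")

def cache_chunk(doc_id, chunk_id, text, embedding):
    """Persist a processed chunk so re-indexing can reuse it."""
    conn.execute("INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?)",
                 (doc_id, chunk_id, text, json.dumps([float(x) for x in embedding])))
    conn.commit()

def is_cached(doc_id):
    """Return True if the document already has chunks stored."""
    return conn.execute("SELECT 1 FROM chunks WHERE doc_id = ? LIMIT 1",
                        (doc_id,)).fetchone() is not None

def doc_fingerprint(path):
    """Hash file contents so a changed PDF invalidates its cached chunks."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```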
## Prerequisites

### Required Dependencies
- Python 3.11+
- PyTorch 2.0+ (with MPS support for Apple Silicon)
- 4GB+ RAM for basic operation
- 8GB+ RAM for advanced features
### Optional Dependencies
- Ollama (for local LLM inference)
- Docker (for containerized deployment)
- CUDA GPU (for accelerated inference)
## Installation

### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/enhanced-rag-demo.git
cd enhanced-rag-demo
```

### 2. Create Virtual Environment
```bash
conda create -n enhanced-rag python=3.11
conda activate enhanced-rag
```

### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
### 4. Install Ollama (Optional - for Production LLM)
The system includes a MockLLMAdapter for testing without external dependencies. For production use with real LLM inference, install Ollama:

#### macOS/Linux
```bash
curl https://ollama.ai/install.sh | sh
```

#### Windows
Download and install from: https://ollama.ai/download/windows

#### Pull Required Model
```bash
ollama pull llama3.2:3b
```

#### Verify Installation
```bash
ollama list
# Should show llama3.2:3b in the list
```
## Testing Without Ollama
The system includes a MockLLMAdapter that allows running tests without external dependencies:
```bash
# Run tests with mock adapter
python test_mock_adapter.py

# Use mock configuration for testing
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
```
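For orientation only, a mock client of this kind might look roughly like the following. The class and method names are illustrative and are not the actual MockLLMAdapter interface shipped in src/; they simply mirror the response_pattern and include_citations options shown in the configuration section below.

```python
class SimpleMockLLM:
    """Illustrative stand-in for an LLM client that returns canned answers."""

    def __init__(self, response_pattern="technical", include_citations=True):
        self.response_pattern = response_pattern
        self.include_citations = include_citations

    def generate(self, prompt, context_chunks=()):
        # Deterministic, dependency-free answer for tests.
        answer = f"[{self.response_pattern} mock answer] {prompt[:80]}"
        if self.include_citations and context_chunks:
            citations = ", ".join(f"[{i + 1}]" for i in range(len(context_chunks)))
            answer += f" Sources: {citations}"
        return answer

mock = SimpleMockLLM()
print(mock.generate("What is RISC-V?", context_chunks=["chunk about RISC-V"]))
```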
## Quick Start

### 1. Basic Usage (with Mock LLM)
```python
from src.core.platform_orchestrator import PlatformOrchestrator

# Initialize with mock configuration for testing
orchestrator = PlatformOrchestrator("config/test_mock_default.yaml")

# Process a query
result = orchestrator.process_query("What is RISC-V?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
```
### 2. Production Usage (with Ollama)
```python
# Initialize with production configuration
orchestrator = PlatformOrchestrator("config/default.yaml")

# Index documents
orchestrator.index_documents("data/documents/")

# Process queries
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
```
### 3. Advanced Features
```python
# Use advanced configuration with neural reranking and graph enhancement
orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")

# Process query with advanced features
result = orchestrator.process_query("Explain RISC-V pipeline architecture")

# Advanced features include:
# - Neural reranking: cross-encoder model for precision improvement
# - Graph enhancement: document relationship analysis
# - Performance optimization: caching and batch processing
# - Advanced analytics: real-time performance monitoring
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Sources: {result.sources}")
```
### 4. Configuration Comparison
```python
# Basic configuration
basic_orchestrator = PlatformOrchestrator("config/default.yaml")
# - Standard fusion strategy
# - Basic retrieval pipeline

# Advanced configuration
advanced_orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# - Graph-enhanced fusion
# - Neural reranking
# - Performance optimization

# API configuration (cloud deployment)
api_orchestrator = PlatformOrchestrator("config/epic2_hf_api.yaml")
# - HuggingFace API integration
# - Memory-optimized for cloud deployment
```
## Configuration

### Configuration Files
- `config/default.yaml` - Basic RAG configuration
- `config/advanced_test.yaml` - Epic 2 features enabled
- `config/test_mock_default.yaml` - Testing without Ollama
- `config/epic2_hf_api.yaml` - HuggingFace API deployment
### Key Configuration Options
```yaml
# Answer Generator Configuration (the two llm_client blocks are alternatives)
answer_generator:
  type: "adaptive_modular"
  config:
    # For Ollama (production)
    llm_client:
      type: "ollama"
      config:
        model_name: "llama3.2:3b"
        base_url: "http://localhost:11434"

    # For testing (no external dependencies)
    llm_client:
      type: "mock"
      config:
        response_pattern: "technical"
        include_citations: true
```
## Docker Deployment
```bash
# Build Docker image
docker-compose build

# Run with Docker
docker-compose up
```
## System Capabilities

### Technical Implementation
- Document Processing: Multi-format parsing with metadata extraction
- Embedding Generation: Batch optimization with hardware acceleration
- Retrieval Pipeline: Multi-stage hybrid search with reranking
- Answer Generation: Multiple LLM backend support
- Architecture: 6-component modular design
### Supported Features
- Query Processing: Intent detection and enhancement
- Result Fusion: Multiple scoring strategies
- Knowledge Graphs: Entity extraction and relationship mapping
- Performance Monitoring: Real-time analytics and metrics
- Cloud Deployment: Optimized for containerized environments
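As a taste of what the intent detection step can look like, the snippet below classifies queries with a few keyword rules; these heuristics are illustrative only and are not the project's actual query processor, which this README describes as NLP-based.

```python
import re

# Illustrative keyword rules; a production query processor would use richer NLP.
QUERY_TYPES = {
    "definition": re.compile(r"\b(what is|define|meaning of)\b", re.I),
    "comparison": re.compile(r"\b(vs\.?|versus|compare|difference)\b", re.I),
    "procedure":  re.compile(r"\b(how (do|to)|steps|configure|install)\b", re.I),
}

def detect_query_type(query):
    """Return a coarse query type used to pick retrieval/reranking strategies."""
    for label, pattern in QUERY_TYPES.items():
        if pattern.search(query):
            return label
    return "general"

print(detect_query_type("What is RISC-V?"))   # definition
print(detect_query_type("RV32I vs RV64I"))    # comparison
```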
## Running Tests
```bash
# Run all tests (requires Ollama or uses mock)
python tests/run_comprehensive_tests.py

# Run with mock adapter only
python tests/run_comprehensive_tests.py config/test_mock_default.yaml

# Run specific test suites
python tests/diagnostic/run_all_diagnostics.py
python tests/epic2_validation/run_epic2_comprehensive_tests.py
```
## Deployment Options

### HuggingFace Spaces Deployment (Recommended)
The system is optimized for HuggingFace Spaces with automatic environment detection:
1. Create New Space: Create a new Streamlit app on HuggingFace Spaces
2. Upload Files: Upload the following files to your Space:
   ```
   app.py                    # Main entry point (HF Spaces optimized)
   streamlit_epic2_demo.py   # Epic 2 demo application
   requirements.txt          # HF-optimized dependencies
   config/                   # Configuration files
   src/                      # Core system
   ```
3. Set Environment Variables (in Space settings):
   ```
   HF_TOKEN=your_huggingface_token_here  # For API access
   ```
4. Automatic Configuration: The app automatically detects the HuggingFace Spaces environment, available API tokens, and memory constraints, then recommends an optimal configuration.
Features in HF Spaces:
- Full advanced RAG capabilities (neural reranking, graph enhancement)
- Automatic environment detection and configuration
- Memory-optimized dependencies for cloud deployment
- Global accessibility with zero setup required
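The automatic environment detection described above boils down to a few environment-variable checks. The sketch below assumes that HF Spaces exposes variables such as SPACE_ID and that the configuration file names match those listed in the Configuration section; it is an illustration, not the detection logic in app.py.

```python
import os

def pick_config():
    """Choose a configuration file based on the runtime environment (sketch)."""
    in_spaces = os.environ.get("SPACE_ID") is not None      # set inside HF Spaces
    has_hf_token = bool(os.environ.get("HF_TOKEN"))
    if in_spaces and has_hf_token:
        return "config/epic2_hf_api.yaml"       # API-backed, memory-optimized
    if in_spaces:
        return "config/test_mock_default.yaml"  # no token: fall back to the mock LLM
    return "config/default.yaml"                # local development with Ollama

print(pick_config())
```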
### Local Development
For full local capabilities with Ollama:
```bash
# Install Ollama and model
brew install ollama
ollama pull llama3.2:3b

# Run Epic 2 demo
streamlit run app.py
```
### Docker Deployment
```bash
# Build and run with Docker
docker-compose up
```
## Troubleshooting

### "Model 'llama3.2' not found"
- Cause: Ollama not installed or model not pulled
- Solution: Follow the Ollama installation steps above, or use a mock configuration

### "Connection refused on localhost:11434"
- Cause: Ollama service not running
- Solution: Start Ollama with `ollama serve`
### High Memory Usage
- Cause: Large models loaded in memory
- Solution: Use smaller models or increase system RAM
### Tests Failing
- Cause: Missing dependencies or Ollama not running
- Solution: Use test_mock configurations or install Ollama
## Documentation & Testing

### System Documentation
- Technical Implementation - Technical analysis and testing
- Architecture Overview - System design and components
- Component Documentation - Individual component specifications
- Test Documentation - Testing framework and validation
### Key Technical Implementations
- Score Fusion Optimization: Advanced fusion strategy for multi-stage retrieval
- Neural Reranking: Cross-encoder integration for relevance improvement
- System Integration: Complete modular architecture with health monitoring
- Cloud Deployment: HuggingFace Spaces optimized with automated configuration
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests to ensure quality
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Technical Highlights
This RAG system demonstrates:

### Advanced RAG Techniques
- Neural Reranking: Cross-encoder models for relevance scoring
- Graph Enhancement: Document relationship analysis with NetworkX
- Multi-Backend Support: FAISS and Weaviate vector store integration
- Performance Optimization: Caching, batch processing, and lazy loading
### Modern ML Engineering
- Modular Architecture: 6-component system with clear interfaces
- Cloud-First Design: HuggingFace Spaces optimized deployment
- Comprehensive Testing: Multiple test configurations and validation
- Developer Experience: Easy setup with multiple deployment options
## Acknowledgments
- Open Source Libraries: Built on PyTorch, HuggingFace, FAISS, and spaCy
- Transformer Models: Leveraging state-of-the-art sentence transformers
- Cloud Platforms: Optimized for HuggingFace Spaces deployment
- RISC-V Community: Focus on technical documentation use case
## Quick Start Summary
- HuggingFace Spaces (Recommended): Upload `app.py`, set `HF_TOKEN`, deploy
- Local Development: `pip install -r requirements.txt`, `ollama pull llama3.2:3b`, `streamlit run app.py`
- Advanced Features: Neural reranking, graph enhancement, and multi-backend support