---
title: Enhanced RISC-V RAG
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.0
app_file: app.py
pinned: false
license: mit
tags:
  - rag
  - nlp
  - risc-v
  - technical-documentation
  - graph-enhancement
  - neural-reranking
  - vector-search
  - document-processing
  - hybrid-search
  - cross-encoder
short_description: Advanced RAG system for RISC-V documentation
---

# Enhanced RISC-V RAG

An advanced Retrieval-Augmented Generation (RAG) system for RISC-V technical documentation featuring neural reranking, graph enhancement, hybrid search, and multi-backend support. It demonstrates modern RAG techniques including cross-encoder reranking, document relationship graphs, and score-aware fusion strategies.

## πŸš€ Technical Features Implemented

### 🧠 Neural Reranking

- Cross-encoder models (ms-marco-MiniLM-L6-v2) for relevance scoring
- HuggingFace API integration for cloud deployment
- Adaptive strategies based on query type detection
- Performance caching for repeated queries
- Score fusion with configurable weights
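
For reference, a minimal sketch of cross-encoder reranking with the sentence-transformers API, using the checkpoint named above. This is illustrative only; the demo's reranker wraps this with caching, adaptive strategies, and score fusion:

```python
# Minimal cross-encoder reranking sketch (sentence-transformers API).
# Illustrative only: the demo's reranker adds caching and score fusion.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "What is RISC-V?"
candidates = [
    "RISC-V is an open standard instruction set architecture.",
    "FAISS is a library for efficient similarity search.",
]

# Score each (query, document) pair, then sort by descending relevance.
scores = model.predict([(query, doc) for doc in candidates])
for doc, score in sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```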

πŸ•ΈοΈ Graph Enhancement

  • Document relationship extraction using spaCy NER
  • NetworkX-based graph construction and analysis
  • Graph-aware retrieval scoring with connectivity metrics
  • Entity-based document linking for knowledge discovery
  • Relationship mapping for technical concepts
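
As a rough illustration of entity-based linking, the sketch below builds a document graph with spaCy NER and NetworkX. It assumes the `en_core_web_sm` model is installed and is not the repo's actual graph builder, which handles typed relationships and scoring:

```python
# Sketch of entity-based document linking with spaCy NER and NetworkX.
# Assumes `python -m spacy download en_core_web_sm` has been run.
import itertools
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")
docs = {
    "doc1": "The RISC-V project started at UC Berkeley.",
    "doc2": "UC Berkeley also produced the earlier RISC designs.",
}

graph = nx.Graph()
entity_index: dict[str, set[str]] = {}  # entity text -> doc ids mentioning it
for doc_id, text in docs.items():
    graph.add_node(doc_id)
    for ent in nlp(text).ents:
        entity_index.setdefault(ent.text, set()).add(doc_id)

# Link any two documents that mention the same named entity.
for entity, doc_ids in entity_index.items():
    for a, b in itertools.combinations(sorted(doc_ids), 2):
        graph.add_edge(a, b, entity=entity)

# Connectivity metrics like degree centrality can feed retrieval scores.
print(nx.degree_centrality(graph))
```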

πŸ” Hybrid Search

  • BM25 sparse retrieval for keyword matching
  • Dense vector search with FAISS/Weaviate backends
  • Score-aware fusion strategy with configurable weights
  • Composite filtering for result quality
  • Multi-stage retrieval pipeline
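
A minimal sketch of weighted sparse/dense fusion, assuming `rank_bm25` for the sparse side and the same embedder model listed in the architecture below. The 0.3/0.7 weights and min-max normalization are illustrative, not the system's calibrated fusion strategy:

```python
# Hybrid-search sketch: BM25 sparse scores fused with dense cosine scores
# via configurable weights. Weights and normalization are illustrative.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "RISC-V defines a modular instruction set architecture.",
    "The classic pipeline has fetch, decode, execute, memory, and writeback stages.",
]
query = "RISC-V pipeline stages"

# Sparse side: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([d.lower().split() for d in corpus])
sparse = np.asarray(bm25.get_scores(query.lower().split()))

# Dense side: cosine similarity of sentence embeddings.
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
dense = util.cos_sim(model.encode(query), model.encode(corpus)).numpy().ravel()

def minmax(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

fused = 0.3 * minmax(sparse) + 0.7 * minmax(dense)
print(corpus[int(fused.argmax())])
```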

πŸ—οΈ Architecture & Components

6-Component Modular System

  1. Document Processor: PyMuPDF parser with technical content cleaning
  2. Embedder: SentenceTransformer (multi-qa-MiniLM-L6-cos-v1) with batch optimization
  3. Retriever: Unified interface supporting FAISS/Weaviate backends
  4. Generator: HuggingFace Inference API / Ollama integration
  5. Query Processor: NLP analysis and query enhancement
  6. Platform Orchestrator: Component lifecycle and health management
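
Conceptually, these components compose as a simple pipeline. The sketch below uses hypothetical `Protocol` interfaces to show the shape of that composition; the repo's real interfaces are richer (health checks, lifecycle hooks):

```python
# Conceptual sketch of component composition; the Protocol names and
# method signatures here are hypothetical, not the repo's interfaces.
from typing import Protocol

class Retriever(Protocol):
    def search(self, query: str, k: int) -> list[str]: ...

class Generator(Protocol):
    def generate(self, query: str, context: list[str]) -> str: ...

def answer(query: str, retriever: Retriever, generator: Generator) -> str:
    # Query processing -> retrieval -> generation, with the Platform
    # Orchestrator owning component wiring and health in the real system.
    context = retriever.search(query, k=5)
    return generator.generate(query, context)
```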

### Advanced Capabilities

- **Multi-Backend Support**: Seamless switching between FAISS and Weaviate
- **Performance Optimization**: Caching, batch processing, lazy loading
- **Cloud Deployment**: HuggingFace Spaces optimized with smart caching
- **Database Persistence**: SQLite storage for processed documents
- **Real-time Analytics**: Query performance tracking and monitoring

## πŸ“‹ Prerequisites

### Required Dependencies

- Python 3.11+
- PyTorch 2.0+ (with MPS support for Apple Silicon)
- 4GB+ RAM for basic operation
- 8GB+ RAM for advanced features

### Optional Dependencies

- Ollama (for local LLM inference)
- Docker (for containerized deployment)
- CUDA GPU (for accelerated inference)

πŸ› οΈ Installation

1. Clone the Repository

git clone https://github.com/yourusername/enhanced-rag-demo.git
cd enhanced-rag-demo

2. Create Virtual Environment

conda create -n enhanced-rag python=3.11
conda activate enhanced-rag

3. Install Dependencies

pip install -r requirements.txt

4. Install Ollama (Optional - for Production LLM)

The system includes a MockLLMAdapter for testing without external dependencies. For production use with real LLM inference, install Ollama:

macOS/Linux

curl https://ollama.ai/install.sh | sh

Windows

Download and install from: https://ollama.ai/download/windows

Pull Required Model

ollama pull llama3.2:3b

Verify Installation

ollama list
# Should show llama3.2:3b in the list

## πŸ§ͺ Testing Without Ollama

The system includes a MockLLMAdapter that allows running tests without external dependencies:

```bash
# Run tests with mock adapter
python test_mock_adapter.py

# Use mock configuration for testing
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
```

## πŸš€ Quick Start

### 1. Basic Usage (with Mock LLM)

```python
from src.core.platform_orchestrator import PlatformOrchestrator

# Initialize with mock configuration for testing
orchestrator = PlatformOrchestrator("config/test_mock_default.yaml")

# Process a query
result = orchestrator.process_query("What is RISC-V?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
```

### 2. Production Usage (with Ollama)

```python
# Initialize with production configuration
orchestrator = PlatformOrchestrator("config/default.yaml")

# Index documents
orchestrator.index_documents("data/documents/")

# Process queries
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
```

### 3. Advanced Features

```python
# Use advanced configuration with neural reranking and graph enhancement
orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")

# Process query with advanced features
result = orchestrator.process_query("Explain RISC-V pipeline architecture")

# Advanced features include:
# - Neural reranking: cross-encoder model for precision improvement
# - Graph enhancement: document relationship analysis
# - Performance optimization: caching and batch processing
# - Advanced analytics: real-time performance monitoring

print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Sources: {result.sources}")
```

### 4. Configuration Comparison

```python
# Basic configuration
basic_orchestrator = PlatformOrchestrator("config/default.yaml")
# - Standard fusion strategy
# - Basic retrieval pipeline

# Advanced configuration
advanced_orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# - Graph-enhanced fusion
# - Neural reranking
# - Performance optimization

# API configuration (cloud deployment)
api_orchestrator = PlatformOrchestrator("config/epic2_hf_api.yaml")
# - HuggingFace API integration
# - Memory-optimized for cloud deployment
```
πŸ“ Configuration

Configuration Files

  • config/default.yaml - Basic RAG configuration
  • config/advanced_test.yaml - Epic 2 features enabled
  • config/test_mock_default.yaml - Testing without Ollama
  • config/epic2_hf_api.yaml - HuggingFace API deployment

### Key Configuration Options

The `llm_client` block selects the backend. The two variants below are alternatives; keep exactly one active in a real config file, since YAML does not allow duplicate keys:

```yaml
# Answer Generator Configuration
answer_generator:
  type: "adaptive_modular"
  config:
    # Option A - Ollama (production)
    llm_client:
      type: "ollama"
      config:
        model_name: "llama3.2:3b"
        base_url: "http://localhost:11434"

    # Option B - mock (testing, no external dependencies)
    # llm_client:
    #   type: "mock"
    #   config:
    #     response_pattern: "technical"
    #     include_citations: true
```
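
If you want to inspect such a file programmatically, a hedged sketch with `yaml.safe_load` (the key paths mirror the example above; `PlatformOrchestrator` does this parsing internally):

```python
# Sketch: reading the llm_client settings from a config file whose key
# paths mirror the example above. For poking around only; the system
# parses configs internally.
import yaml

with open("config/default.yaml") as fh:
    cfg = yaml.safe_load(fh)

llm = cfg["answer_generator"]["config"]["llm_client"]
print(llm["type"], llm.get("config", {}))
```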

## 🐳 Docker Deployment

```bash
# Build Docker image
docker-compose build

# Run with Docker
docker-compose up
```

## πŸ“Š System Capabilities

### Technical Implementation

- **Document Processing**: Multi-format parsing with metadata extraction
- **Embedding Generation**: Batch optimization with hardware acceleration
- **Retrieval Pipeline**: Multi-stage hybrid search with reranking
- **Answer Generation**: Multiple LLM backend support
- **Architecture**: 6-component modular design

### Supported Features

- **Query Processing**: Intent detection and enhancement
- **Result Fusion**: Multiple scoring strategies
- **Knowledge Graphs**: Entity extraction and relationship mapping
- **Performance Monitoring**: Real-time analytics and metrics
- **Cloud Deployment**: Optimized for containerized environments

## πŸ§ͺ Running Tests

```bash
# Run all tests (requires Ollama or uses mock)
python tests/run_comprehensive_tests.py

# Run with mock adapter only
python tests/run_comprehensive_tests.py config/test_mock_default.yaml

# Run specific test suites
python tests/diagnostic/run_all_diagnostics.py
python tests/epic2_validation/run_epic2_comprehensive_tests.py
```

## 🌐 Deployment Options

### πŸš€ HuggingFace Spaces Deployment (Recommended)

The system is optimized for HuggingFace Spaces with automatic environment detection:

1. **Create New Space**: Create a new Streamlit app on HuggingFace Spaces

2. **Upload Files**: Upload the following files to your space:

   ```
   app.py                    # Main entry point (HF Spaces optimized)
   streamlit_epic2_demo.py   # Epic 2 demo application
   requirements.txt          # HF-optimized dependencies
   config/                   # Configuration files
   src/                      # Core system
   ```

3. **Set Environment Variables** (in Space settings):

   ```
   HF_TOKEN=your_huggingface_token_here  # For API access
   ```

4. **Automatic Configuration**: The app detects the HuggingFace Spaces environment, available API tokens, and memory constraints, then recommends an optimal configuration (see the sketch below)
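
The detection logic might look roughly like the following; the exact variables and fallback order checked by `app.py` may differ (`SPACE_ID` is a standard environment variable set by HuggingFace Spaces):

```python
# Rough sketch of this kind of environment detection; the exact variables
# and fallbacks checked by app.py may differ.
import os

def pick_config() -> str:
    if os.environ.get("SPACE_ID"):      # running on HuggingFace Spaces
        if os.environ.get("HF_TOKEN"):  # API token available
            return "config/epic2_hf_api.yaml"
        return "config/default.yaml"    # conservative, memory-safe fallback
    return "config/epic2_graph_calibrated.yaml"  # local: full features
```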

**Features in HF Spaces:**

- πŸš€ Full advanced RAG capabilities (neural reranking, graph enhancement)
- πŸ”§ Automatic environment detection and configuration
- πŸ’Ύ Memory-optimized dependencies for cloud deployment
- 🌐 Global accessibility with zero setup required

### πŸ’» Local Development

For full local capabilities with Ollama:

```bash
# Install Ollama and model
brew install ollama
ollama pull llama3.2:3b

# Run Epic 2 demo
streamlit run app.py
```

### 🐳 Docker Deployment

```bash
# Build and run with Docker
docker-compose up
```

## πŸ”§ Troubleshooting

### "Model 'llama3.2' not found"

- **Cause**: Ollama not installed or model not pulled
- **Solution**: Follow the Ollama installation steps above or use a mock configuration

### "Connection refused on localhost:11434"

- **Cause**: Ollama service not running
- **Solution**: Start Ollama with `ollama serve`

### High Memory Usage

- **Cause**: Large models loaded in memory
- **Solution**: Use smaller models or increase system RAM

### Tests Failing

- **Cause**: Missing dependencies or Ollama not running
- **Solution**: Use the `test_mock` configurations or install Ollama

## πŸ“š Documentation & Testing

### Key Technical Implementations

1. **Score Fusion Optimization**: Advanced fusion strategy for multi-stage retrieval
2. **Neural Reranking**: Cross-encoder integration for relevance improvement
3. **System Integration**: Complete modular architecture with health monitoring
4. **Cloud Deployment**: HuggingFace Spaces optimized with automated configuration

## 🀝 Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests to ensure quality
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request

## πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🎯 Technical Highlights

This RAG system demonstrates:

### Advanced RAG Techniques

- **Neural Reranking**: Cross-encoder models for relevance scoring
- **Graph Enhancement**: Document relationship analysis with NetworkX
- **Multi-Backend Support**: FAISS and Weaviate vector store integration
- **Performance Optimization**: Caching, batch processing, and lazy loading

### Modern ML Engineering

- **Modular Architecture**: 6-component system with clear interfaces
- **Cloud-First Design**: HuggingFace Spaces optimized deployment
- **Comprehensive Testing**: Multiple test configurations and validation
- **Developer Experience**: Easy setup with multiple deployment options

πŸ™ Acknowledgments

  • Open Source Libraries: Built on PyTorch, HuggingFace, FAISS, and spaCy
  • Transformer Models: Leveraging state-of-the-art sentence transformers
  • Cloud Platforms: Optimized for HuggingFace Spaces deployment
  • RISC-V Community: Focus on technical documentation use case

## πŸš€ Quick Start Summary

- **HuggingFace Spaces (Recommended)**: Upload `app.py`, set `HF_TOKEN`, deploy
- **Local Development**: `pip install -r requirements.txt`, `ollama pull llama3.2:3b`, `streamlit run app.py`
- **Advanced Features**: Neural reranking, graph enhancement, and multi-backend support