---
title: Enhanced RISC-V RAG
emoji: π
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.0
app_file: app.py
pinned: false
license: mit
tags:
  - rag
  - nlp
  - risc-v
  - technical-documentation
  - graph-enhancement
  - neural-reranking
  - vector-search
  - document-processing
  - hybrid-search
  - cross-encoder
short_description: Advanced RAG system for RISC-V documentation
---

# Enhanced RISC-V RAG
An advanced Retrieval-Augmented Generation (RAG) system for RISC-V technical documentation featuring neural reranking, graph enhancement, hybrid search, and multi-backend support. Demonstrates modern RAG techniques including cross-encoder reranking, document relationship graphs, and score-aware fusion strategies.
## Technical Features Implemented

### Neural Reranking
- Cross-encoder models (ms-marco-MiniLM-L6-v2) for relevance scoring
- HuggingFace API integration for cloud deployment
- Adaptive strategies based on query type detection
- Performance caching for repeated queries
- Score fusion with configurable weights
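The reranking step described above can be sketched with the sentence-transformers CrossEncoder API. This is a minimal illustration, not the project's actual reranker component; the Hub model id corresponds to the cross-encoder named above, and the fusion weight `alpha` is an example value rather than the system's calibrated setting.

```python
from sentence_transformers import CrossEncoder

# Cross-encoder referenced above; downloaded from the HF Hub on first use.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, passages, retrieval_scores, alpha=0.7):
    """Score (query, passage) pairs and fuse with first-stage retrieval scores."""
    ce_scores = reranker.predict([(query, p) for p in passages])  # raw relevance logits
    # A real fusion would normalize the two score ranges first; kept simple here.
    fused = [alpha * float(ce) + (1 - alpha) * rs
             for ce, rs in zip(ce_scores, retrieval_scores)]
    return sorted(zip(passages, fused), key=lambda x: x[1], reverse=True)

ranked = rerank("What is RISC-V?",
                ["RISC-V is an open instruction set architecture.",
                 "The classic RISC pipeline has five stages."],
                retrieval_scores=[0.62, 0.41])
print(ranked[0])
```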
### Graph Enhancement
- Document relationship extraction using spaCy NER
- NetworkX-based graph construction and analysis
- Graph-aware retrieval scoring with connectivity metrics
- Entity-based document linking for knowledge discovery
- Relationship mapping for technical concepts
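The entity-graph idea can be illustrated with spaCy and NetworkX. This is a rough sketch of the approach, not the project's graph-enhancement component; it links documents that mention the same named entity and derives a simple connectivity metric that could be folded into retrieval scores.

```python
import networkx as nx
import spacy

# Requires a spaCy model with NER, e.g. `python -m spacy download en_core_web_sm`.
nlp = spacy.load("en_core_web_sm")

def build_entity_graph(docs):
    """Link documents that share at least one named entity."""
    graph = nx.Graph()
    entity_to_docs = {}
    for doc_id, text in docs.items():
        graph.add_node(doc_id)
        for ent in nlp(text).ents:
            entity_to_docs.setdefault(ent.text.lower(), set()).add(doc_id)
    for entity, doc_ids in entity_to_docs.items():
        for a in doc_ids:
            for b in doc_ids:
                if a < b:
                    graph.add_edge(a, b, entity=entity)
    return graph

docs = {"doc1": "RISC-V defines the base integer ISAs RV32I and RV64I.",
        "doc2": "RV32I is the 32-bit base integer instruction set."}
graph = build_entity_graph(docs)

# Connectivity metric (degree centrality) usable as a graph-aware score component.
print(nx.degree_centrality(graph))
```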
### Hybrid Search
- BM25 sparse retrieval for keyword matching
- Dense vector search with FAISS/Weaviate backends
- Score-aware fusion strategy with configurable weights
- Composite filtering for result quality
- Multi-stage retrieval pipeline
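The core of BM25 + dense fusion with a configurable weight looks roughly like the sketch below, using rank_bm25, sentence-transformers, and FAISS. It is a simplified illustration, not the system's score-aware fusion strategy; the embedding model matches the one listed under the Embedder component, and the weight is an example value.

```python
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = ["RISC-V is an open instruction set architecture.",
          "The classic RISC pipeline has five stages.",
          "Vector extensions add data-parallel instructions."]

# Sparse side: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense side: embed the corpus and index it with FAISS (inner product == cosine
# because the embeddings are normalized).
embedder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
embeddings = embedder.encode(corpus, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

def hybrid_search(query, k=3, dense_weight=0.7):
    """Fuse normalized BM25 and dense scores with a configurable weight."""
    sparse = bm25.get_scores(query.lower().split())
    sparse = sparse / (sparse.max() or 1.0)  # scale to [0, 1]
    q = embedder.encode([query], normalize_embeddings=True)
    dense, ids = index.search(np.asarray(q, dtype="float32"), len(corpus))
    dense_scores = np.zeros(len(corpus))
    dense_scores[ids[0]] = dense[0]
    fused = dense_weight * dense_scores + (1 - dense_weight) * sparse
    return sorted(zip(corpus, fused), key=lambda x: x[1], reverse=True)[:k]

print(hybrid_search("What is RISC-V?"))
```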
## Architecture & Components

### 6-Component Modular System
- Document Processor: PyMuPDF parser with technical content cleaning
- Embedder: SentenceTransformer (multi-qa-MiniLM-L6-cos-v1) with batch optimization
- Retriever: Unified interface supporting FAISS/Weaviate backends
- Generator: HuggingFace Inference API / Ollama integration
- Query Processor: NLP analysis and query enhancement
- Platform Orchestrator: Component lifecycle and health management
### Advanced Capabilities
- Multi-Backend Support: Seamless switching between FAISS and Weaviate
- Performance Optimization: Caching, batch processing, lazy loading
- Cloud Deployment: HuggingFace Spaces optimized with smart caching
- Database Persistence: SQLite storage for processed documents
- Real-time Analytics: Query performance tracking and monitoring
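As a rough illustration of the caching and SQLite persistence points above, the snippet below stores processed chunks with Python's built-in sqlite3 and fingerprints source files so unchanged documents can be skipped on re-indexing. The table layout and file name are assumptions for the sketch, not the system's actual schema.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect("processed_documents.db")  # assumed database file name
conn.execute("""CREATE TABLE IF NOT EXISTS chunks (
                    doc_id    TEXT,
                    chunk_id  INTEGER,
                    text      TEXT,
                    embedding TEXT,
                    PRIMARY KEY (doc_id, chunk_id))""")

def cache_chunk(doc_id, chunk_id, text, embedding):
    """Persist a processed chunk so re-indexing can reuse it."""
    conn.execute("INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?)",
                 (doc_id, chunk_id, text, json.dumps([float(x) for x in embedding])))
    conn.commit()

def is_cached(doc_id):
    """Return True if the document already has chunks stored."""
    return conn.execute("SELECT 1 FROM chunks WHERE doc_id = ? LIMIT 1",
                        (doc_id,)).fetchone() is not None

def doc_fingerprint(path):
    """Hash file contents so a changed PDF invalidates its cached chunks."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```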
## Prerequisites

### Required Dependencies
- Python 3.11+
- PyTorch 2.0+ (with MPS support for Apple Silicon)
- 4GB+ RAM for basic operation
- 8GB+ RAM for advanced features
### Optional Dependencies
- Ollama (for local LLM inference)
- Docker (for containerized deployment)
- CUDA GPU (for accelerated inference)
## Installation

### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/enhanced-rag-demo.git
cd enhanced-rag-demo
```

### 2. Create Virtual Environment
```bash
conda create -n enhanced-rag python=3.11
conda activate enhanced-rag
```

### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
### 4. Install Ollama (Optional - for Production LLM)
The system includes a MockLLMAdapter for testing without external dependencies. For production use with real LLM inference, install Ollama:

#### macOS/Linux
```bash
curl https://ollama.ai/install.sh | sh
```

#### Windows
Download and install from: https://ollama.ai/download/windows

#### Pull Required Model
```bash
ollama pull llama3.2:3b
```

#### Verify Installation
```bash
ollama list
# Should show llama3.2:3b in the list
```
## Testing Without Ollama
The system includes a MockLLMAdapter that allows running tests without external dependencies:
```bash
# Run tests with mock adapter
python test_mock_adapter.py

# Use mock configuration for testing
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
```
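For orientation only, a mock client of this kind might look roughly like the following. The class and method names are illustrative and are not the actual MockLLMAdapter interface shipped in src/; they simply mirror the response_pattern and include_citations options shown in the configuration section below.

```python
class SimpleMockLLM:
    """Illustrative stand-in for an LLM client that returns canned answers."""

    def __init__(self, response_pattern="technical", include_citations=True):
        self.response_pattern = response_pattern
        self.include_citations = include_citations

    def generate(self, prompt, context_chunks=()):
        # Deterministic, dependency-free answer for tests.
        answer = f"[{self.response_pattern} mock answer] {prompt[:80]}"
        if self.include_citations and context_chunks:
            citations = ", ".join(f"[{i + 1}]" for i in range(len(context_chunks)))
            answer += f" Sources: {citations}"
        return answer

mock = SimpleMockLLM()
print(mock.generate("What is RISC-V?", context_chunks=["chunk about RISC-V"]))
```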
## Quick Start

### 1. Basic Usage (with Mock LLM)
```python
from src.core.platform_orchestrator import PlatformOrchestrator

# Initialize with mock configuration for testing
orchestrator = PlatformOrchestrator("config/test_mock_default.yaml")

# Process a query
result = orchestrator.process_query("What is RISC-V?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
```
### 2. Production Usage (with Ollama)
```python
# Initialize with production configuration
orchestrator = PlatformOrchestrator("config/default.yaml")

# Index documents
orchestrator.index_documents("data/documents/")

# Process queries
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
```
### 3. Advanced Features
```python
# Use advanced configuration with neural reranking and graph enhancement
orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")

# Process query with advanced features
result = orchestrator.process_query("Explain RISC-V pipeline architecture")

# Advanced features include:
# - Neural reranking: cross-encoder model for precision improvement
# - Graph enhancement: document relationship analysis
# - Performance optimization: caching and batch processing
# - Advanced analytics: real-time performance monitoring
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Sources: {result.sources}")
```
### 4. Configuration Comparison
```python
# Basic configuration
basic_orchestrator = PlatformOrchestrator("config/default.yaml")
# - Standard fusion strategy
# - Basic retrieval pipeline

# Advanced configuration
advanced_orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# - Graph-enhanced fusion
# - Neural reranking
# - Performance optimization

# API configuration (cloud deployment)
api_orchestrator = PlatformOrchestrator("config/epic2_hf_api.yaml")
# - HuggingFace API integration
# - Memory-optimized for cloud deployment
```
## Configuration

### Configuration Files
- `config/default.yaml` - Basic RAG configuration
- `config/advanced_test.yaml` - Epic 2 features enabled
- `config/test_mock_default.yaml` - Testing without Ollama
- `config/epic2_hf_api.yaml` - HuggingFace API deployment
### Key Configuration Options
```yaml
# Answer Generator Configuration (the two llm_client blocks are alternatives)
answer_generator:
  type: "adaptive_modular"
  config:
    # For Ollama (production)
    llm_client:
      type: "ollama"
      config:
        model_name: "llama3.2:3b"
        base_url: "http://localhost:11434"

    # For testing (no external dependencies)
    llm_client:
      type: "mock"
      config:
        response_pattern: "technical"
        include_citations: true
```
## Docker Deployment
```bash
# Build Docker image
docker-compose build

# Run with Docker
docker-compose up
```
## System Capabilities

### Technical Implementation
- Document Processing: Multi-format parsing with metadata extraction
- Embedding Generation: Batch optimization with hardware acceleration
- Retrieval Pipeline: Multi-stage hybrid search with reranking
- Answer Generation: Multiple LLM backend support
- Architecture: 6-component modular design
### Supported Features
- Query Processing: Intent detection and enhancement
- Result Fusion: Multiple scoring strategies
- Knowledge Graphs: Entity extraction and relationship mapping
- Performance Monitoring: Real-time analytics and metrics
- Cloud Deployment: Optimized for containerized environments
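As a taste of what the intent detection step can look like, the snippet below classifies queries with a few keyword rules; these heuristics are illustrative only and are not the project's actual query processor, which this README describes as NLP-based.

```python
import re

# Illustrative keyword rules; a production query processor would use richer NLP.
QUERY_TYPES = {
    "definition": re.compile(r"\b(what is|define|meaning of)\b", re.I),
    "comparison": re.compile(r"\b(vs\.?|versus|compare|difference)\b", re.I),
    "procedure":  re.compile(r"\b(how (do|to)|steps|configure|install)\b", re.I),
}

def detect_query_type(query):
    """Return a coarse query type used to pick retrieval/reranking strategies."""
    for label, pattern in QUERY_TYPES.items():
        if pattern.search(query):
            return label
    return "general"

print(detect_query_type("What is RISC-V?"))   # definition
print(detect_query_type("RV32I vs RV64I"))    # comparison
```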
## Running Tests
```bash
# Run all tests (requires Ollama or uses mock)
python tests/run_comprehensive_tests.py

# Run with mock adapter only
python tests/run_comprehensive_tests.py config/test_mock_default.yaml

# Run specific test suites
python tests/diagnostic/run_all_diagnostics.py
python tests/epic2_validation/run_epic2_comprehensive_tests.py
```
## Deployment Options

### HuggingFace Spaces Deployment (Recommended)
The system is optimized for HuggingFace Spaces with automatic environment detection:
1. Create New Space: Create a new Streamlit app on HuggingFace Spaces
2. Upload Files: Upload the following files to your Space:
   ```
   app.py                    # Main entry point (HF Spaces optimized)
   streamlit_epic2_demo.py   # Epic 2 demo application
   requirements.txt          # HF-optimized dependencies
   config/                   # Configuration files
   src/                      # Core system
   ```
3. Set Environment Variables (in Space settings):
   ```
   HF_TOKEN=your_huggingface_token_here  # For API access
   ```
4. Automatic Configuration: The app automatically detects the HuggingFace Spaces environment, available API tokens, and memory constraints, then recommends an optimal configuration.
Features in HF Spaces:
- Full advanced RAG capabilities (neural reranking, graph enhancement)
- Automatic environment detection and configuration
- Memory-optimized dependencies for cloud deployment
- Global accessibility with zero setup required
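The automatic environment detection described above boils down to a few environment-variable checks. The sketch below assumes that HF Spaces exposes variables such as SPACE_ID and that the configuration file names match those listed in the Configuration section; it is an illustration, not the detection logic in app.py.

```python
import os

def pick_config():
    """Choose a configuration file based on the runtime environment (sketch)."""
    in_spaces = os.environ.get("SPACE_ID") is not None      # set inside HF Spaces
    has_hf_token = bool(os.environ.get("HF_TOKEN"))
    if in_spaces and has_hf_token:
        return "config/epic2_hf_api.yaml"       # API-backed, memory-optimized
    if in_spaces:
        return "config/test_mock_default.yaml"  # no token: fall back to the mock LLM
    return "config/default.yaml"                # local development with Ollama

print(pick_config())
```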
### Local Development
For full local capabilities with Ollama:
```bash
# Install Ollama and model
brew install ollama
ollama pull llama3.2:3b

# Run Epic 2 demo
streamlit run app.py
```
### Docker Deployment
```bash
# Build and run with Docker
docker-compose up
```
## Troubleshooting

### "Model 'llama3.2' not found"
- Cause: Ollama not installed or model not pulled
- Solution: Follow the Ollama installation steps above, or use a mock configuration

### "Connection refused on localhost:11434"
- Cause: Ollama service not running
- Solution: Start Ollama with `ollama serve`
### High Memory Usage
- Cause: Large models loaded in memory
- Solution: Use smaller models or increase system RAM
### Tests Failing
- Cause: Missing dependencies or Ollama not running
- Solution: Use test_mock configurations or install Ollama
## Documentation & Testing

### System Documentation
- Technical Implementation - Technical analysis and testing
- Architecture Overview - System design and components
- Component Documentation - Individual component specifications
- Test Documentation - Testing framework and validation
### Key Technical Implementations
- Score Fusion Optimization: Advanced fusion strategy for multi-stage retrieval
- Neural Reranking: Cross-encoder integration for relevance improvement
- System Integration: Complete modular architecture with health monitoring
- Cloud Deployment: HuggingFace Spaces optimized with automated configuration
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests to ensure quality
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Technical Highlights
This RAG system demonstrates:

### Advanced RAG Techniques
- Neural Reranking: Cross-encoder models for relevance scoring
- Graph Enhancement: Document relationship analysis with NetworkX
- Multi-Backend Support: FAISS and Weaviate vector store integration
- Performance Optimization: Caching, batch processing, and lazy loading
### Modern ML Engineering
- Modular Architecture: 6-component system with clear interfaces
- Cloud-First Design: HuggingFace Spaces optimized deployment
- Comprehensive Testing: Multiple test configurations and validation
- Developer Experience: Easy setup with multiple deployment options
## Acknowledgments
- Open Source Libraries: Built on PyTorch, HuggingFace, FAISS, and spaCy
- Transformer Models: Leveraging state-of-the-art sentence transformers
- Cloud Platforms: Optimized for HuggingFace Spaces deployment
- RISC-V Community: Focus on technical documentation use case
## Quick Start Summary
- HuggingFace Spaces (Recommended): Upload `app.py`, set `HF_TOKEN`, deploy
- Local Development: `pip install -r requirements.txt`, `ollama pull llama3.2:3b`, `streamlit run app.py`
- Advanced Features: Neural reranking, graph enhancement, and multi-backend support