---
title: Enhanced RISC-V RAG
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.0
app_file: app.py
pinned: false
license: mit
tags:
- rag
- nlp
- risc-v
- technical-documentation
- graph-enhancement
- neural-reranking
- vector-search
- document-processing
- hybrid-search
- cross-encoder
short_description: Advanced RAG system for RISC-V documentation
---

# Enhanced RISC-V RAG

An advanced Retrieval-Augmented Generation (RAG) system for RISC-V technical documentation featuring neural reranking, graph enhancement, hybrid search, and multi-backend support. It demonstrates modern RAG techniques including cross-encoder reranking, document relationship graphs, and score-aware fusion strategies.

## πŸš€ Technical Features Implemented

### **🧠 Neural Reranking**
- Cross-encoder model (ms-marco-MiniLM-L6-v2) for relevance scoring
- HuggingFace API integration for cloud deployment
- Adaptive strategies based on query type detection
- Performance caching for repeated queries
- Score fusion with configurable weights

### **πŸ•ΈοΈ Graph Enhancement**
- Document relationship extraction using spaCy NER
- NetworkX-based graph construction and analysis
- Graph-aware retrieval scoring with connectivity metrics
- Entity-based document linking for knowledge discovery
- Relationship mapping for technical concepts

### **πŸ” Hybrid Search**
- BM25 sparse retrieval for keyword matching
- Dense vector search with FAISS/Weaviate backends
- Score-aware fusion strategy with configurable weights (see the sketch after this list)
- Composite filtering for result quality
- Multi-stage retrieval pipeline
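To make these stages concrete, here is a minimal, self-contained sketch of hybrid retrieval with score-aware fusion followed by cross-encoder reranking. It illustrates the technique rather than this repository's implementation: the toy corpus, the 0.7/0.3 fusion weights, and the top-k value are assumptions, while the model checkpoints follow the names above.

```python
# Sketch: hybrid retrieval (BM25 + dense) with score-aware fusion,
# then cross-encoder reranking. Illustrative only; the corpus, fusion
# weights, and top-k are assumed, not taken from this repo.
# pip install rank-bm25 sentence-transformers numpy
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

documents = [
    "RISC-V is an open instruction set architecture based on RISC principles.",
    "A classic RISC-V pipeline has five stages: fetch, decode, execute, memory, writeback.",
    "The vector extension adds data-parallel instructions to the base RISC-V ISA.",
]
query = "What is RISC-V?"

# Sparse retrieval: BM25 keyword matching over whitespace tokens
bm25 = BM25Okapi([d.lower().split() for d in documents])
sparse = np.asarray(bm25.get_scores(query.lower().split()))

# Dense retrieval: the embedding model named in the component list
embedder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
doc_vecs = embedder.encode(documents, normalize_embeddings=True)
query_vec = embedder.encode(query, normalize_embeddings=True)
dense = doc_vecs @ query_vec  # cosine similarity (vectors are normalized)

# Score-aware fusion: min-max normalize each signal, then weighted sum
def minmax(x: np.ndarray) -> np.ndarray:
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

dense_weight = 0.7  # assumed; the real system reads its weights from config
fused = dense_weight * minmax(dense) + (1 - dense_weight) * minmax(sparse)

# Neural reranking: re-score the fused top-k with a cross-encoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
top_k = np.argsort(fused)[::-1][:3]
rerank = reranker.predict([(query, documents[i]) for i in top_k])
for pos in np.argsort(rerank)[::-1]:
    print(f"{rerank[pos]:.3f}  {documents[top_k[pos]]}")
```

Min-max normalization before fusion matters because BM25 and cosine scores live on different scales; fusing the raw scores would let one signal dominate regardless of the configured weights.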
## πŸ—οΈ Architecture & Components

### **6-Component Modular System**
1. **Document Processor**: PyMuPDF parser with technical content cleaning
2. **Embedder**: SentenceTransformer (multi-qa-MiniLM-L6-cos-v1) with batch optimization
3. **Retriever**: Unified interface supporting FAISS/Weaviate backends
4. **Generator**: HuggingFace Inference API / Ollama integration
5. **Query Processor**: NLP analysis and query enhancement
6. **Platform Orchestrator**: Component lifecycle and health management

### **Advanced Capabilities**
- **Multi-Backend Support**: Seamless switching between FAISS and Weaviate
- **Performance Optimization**: Caching, batch processing, lazy loading
- **Cloud Deployment**: HuggingFace Spaces optimized with smart caching
- **Database Persistence**: SQLite storage for processed documents
- **Real-time Analytics**: Query performance tracking and monitoring

## πŸ“‹ Prerequisites

### Required Dependencies
- Python 3.11+
- PyTorch 2.0+ (with MPS support for Apple Silicon)
- 4GB+ RAM for basic operation
- 8GB+ RAM for advanced features

### Optional Dependencies
- Ollama (for local LLM inference)
- Docker (for containerized deployment)
- CUDA GPU (for accelerated inference)

## πŸ› οΈ Installation

### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/enhanced-rag-demo.git
cd enhanced-rag-demo
```

### 2. Create Virtual Environment
```bash
conda create -n enhanced-rag python=3.11
conda activate enhanced-rag
```

### 3. Install Dependencies
```bash
pip install -r requirements.txt
```

### 4. Install Ollama (Optional - for Production LLM)
The system includes a MockLLMAdapter for testing without external dependencies. For production use with real LLM inference, install Ollama:

#### macOS/Linux
```bash
curl https://ollama.ai/install.sh | sh
```

#### Windows
Download and install from: https://ollama.ai/download/windows

#### Pull Required Model
```bash
ollama pull llama3.2:3b
```

#### Verify Installation
```bash
ollama list
# Should show llama3.2:3b in the list
```

## πŸ§ͺ Testing Without Ollama

The MockLLMAdapter allows running tests without external dependencies:

```bash
# Run tests with the mock adapter
python test_mock_adapter.py

# Use the mock configuration for testing
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
```

## πŸš€ Quick Start

### 1. Basic Usage (with Mock LLM)
```python
from src.core.platform_orchestrator import PlatformOrchestrator

# Initialize with the mock configuration for testing
orchestrator = PlatformOrchestrator("config/test_mock_default.yaml")

# Process a query
result = orchestrator.process_query("What is RISC-V?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
```

### 2. Production Usage (with Ollama)
```python
# Initialize with the production configuration
orchestrator = PlatformOrchestrator("config/default.yaml")

# Index documents
orchestrator.index_documents("data/documents/")

# Process queries
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
```

### 3. Advanced Features
```python
# Use the advanced configuration with neural reranking and graph enhancement
orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")

# Process a query with advanced features
result = orchestrator.process_query("Explain RISC-V pipeline architecture")

# Advanced features include:
# - Neural reranking: cross-encoder model for precision improvement
# - Graph enhancement: document relationship analysis (see the sketch below)
# - Performance optimization: caching and batch processing
# - Advanced analytics: real-time performance monitoring

print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Sources: {result.sources}")
```
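To build intuition for the graph-enhancement step mentioned above, here is a minimal sketch in the spirit of the feature list: spaCy NER extracts entities, NetworkX links documents that share entities, and degree centrality nudges well-connected documents up the ranking. The corpus, base retrieval scores, and the 0.2 blend weight are assumptions, not values from this repository.

```python
# Minimal sketch of graph enhancement: spaCy NER + NetworkX document graph.
# Illustrative only; the real system's extraction and scoring differ.
# pip install spacy networkx && python -m spacy download en_core_web_sm
import itertools

import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

documents = {
    "doc1": "SiFive ships processor cores implementing the RISC-V ISA.",
    "doc2": "RISC-V International maintains the RISC-V ISA specification.",
    "doc3": "Berkeley researchers designed RISC-V as the fifth Berkeley RISC project.",
}

# Extract the named entities found in each document
entities = {
    doc_id: {ent.text for ent in nlp(text).ents}
    for doc_id, text in documents.items()
}

# Link documents that share at least one entity
graph = nx.Graph()
graph.add_nodes_from(documents)
for a, b in itertools.combinations(documents, 2):
    shared = entities[a] & entities[b]
    if shared:
        graph.add_edge(a, b, shared=sorted(shared))

# Graph-aware scoring: blend base retrieval scores with degree centrality
centrality = nx.degree_centrality(graph)
retrieval_scores = {"doc1": 0.62, "doc2": 0.58, "doc3": 0.55}  # assumed base scores
graph_weight = 0.2  # assumed blend weight
boosted = {
    d: (1 - graph_weight) * s + graph_weight * centrality[d]
    for d, s in retrieval_scores.items()
}
print(sorted(boosted.items(), key=lambda kv: -kv[1]))
```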
### 4. Configuration Comparison
```python
# Basic configuration
basic_orchestrator = PlatformOrchestrator("config/default.yaml")
# - Standard fusion strategy
# - Basic retrieval pipeline

# Advanced configuration
advanced_orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# - Graph-enhanced fusion
# - Neural reranking
# - Performance optimization

# API configuration (cloud deployment)
api_orchestrator = PlatformOrchestrator("config/epic2_hf_api.yaml")
# - HuggingFace API integration
# - Memory-optimized for cloud deployment
```

## πŸ“ Configuration

### Configuration Files
- `config/default.yaml` - Basic RAG configuration
- `config/advanced_test.yaml` - Epic 2 features enabled
- `config/test_mock_default.yaml` - Testing without Ollama
- `config/epic2_hf_api.yaml` - HuggingFace API deployment

### Key Configuration Options
```yaml
# Answer Generator Configuration
answer_generator:
  type: "adaptive_modular"
  config:
    # For Ollama (production)
    llm_client:
      type: "ollama"
      config:
        model_name: "llama3.2:3b"
        base_url: "http://localhost:11434"

    # For testing (no external dependencies), swap in the mock client:
    # llm_client:
    #   type: "mock"
    #   config:
    #     response_pattern: "technical"
    #     include_citations: true
```
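As a quick sanity check that a configuration file matches the shape documented above, it can be loaded with PyYAML and the active LLM client inspected. This is a minimal sketch assuming only the structure shown here; the orchestrator's actual loader may validate much more.

```python
# Minimal sketch: load a config and inspect the active LLM client.
# Assumes only the YAML structure documented above.
import yaml

with open("config/default.yaml") as f:
    cfg = yaml.safe_load(f)

llm_client = cfg["answer_generator"]["config"]["llm_client"]
print(f"LLM client type: {llm_client['type']}")
if llm_client["type"] == "ollama":
    print(f"Model:    {llm_client['config']['model_name']}")
    print(f"Endpoint: {llm_client['config']['base_url']}")
```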
## 🐳 Docker Deployment

```bash
# Build the Docker image
docker-compose build

# Run with Docker
docker-compose up
```

## πŸ“Š System Capabilities

### **Technical Implementation**
- **Document Processing**: Multi-format parsing with metadata extraction
- **Embedding Generation**: Batch optimization with hardware acceleration
- **Retrieval Pipeline**: Multi-stage hybrid search with reranking
- **Answer Generation**: Multiple LLM backend support
- **Architecture**: 6-component modular design

### **Supported Features**
- **Query Processing**: Intent detection and enhancement
- **Result Fusion**: Multiple scoring strategies
- **Knowledge Graphs**: Entity extraction and relationship mapping
- **Performance Monitoring**: Real-time analytics and metrics
- **Cloud Deployment**: Optimized for containerized environments

## πŸ§ͺ Running Tests

```bash
# Run all tests (requires Ollama, or falls back to the mock adapter)
python tests/run_comprehensive_tests.py

# Run with the mock adapter only
python tests/run_comprehensive_tests.py config/test_mock_default.yaml

# Run specific test suites
python tests/diagnostic/run_all_diagnostics.py
python tests/epic2_validation/run_epic2_comprehensive_tests.py
```

## 🌐 Deployment Options

### **πŸš€ HuggingFace Spaces Deployment (Recommended)**

The system is optimized for HuggingFace Spaces with automatic environment detection:

1. **Create New Space**: Create a new Streamlit app on [HuggingFace Spaces](https://huggingface.co/spaces)
2. **Upload Files**: Upload the following files to your Space:
   ```
   app.py                    # Main entry point (HF Spaces optimized)
   streamlit_epic2_demo.py   # Epic 2 demo application
   requirements.txt          # HF-optimized dependencies
   config/                   # Configuration files
   src/                      # Core system
   ```
3. **Set Environment Variables** (in the Space settings):
   ```bash
   HF_TOKEN=your_huggingface_token_here  # For API access
   ```
4. **Automatic Configuration**: The app automatically detects the HuggingFace Spaces environment, available API tokens, and memory constraints, then recommends an optimal configuration.

**Features in HF Spaces:**
- πŸš€ Full advanced RAG capabilities (neural reranking, graph enhancement)
- πŸ”§ Automatic environment detection and configuration
- πŸ’Ύ Memory-optimized dependencies for cloud deployment
- 🌐 Global accessibility with zero setup required

### **πŸ’» Local Development**

For full local capabilities with Ollama:

```bash
# Install Ollama and the model
brew install ollama
ollama pull llama3.2:3b

# Run the Epic 2 demo
streamlit run app.py
```

### **🐳 Docker Deployment**

```bash
# Build and run with Docker
docker-compose up
```

## πŸ”§ Troubleshooting

### "Model 'llama3.2' not found"
- **Cause**: Ollama not installed or model not pulled
- **Solution**: Follow the Ollama installation steps above, or use the mock configuration

### "Connection refused on localhost:11434"
- **Cause**: Ollama service not running
- **Solution**: Start Ollama with `ollama serve`; the snippet below checks connectivity

### High Memory Usage
- **Cause**: Large models loaded in memory
- **Solution**: Use smaller models or increase system RAM

### Tests Failing
- **Cause**: Missing dependencies or Ollama not running
- **Solution**: Use the test_mock configurations or install Ollama
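For the connection-refused case, a quick probe of Ollama's local HTTP API confirms whether the server is up and the model has been pulled. This hedged snippet uses Ollama's public `/api/tags` endpoint on the default port; it is a diagnostic aid, not part of this repository.

```python
# Quick connectivity check for a local Ollama server (default port 11434).
# Ollama's /api/tags endpoint lists the models that have been pulled.
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    models = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is running. Models:", models)
    if not any(name.startswith("llama3.2") for name in models):
        print("llama3.2 not found -- run: ollama pull llama3.2:3b")
except requests.ConnectionError:
    print("Connection refused -- start the server with: ollama serve")
```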
## πŸ“š Documentation & Testing

### **System Documentation**
- [Technical Implementation](SCORE_COMPRESSION_FIX_COMPLETE_VALIDATION.md) - Technical analysis and testing
- [Architecture Overview](docs/architecture/MASTER-ARCHITECTURE.md) - System design and components
- [Component Documentation](docs/architecture/components/) - Individual component specifications
- [Test Documentation](docs/test/) - Testing framework and validation

### **Key Technical Implementations**
1. **Score Fusion Optimization**: Advanced fusion strategy for multi-stage retrieval
2. **Neural Reranking**: Cross-encoder integration for relevance improvement
3. **System Integration**: Complete modular architecture with health monitoring
4. **Cloud Deployment**: HuggingFace Spaces optimized with automated configuration

## 🀝 Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run the tests to ensure quality
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request

## πŸ“„ License

This project is licensed under the MIT License; see the LICENSE file for details.

## 🎯 Technical Highlights

This RAG system demonstrates:

### **Advanced RAG Techniques**
- **Neural Reranking**: Cross-encoder models for relevance scoring
- **Graph Enhancement**: Document relationship analysis with NetworkX
- **Multi-Backend Support**: FAISS and Weaviate vector store integration
- **Performance Optimization**: Caching, batch processing, and lazy loading

### **Modern ML Engineering**
- **Modular Architecture**: 6-component system with clear interfaces
- **Cloud-First Design**: HuggingFace Spaces optimized deployment
- **Comprehensive Testing**: Multiple test configurations and validation
- **Developer Experience**: Easy setup with multiple deployment options

## πŸ™ Acknowledgments

- **Open Source Libraries**: Built on PyTorch, HuggingFace, FAISS, and spaCy
- **Transformer Models**: Leveraging state-of-the-art sentence transformers
- **Cloud Platforms**: Optimized for HuggingFace Spaces deployment
- **RISC-V Community**: Focus on the technical documentation use case

---

## πŸš€ Quick Start Summary

- **HuggingFace Spaces (Recommended)**: Upload `app.py`, set `HF_TOKEN`, deploy
- **Local Development**: `pip install -r requirements.txt`, `ollama pull llama3.2:3b`, `streamlit run app.py`
- **Advanced Features**: Neural reranking, graph enhancement, and multi-backend support