---
title: Enhanced RISC-V RAG
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.0
app_file: app.py
pinned: false
license: mit
tags:
- rag
- nlp
- risc-v
- technical-documentation
- graph-enhancement
- neural-reranking
- vector-search
- document-processing
- hybrid-search
- cross-encoder
short_description: Advanced RAG system for RISC-V documentation
---
# Enhanced RISC-V RAG
An advanced Retrieval-Augmented Generation (RAG) system for RISC-V technical documentation featuring neural reranking, graph enhancement, hybrid search, and multi-backend support. Demonstrates modern RAG techniques including cross-encoder reranking, document relationship graphs, and score-aware fusion strategies.
## πŸš€ Technical Features Implemented
### **🧠 Neural Reranking**
- Cross-encoder models (ms-marco-MiniLM-L6-v2) for relevance scoring
- HuggingFace API integration for cloud deployment
- Adaptive strategies based on query type detection
- Performance caching for repeated queries
- Score fusion with configurable weights
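A minimal sketch of this rerank-and-fuse flow, assuming the `sentence-transformers` library and the public `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint (the fusion weight and helper names are illustrative, not the demo's exact values):
```python
# Illustrative rerank-and-fuse sketch; helper names are hypothetical,
# not the demo's actual implementation.
from sentence_transformers import CrossEncoder

# Loaded once and reused, so repeated queries pay no model-load cost.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[tuple[str, float]], alpha: float = 0.7):
    """candidates: (text, retrieval_score) pairs; alpha weights the cross-encoder."""
    ce_scores = reranker.predict([(query, text) for text, _ in candidates])
    fused = [
        (text, alpha * float(ce) + (1.0 - alpha) * retrieval)
        for (text, retrieval), ce in zip(candidates, ce_scores)
    ]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)
```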
### **πŸ•ΈοΈ Graph Enhancement**
- Document relationship extraction using spaCy NER
- NetworkX-based graph construction and analysis
- Graph-aware retrieval scoring with connectivity metrics
- Entity-based document linking for knowledge discovery
- Relationship mapping for technical concepts
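As a rough sketch of how entity-based linking can work (assuming spaCy's `en_core_web_sm` model is installed; function and variable names are illustrative):
```python
# Illustrative entity-graph construction; not the repo's actual module.
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def build_entity_graph(docs: dict[str, str]) -> nx.Graph:
    """docs: doc_id -> text. Links documents that share named entities."""
    entities = {
        doc_id: {ent.text for ent in nlp(text).ents}
        for doc_id, text in docs.items()
    }
    graph = nx.Graph()
    graph.add_nodes_from(entities)
    ids = list(entities)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            shared = entities[a] & entities[b]
            if shared:
                graph.add_edge(a, b, weight=len(shared))
    return graph

# A simple connectivity metric that can boost retrieval scores:
# boost = nx.degree_centrality(graph)[doc_id]
```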
### **πŸ” Hybrid Search**
- BM25 sparse retrieval for keyword matching
- Dense vector search with FAISS/Weaviate backends
- Score-aware fusion strategy with configurable weights
- Composite filtering for result quality
- Multi-stage retrieval pipeline
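A minimal sketch of score-aware fusion with min-max normalization (the weight and helper names are illustrative, not the demo's calibrated values):
```python
# Illustrative fusion of BM25 (sparse) and vector (dense) retrieval scores.
def normalize(scores: dict) -> dict:
    """Min-max normalize a {doc_id: score} dict into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(bm25_scores: dict, dense_scores: dict, dense_weight: float = 0.7):
    """Weighted combination of sparse and dense scores; absent docs score 0."""
    bm25, dense = normalize(bm25_scores), normalize(dense_scores)
    fused = {
        doc: dense_weight * dense.get(doc, 0.0)
             + (1.0 - dense_weight) * bm25.get(doc, 0.0)
        for doc in set(bm25) | set(dense)
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)
```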
## πŸ—οΈ Architecture & Components
### **6-Component Modular System**
1. **Document Processor**: PyMuPDF parser with technical content cleaning
2. **Embedder**: SentenceTransformer (multi-qa-MiniLM-L6-cos-v1) with batch optimization
3. **Retriever**: Unified interface supporting FAISS/Weaviate backends
4. **Generator**: HuggingFace Inference API / Ollama integration
5. **Query Processor**: NLP analysis and query enhancement
6. **Platform Orchestrator**: Component lifecycle and health management
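The components plug together behind small, swappable interfaces; a hypothetical sketch of that contract (these `Protocol` names are illustrative, not the repo's actual classes):
```python
# Hypothetical component contracts; names are illustrative.
from typing import List, Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, k: int = 10) -> List[dict]:
        """Return the top-k candidate chunks with scores and metadata."""

class Generator(Protocol):
    def generate(self, query: str, context: List[dict]) -> str:
        """Produce an answer grounded in the retrieved context."""
```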
### **Advanced Capabilities**
- **Multi-Backend Support**: Seamless switching between FAISS and Weaviate
- **Performance Optimization**: Caching, batch processing, lazy loading
- **Cloud Deployment**: HuggingFace Spaces optimized with smart caching
- **Database Persistence**: SQLite storage for processed documents
- **Real-time Analytics**: Query performance tracking and monitoring
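A hedged sketch of what backend switching can look like (a hypothetical factory; assumes the `faiss` package and the v4 `weaviate-client`):
```python
# Hypothetical backend factory; illustrates the switching pattern only.
def make_vector_store(cfg: dict):
    if cfg["backend"] == "faiss":
        import faiss
        # Inner-product index; pairs with normalized embeddings for cosine search.
        return faiss.IndexFlatIP(cfg["dim"])
    if cfg["backend"] == "weaviate":
        import weaviate
        # Assumes a Weaviate instance on the default local ports (v4 client).
        return weaviate.connect_to_local()
    raise ValueError(f"Unknown backend: {cfg['backend']}")
```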
## πŸ“‹ Prerequisites
### Required Dependencies
- Python 3.11+
- PyTorch 2.0+ (with MPS support for Apple Silicon)
- 4GB+ RAM for basic operation
- 8GB+ RAM for advanced features
### Optional Dependencies
- Ollama (for local LLM inference)
- Docker (for containerized deployment)
- CUDA GPU (for accelerated inference)
## πŸ› οΈ Installation
### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/enhanced-rag-demo.git
cd enhanced-rag-demo
```
### 2. Create Virtual Environment
```bash
conda create -n enhanced-rag python=3.11
conda activate enhanced-rag
```
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
### 4. Install Ollama (Optional - for Production LLM)
The system includes a MockLLMAdapter for testing without external dependencies. For production use with real LLM inference, install Ollama:
#### macOS/Linux
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
#### Windows
Download and install from: https://ollama.ai/download/windows
#### Pull Required Model
```bash
ollama pull llama3.2:3b
```
#### Verify Installation
```bash
ollama list
# Should show llama3.2:3b in the list
```
## πŸ§ͺ Testing Without Ollama
The bundled MockLLMAdapter lets you run the test suite without external dependencies:
```bash
# Run tests with mock adapter
python test_mock_adapter.py
# Use mock configuration for testing
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
```
## πŸš€ Quick Start
### 1. Basic Usage (with Mock LLM)
```python
from src.core.platform_orchestrator import PlatformOrchestrator
# Initialize with mock configuration for testing
orchestrator = PlatformOrchestrator("config/test_mock_default.yaml")
# Process a query
result = orchestrator.process_query("What is RISC-V?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
```
### 2. Production Usage (with Ollama)
```python
# Initialize with production configuration
orchestrator = PlatformOrchestrator("config/default.yaml")
# Index documents
orchestrator.index_documents("data/documents/")
# Process queries
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
```
### 3. Advanced Features
```python
# Use advanced configuration with neural reranking and graph enhancement
orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# Process query with advanced features
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
# Advanced features include:
# - Neural reranking: Cross-encoder model for precision improvement
# - Graph enhancement: Document relationship analysis
# - Performance optimization: Caching and batch processing
# - Advanced analytics: Real-time performance monitoring
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Sources: {result.sources}")
```
### 4. Configuration Comparison
```python
# Basic Configuration
basic_orchestrator = PlatformOrchestrator("config/default.yaml")
# - Standard fusion strategy
# - Basic retrieval pipeline
# Advanced Configuration
advanced_orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")
# - Graph-enhanced fusion
# - Neural reranking
# - Performance optimization
# API Configuration (cloud deployment)
api_orchestrator = PlatformOrchestrator("config/epic2_hf_api.yaml")
# - HuggingFace API integration
# - Memory-optimized for cloud deployment
```
## πŸ“ Configuration
### Configuration Files
- `config/default.yaml` - Basic RAG configuration
- `config/advanced_test.yaml` - Epic 2 features enabled
- `config/epic2_graph_calibrated.yaml` - Neural reranking and graph enhancement (used in the examples above)
- `config/test_mock_default.yaml` - Testing without Ollama
- `config/epic2_hf_api.yaml` - HuggingFace API deployment
### Key Configuration Options
```yaml
# Answer Generator Configuration
answer_generator:
  type: "adaptive_modular"
  config:
    # For Ollama (production)
    llm_client:
      type: "ollama"
      config:
        model_name: "llama3.2:3b"
        base_url: "http://localhost:11434"

    # For testing (no external dependencies), swap in the mock client instead:
    # llm_client:
    #   type: "mock"
    #   config:
    #     response_pattern: "technical"
    #     include_citations: true
```
## 🐳 Docker Deployment
```bash
# Build Docker image
docker-compose build
# Run with Docker
docker-compose up
```
## πŸ“Š System Capabilities
### **Technical Implementation**
- **Document Processing**: Multi-format parsing with metadata extraction
- **Embedding Generation**: Batch optimization with hardware acceleration
- **Retrieval Pipeline**: Multi-stage hybrid search with reranking
- **Answer Generation**: Multiple LLM backend support
- **Architecture**: 6-component modular design
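For example, batch embedding with device selection might look like this sketch (the model name matches the Embedder above; the sample chunks, batch size, and flags are assumptions):
```python
# Illustrative batch embedding with hardware acceleration where available.
import torch
from sentence_transformers import SentenceTransformer

device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1", device=device)

chunks = ["RV32I is the base integer instruction set.", "CSRs control traps."]
embeddings = model.encode(
    chunks,
    batch_size=64,              # larger batches amortize per-call overhead
    normalize_embeddings=True,  # unit vectors -> inner product == cosine
)
```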
### **Supported Features**
- **Query Processing**: Intent detection and enhancement
- **Result Fusion**: Multiple scoring strategies
- **Knowledge Graphs**: Entity extraction and relationship mapping
- **Performance Monitoring**: Real-time analytics and metrics
- **Cloud Deployment**: Optimized for containerized environments
## πŸ§ͺ Running Tests
```bash
# Run all tests (requires Ollama or uses mock)
python tests/run_comprehensive_tests.py
# Run with mock adapter only
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
# Run specific test suites
python tests/diagnostic/run_all_diagnostics.py
python tests/epic2_validation/run_epic2_comprehensive_tests.py
```
## 🌐 Deployment Options
### **πŸš€ HuggingFace Spaces Deployment (Recommended)**
The system is optimized for HuggingFace Spaces with automatic environment detection:
1. **Create New Space**: Create a new Streamlit app on [HuggingFace Spaces](https://huggingface.co/spaces)
2. **Upload Files**: Upload the following files to your space:
```
app.py # Main entry point (HF Spaces optimized)
streamlit_epic2_demo.py # Epic 2 demo application
requirements.txt # HF-optimized dependencies
config/ # Configuration files
src/ # Core system
```
3. **Set Environment Variables** (in Space settings):
```bash
HF_TOKEN=your_huggingface_token_here # For API access
```
4. **Automatic Configuration**: The app automatically detects the HuggingFace Spaces environment, available API tokens, and memory constraints, then recommends an optimal configuration.
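As a rough illustration of that detection step (HF Spaces expose a `SPACE_ID` environment variable; the config file names match those listed above, but the selection logic here is an assumption, not the code in `app.py`):
```python
# Hypothetical sketch of environment-based config selection.
import os

def pick_config() -> str:
    if os.environ.get("SPACE_ID"):                 # running on HuggingFace Spaces
        if os.environ.get("HF_TOKEN"):
            return "config/epic2_hf_api.yaml"      # API-backed, memory-optimized
        return "config/test_mock_default.yaml"     # no token: fall back to the mock LLM
    return "config/default.yaml"                   # local development default
```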
**Features in HF Spaces:**
- πŸš€ Full advanced RAG capabilities (neural reranking, graph enhancement)
- πŸ”§ Automatic environment detection and configuration
- πŸ’Ύ Memory-optimized dependencies for cloud deployment
- 🌐 Global accessibility with zero setup required
### **πŸ’» Local Development**
For full local capabilities with Ollama:
```bash
# Install Ollama and model
brew install ollama
ollama pull llama3.2:3b
# Run Epic 2 demo
streamlit run app.py
```
### **🐳 Docker Deployment**
```bash
# Build and run with Docker
docker-compose up
```
## πŸ”§ Troubleshooting
### "Model 'llama3.2' not found"
- **Cause**: Ollama not installed or model not pulled
- **Solution**: Follow Ollama installation steps above or use mock configuration
### "Connection refused on localhost:11434"
- **Cause**: Ollama service not running
- **Solution**: Start Ollama with `ollama serve`
### High Memory Usage
- **Cause**: Large models loaded in memory
- **Solution**: Use smaller models or increase system RAM
### Tests Failing
- **Cause**: Missing dependencies or Ollama not running
- **Solution**: Use test_mock configurations or install Ollama
## πŸ“š Documentation & Testing
### **System Documentation**
- [Technical Implementation](SCORE_COMPRESSION_FIX_COMPLETE_VALIDATION.md) - Score-fusion fix analysis and validation
- [Architecture Overview](docs/architecture/MASTER-ARCHITECTURE.md) - System design and components
- [Component Documentation](docs/architecture/components/) - Individual component specifications
- [Test Documentation](docs/test/) - Testing framework and validation
### **Key Technical Implementations**
1. **Score Fusion Optimization**: Advanced fusion strategy for multi-stage retrieval
2. **Neural Reranking**: Cross-encoder integration for relevance improvement
3. **System Integration**: Complete modular architecture with health monitoring
4. **Cloud Deployment**: HuggingFace Spaces optimized with automated configuration
## 🀝 Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests to ensure quality
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
## πŸ“„ License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🎯 Technical Highlights
This RAG system demonstrates:
### **Advanced RAG Techniques**
- **Neural Reranking**: Cross-encoder models for relevance scoring
- **Graph Enhancement**: Document relationship analysis with NetworkX
- **Multi-Backend Support**: FAISS and Weaviate vector store integration
- **Performance Optimization**: Caching, batch processing, and lazy loading
### **Modern ML Engineering**
- **Modular Architecture**: 6-component system with clear interfaces
- **Cloud-First Design**: HuggingFace Spaces optimized deployment
- **Comprehensive Testing**: Multiple test configurations and validation
- **Developer Experience**: Easy setup with multiple deployment options
## πŸ™ Acknowledgments
- **Open Source Libraries**: Built on PyTorch, HuggingFace, FAISS, and spaCy
- **Transformer Models**: Leveraging state-of-the-art sentence transformers
- **Cloud Platforms**: Optimized for HuggingFace Spaces deployment
- **RISC-V Community**: Focus on technical documentation use case
---
## πŸš€ Quick Start Summary
- **HuggingFace Spaces (Recommended)**: Upload `app.py`, set `HF_TOKEN`, deploy
- **Local Development**: `pip install -r requirements.txt`, `ollama pull llama3.2:3b`, `streamlit run app.py`
- **Advanced Features**: Neural reranking, graph enhancement, and multi-backend support