---
title: Enhanced RISC-V RAG
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.0
app_file: app.py
pinned: false
license: mit
tags:
  - rag
  - nlp
  - risc-v
  - technical-documentation
  - graph-enhancement
  - neural-reranking
  - vector-search
  - document-processing
  - hybrid-search
  - cross-encoder
short_description: Advanced RAG system for RISC-V documentation
---
# Enhanced RISC-V RAG

An advanced Retrieval-Augmented Generation (RAG) system for RISC-V technical documentation featuring neural reranking, graph enhancement, hybrid search, and multi-backend support. It demonstrates modern RAG techniques including cross-encoder reranking, document relationship graphs, and score-aware fusion strategies.
## Technical Features Implemented

### **Neural Reranking**
- Cross-encoder model (ms-marco-MiniLM-L6-v2) for relevance scoring
- HuggingFace API integration for cloud deployment
- Adaptive strategies based on query-type detection
- Performance caching for repeated queries
- Score fusion with configurable weights
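The reranking step can be approximated in a few lines. A minimal sketch, assuming `sentence-transformers` is installed; the candidate format and the fusion weight `alpha` are illustrative assumptions, not the demo's actual API:

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, passage) pair jointly for higher precision
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

def rerank(query, candidates, alpha=0.7):
    """Blend retrieval scores with cross-encoder relevance; alpha = neural weight."""
    pairs = [(query, doc["text"]) for doc in candidates]
    neural_scores = reranker.predict(pairs)  # one relevance score per pair
    for doc, score in zip(candidates, neural_scores):
        doc["score"] = alpha * float(score) + (1 - alpha) * doc["score"]
    return sorted(candidates, key=lambda d: d["score"], reverse=True)

docs = [{"text": "RISC-V defines 32 integer registers.", "score": 0.42},
        {"text": "ARM is a proprietary ISA.", "score": 0.58}]
print(rerank("How many registers does RISC-V have?", docs)[0]["text"])
```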
### **Graph Enhancement**
- Document relationship extraction using spaCy NER
- NetworkX-based graph construction and analysis
- Graph-aware retrieval scoring with connectivity metrics
- Entity-based document linking for knowledge discovery
- Relationship mapping for technical concepts
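As a rough illustration of entity-based linking and connectivity scoring, here is a minimal sketch, assuming spaCy's `en_core_web_sm` model is installed; the graph schema and the degree-centrality boost are simplifying assumptions, not the demo's exact implementation:

```python
import itertools
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def build_document_graph(docs):
    """Link documents that mention the same named entity (docs: id -> text)."""
    graph = nx.Graph()
    entity_index = {}
    for doc_id, text in docs.items():
        graph.add_node(doc_id)
        for ent in nlp(text).ents:
            entity_index.setdefault(ent.text, set()).add(doc_id)
    # Connect every pair of documents that share an entity
    for entity, ids in entity_index.items():
        for a, b in itertools.combinations(sorted(ids), 2):
            graph.add_edge(a, b, entity=entity)
    return graph

def connectivity_boost(graph, doc_id):
    """Connectivity metric: degree centrality as a retrieval-score boost."""
    return nx.degree_centrality(graph).get(doc_id, 0.0)
```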
### **Hybrid Search**
- BM25 sparse retrieval for keyword matching
- Dense vector search with FAISS/Weaviate backends
- Score-aware fusion strategy with configurable weights
- Composite filtering for result quality
- Multi-stage retrieval pipeline
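The core of score-aware fusion is normalizing the sparse and dense score distributions before blending them. A minimal sketch; the min-max normalization and the default `dense_weight` are illustrative assumptions:

```python
def fuse_scores(sparse, dense, dense_weight=0.7):
    """Fuse BM25 and dense scores (dicts: doc_id -> raw score) into one ranking."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
        return {k: (v - lo) / span for k, v in scores.items()}
    s, d = normalize(sparse), normalize(dense)
    fused = {k: dense_weight * d.get(k, 0.0) + (1 - dense_weight) * s.get(k, 0.0)
             for k in set(s) | set(d)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# BM25 scores are unbounded; dense cosine scores live in [-1, 1]
print(fuse_scores({"doc1": 11.2, "doc2": 7.5}, {"doc1": 0.61, "doc3": 0.72}))
```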
## Architecture & Components

### **6-Component Modular System**
1. **Document Processor**: PyMuPDF parser with technical content cleaning
2. **Embedder**: SentenceTransformer (multi-qa-MiniLM-L6-cos-v1) with batch optimization
3. **Retriever**: Unified interface supporting FAISS/Weaviate backends
4. **Generator**: HuggingFace Inference API / Ollama integration
5. **Query Processor**: NLP analysis and query enhancement
6. **Platform Orchestrator**: Component lifecycle and health management
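For example, the Embedder step (component 2) boils down to batch encoding with the named model. A minimal sketch, assuming `sentence-transformers` is installed; the sample chunks are invented for illustration:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
chunks = ["RISC-V is an open instruction set architecture.",
          "The RV32I base ISA defines 32 integer registers."]
# Batched, normalized embeddings so cosine similarity reduces to a dot product
embeddings = model.encode(chunks, batch_size=64, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384): this model produces 384-dimensional vectors
```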
### **Advanced Capabilities**
- **Multi-Backend Support**: Seamless switching between FAISS and Weaviate
- **Performance Optimization**: Caching, batch processing, lazy loading
- **Cloud Deployment**: Optimized for HuggingFace Spaces with smart caching
- **Database Persistence**: SQLite storage for processed documents
- **Real-time Analytics**: Query performance tracking and monitoring
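On the FAISS side of the multi-backend retriever, indexing and search look roughly like this (a minimal sketch with random vectors standing in for real embeddings; the demo's unified retriever interface is not shown):

```python
import faiss
import numpy as np

dim = 384  # embedding size of multi-qa-MiniLM-L6-cos-v1
vectors = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(vectors)             # normalize so inner product == cosine
index = faiss.IndexFlatIP(dim)          # exact inner-product index
index.add(vectors)

query = vectors[:1]                     # pretend the first vector is a query
scores, ids = index.search(query, k=5)  # top-5 most similar documents
print(ids[0], scores[0])
```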
## Prerequisites

### Required Dependencies
- Python 3.11+
- PyTorch 2.0+ (with MPS support for Apple Silicon)
- 4GB+ RAM for basic operation
- 8GB+ RAM for advanced features

### Optional Dependencies
- Ollama (for local LLM inference)
- Docker (for containerized deployment)
- CUDA GPU (for accelerated inference)
## Installation

### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/enhanced-rag-demo.git
cd enhanced-rag-demo
```

### 2. Create a Virtual Environment
```bash
conda create -n enhanced-rag python=3.11
conda activate enhanced-rag
```

### 3. Install Dependencies
```bash
pip install -r requirements.txt
```

### 4. Install Ollama (Optional, for Production LLM)
The system includes a MockLLMAdapter for testing without external dependencies. For production use with real LLM inference, install Ollama:

#### macOS/Linux
```bash
curl https://ollama.ai/install.sh | sh
```

#### Windows
Download and install from: https://ollama.ai/download/windows

#### Pull the Required Model
```bash
ollama pull llama3.2:3b
```

#### Verify the Installation
```bash
ollama list
# Should show llama3.2:3b in the list
```
## Testing Without Ollama

The system includes a MockLLMAdapter that allows running tests without external dependencies:
```bash
# Run tests with the mock adapter
python test_mock_adapter.py

# Use the mock configuration for testing
python tests/run_comprehensive_tests.py config/test_mock_default.yaml
```
## Quick Start

### 1. Basic Usage (with Mock LLM)
```python
from src.core.platform_orchestrator import PlatformOrchestrator

# Initialize with the mock configuration for testing
orchestrator = PlatformOrchestrator("config/test_mock_default.yaml")

# Process a query
result = orchestrator.process_query("What is RISC-V?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
```
### 2. Production Usage (with Ollama)
```python
# Initialize with the production configuration
orchestrator = PlatformOrchestrator("config/default.yaml")

# Index documents
orchestrator.index_documents("data/documents/")

# Process queries
result = orchestrator.process_query("Explain RISC-V pipeline architecture")
```
### 3. Advanced Features
```python
# Use the advanced configuration with neural reranking and graph enhancement
orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")

# Process a query with advanced features enabled:
# - Neural reranking: cross-encoder model for precision improvement
# - Graph enhancement: document relationship analysis
# - Performance optimization: caching and batch processing
# - Advanced analytics: real-time performance monitoring
result = orchestrator.process_query("Explain RISC-V pipeline architecture")

print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Sources: {result.sources}")
```
### 4. Configuration Comparison
```python
# Basic configuration:
# - standard fusion strategy
# - basic retrieval pipeline
basic_orchestrator = PlatformOrchestrator("config/default.yaml")

# Advanced configuration:
# - graph-enhanced fusion
# - neural reranking
# - performance optimization
advanced_orchestrator = PlatformOrchestrator("config/epic2_graph_calibrated.yaml")

# API configuration (cloud deployment):
# - HuggingFace API integration
# - memory-optimized for cloud deployment
api_orchestrator = PlatformOrchestrator("config/epic2_hf_api.yaml")
```
## Configuration

### Configuration Files
- `config/default.yaml` - Basic RAG configuration
- `config/advanced_test.yaml` - Epic 2 features enabled
- `config/epic2_graph_calibrated.yaml` - Graph-enhanced fusion with neural reranking (used in the examples above)
- `config/test_mock_default.yaml` - Testing without Ollama
- `config/epic2_hf_api.yaml` - HuggingFace API deployment
### Key Configuration Options
```yaml
# Answer Generator Configuration
answer_generator:
  type: "adaptive_modular"
  config:
    # For Ollama (production)
    llm_client:
      type: "ollama"
      config:
        model_name: "llama3.2:3b"
        base_url: "http://localhost:11434"

    # For testing (no external dependencies), swap in the mock client instead:
    # llm_client:
    #   type: "mock"
    #   config:
    #     response_pattern: "technical"
    #     include_citations: true
```
## Docker Deployment
```bash
# Build the Docker image
docker-compose build

# Run with Docker
docker-compose up
```
## System Capabilities

### **Technical Implementation**
- **Document Processing**: Multi-format parsing with metadata extraction
- **Embedding Generation**: Batch optimization with hardware acceleration
- **Retrieval Pipeline**: Multi-stage hybrid search with reranking
- **Answer Generation**: Multiple LLM backend support
- **Architecture**: 6-component modular design

### **Supported Features**
- **Query Processing**: Intent detection and enhancement
- **Result Fusion**: Multiple scoring strategies
- **Knowledge Graphs**: Entity extraction and relationship mapping
- **Performance Monitoring**: Real-time analytics and metrics
- **Cloud Deployment**: Optimized for containerized environments
## Running Tests
```bash
# Run all tests (requires Ollama or uses the mock adapter)
python tests/run_comprehensive_tests.py

# Run with the mock adapter only
python tests/run_comprehensive_tests.py config/test_mock_default.yaml

# Run specific test suites
python tests/diagnostic/run_all_diagnostics.py
python tests/epic2_validation/run_epic2_comprehensive_tests.py
```
## Deployment Options

### **HuggingFace Spaces Deployment (Recommended)**
The system is optimized for HuggingFace Spaces with automatic environment detection:

1. **Create a New Space**: Create a new Streamlit app on [HuggingFace Spaces](https://huggingface.co/spaces)
2. **Upload Files**: Upload the following files to your Space:
   ```
   app.py                   # Main entry point (HF Spaces optimized)
   streamlit_epic2_demo.py  # Epic 2 demo application
   requirements.txt         # HF-optimized dependencies
   config/                  # Configuration files
   src/                     # Core system
   ```
3. **Set Environment Variables** (in the Space settings):
   ```bash
   HF_TOKEN=your_huggingface_token_here  # For API access
   ```
4. **Automatic Configuration**: The app detects the HuggingFace Spaces environment, available API tokens, and memory constraints, then recommends an optimal configuration (see the sketch below).
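A minimal sketch of what that detection logic might look like. The selection rules here are illustrative assumptions; only the config file names come from this README, and `SPACE_ID` is an environment variable HuggingFace Spaces sets automatically:

```python
import os

def pick_config():
    """Choose a configuration based on the runtime environment (illustrative)."""
    if os.environ.get("SPACE_ID"):               # set automatically on HF Spaces
        if os.environ.get("HF_TOKEN"):
            return "config/epic2_hf_api.yaml"    # API-backed, memory-optimized
        return "config/test_mock_default.yaml"   # no token: fall back to mock LLM
    return "config/default.yaml"                 # local development (e.g. Ollama)
```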
**Features in HF Spaces:**
- Full advanced RAG capabilities (neural reranking, graph enhancement)
- Automatic environment detection and configuration
- Memory-optimized dependencies for cloud deployment
- Global accessibility with zero setup required
### **Local Development**
For full local capabilities with Ollama:
```bash
# Install Ollama and the model
brew install ollama
ollama pull llama3.2:3b

# Run the Epic 2 demo
streamlit run app.py
```

### **Docker Deployment**
```bash
# Build and run with Docker
docker-compose up
```
## Troubleshooting

### "Model 'llama3.2' not found"
- **Cause**: Ollama is not installed or the model has not been pulled
- **Solution**: Follow the Ollama installation steps above, or use a mock configuration

### "Connection refused on localhost:11434"
- **Cause**: The Ollama service is not running
- **Solution**: Start Ollama with `ollama serve`

### High Memory Usage
- **Cause**: Large models loaded in memory
- **Solution**: Use smaller models or increase system RAM

### Tests Failing
- **Cause**: Missing dependencies or Ollama not running
- **Solution**: Use the `test_mock` configurations or install Ollama
## Documentation & Testing

### **System Documentation**
- [Technical Implementation](SCORE_COMPRESSION_FIX_COMPLETE_VALIDATION.md) - Technical analysis and testing
- [Architecture Overview](docs/architecture/MASTER-ARCHITECTURE.md) - System design and components
- [Component Documentation](docs/architecture/components/) - Individual component specifications
- [Test Documentation](docs/test/) - Testing framework and validation

### **Key Technical Implementations**
1. **Score Fusion Optimization**: Advanced fusion strategy for multi-stage retrieval
2. **Neural Reranking**: Cross-encoder integration for relevance improvement
3. **System Integration**: Complete modular architecture with health monitoring
4. **Cloud Deployment**: Optimized for HuggingFace Spaces with automated configuration
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run the tests to ensure quality
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request

## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Technical Highlights
This RAG system demonstrates:

### **Advanced RAG Techniques**
- **Neural Reranking**: Cross-encoder models for relevance scoring
- **Graph Enhancement**: Document relationship analysis with NetworkX
- **Multi-Backend Support**: FAISS and Weaviate vector store integration
- **Performance Optimization**: Caching, batch processing, and lazy loading

### **Modern ML Engineering**
- **Modular Architecture**: 6-component system with clear interfaces
- **Cloud-First Design**: Deployment optimized for HuggingFace Spaces
- **Comprehensive Testing**: Multiple test configurations and validation
- **Developer Experience**: Easy setup with multiple deployment options

## Acknowledgments
- **Open Source Libraries**: Built on PyTorch, HuggingFace, FAISS, and spaCy
- **Transformer Models**: Leveraging state-of-the-art sentence transformers
- **Cloud Platforms**: Optimized for HuggingFace Spaces deployment
- **RISC-V Community**: Focus on the technical documentation use case
---

## Quick Start Summary
- **HuggingFace Spaces (Recommended)**: Upload `app.py`, set `HF_TOKEN`, deploy
- **Local Development**: `pip install -r requirements.txt`, `ollama pull llama3.2:3b`, `streamlit run app.py`
- **Advanced Features**: Neural reranking, graph enhancement, and multi-backend support