# HuggingFace Spaces Deployment Guide
## Enhanced RISC-V RAG System

### 🚀 Quick Deployment Steps

1. **Create HuggingFace Space**
   - Go to [HuggingFace Spaces](https://huggingface.co/spaces)
   - Click "Create new Space"
   - Choose **Streamlit** as SDK
   - Set hardware to **CPU Basic** (2 cores, 16GB RAM)

2. **Upload Files**
   Upload all files from this directory to your Space:
   ```
   app.py                    # Main entry point
   streamlit_epic2_demo.py   # Enhanced RAG demo
   requirements.txt          # Dependencies
   config/                   # Configuration files
   src/                      # Core system
   data/                     # Sample documents
   demo/                     # Demo utilities
   ```

3. **Set Environment Variables** (Optional)
   In your Space settings, under **Variables and secrets**, add the token as a secret so it stays private:
   ```
   HF_TOKEN=your_huggingface_token_here
   ```
   
   **Note**: Without HF_TOKEN the system runs in a limited demo mode; setting it enables the full capabilities listed under System Capabilities.

4. **Build & Deploy**
   - HuggingFace Spaces will automatically build your app
   - Monitor build logs for any issues
   - App will be available at: `https://huggingface.co/spaces/your-username/your-space-name`
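
Before uploading, it can help to confirm that every entry from the file list in step 2 is present locally. The following is a minimal sketch, not part of the project: the helper name `missing_entries` and the `REQUIRED_ENTRIES` constant are hypothetical, and the entries are copied from the upload list above.

```python
from pathlib import Path

# Entries copied from the upload list in step 2; directories end with "/".
REQUIRED_ENTRIES = [
    "app.py",
    "streamlit_epic2_demo.py",
    "requirements.txt",
    "config/",
    "src/",
    "data/",
    "demo/",
]


def missing_entries(root: str) -> list[str]:
    """Return the required files or directories that are absent under root."""
    base = Path(root)
    missing = []
    for entry in REQUIRED_ENTRIES:
        path = base / entry.rstrip("/")
        # Directories must exist as directories, files as regular files.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    gaps = missing_entries(".")
    print("Ready to upload" if not gaps else f"Missing: {', '.join(gaps)}")
```

Running the script from the project directory lists anything that still needs to be added before the Space is built.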

### 🔧 System Capabilities

#### **With HF_TOKEN (Recommended)**
- ✅ Full advanced RAG capabilities
- ✅ Neural reranking with cross-encoder models
- ✅ Graph enhancement for document relationships
- ✅ Real-time analytics and performance monitoring
- ✅ API-based LLM integration (memory efficient)

#### **Without HF_TOKEN (Demo Mode)**
- ✅ System architecture demonstration
- ✅ Performance metrics display
- ✅ Technical documentation showcase
- ℹ️ Limited live query functionality
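
The split between the two modes can be pictured as a single check on the environment. This is a sketch under assumptions: `select_mode` and the mode names `"full"` and `"demo"` are hypothetical and are not taken from the project's code, which may decide differently.

```python
import os


def select_mode(env=None) -> str:
    """Return "full" when a non-empty HF_TOKEN is present, else "demo".

    The function name and mode labels are illustrative assumptions.
    """
    env = os.environ if env is None else env
    # Treat a missing or whitespace-only token as "not configured".
    token = env.get("HF_TOKEN", "").strip()
    return "full" if token else "demo"


if __name__ == "__main__":
    print(f"Running in {select_mode()} mode")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise locally without touching real secrets.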

### 📊 Performance Expectations

- **Memory Usage**: < 16GB (HF Spaces compatible)
- **Startup Time**: 30-60 seconds (model loading)
- **Query Response**: 1-3 seconds per query
- **Concurrent Users**: Supports multiple simultaneous users
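
One way to check the query-response figure on your own Space is to time each call and compare it with the 3-second upper bound quoted above. The sketch below is illustrative only: `timed`, `within_budget`, and `QUERY_BUDGET_SECONDS` are hypothetical names, and the lambda stands in for whatever function actually answers a query.

```python
import time
from typing import Callable

QUERY_BUDGET_SECONDS = 3.0  # upper bound of the 1-3 second range quoted above


def timed(fn: Callable[[str], object], query: str) -> tuple[object, float]:
    """Run fn(query) and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(query)
    return result, time.perf_counter() - start


def within_budget(elapsed: float, budget: float = QUERY_BUDGET_SECONDS) -> bool:
    """Report whether a measured latency fits the response-time budget."""
    return elapsed <= budget


if __name__ == "__main__":
    # Placeholder query function; replace with the real query call.
    result, elapsed = timed(lambda q: q.upper(), "what is risc-v?")
    print(f"{elapsed:.4f}s, within budget: {within_budget(elapsed)}")
```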

### 🔍 Monitoring & Troubleshooting

#### **Common Issues**

1. **Build Fails**
   - Check `requirements.txt` compatibility
   - Ensure all files are uploaded
   - Monitor build logs for specific errors

2. **High Memory Usage**
   - System is optimized for <16GB usage
   - Models load efficiently with lazy loading
   - Consider upgrading the Space hardware (for example, to CPU Upgrade) if needed

3. **Slow Response Times**
   - First query may be slower (model loading)
   - Subsequent queries should be <3 seconds
   - Check HF_TOKEN configuration for API access
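
The slow first query follows from lazy loading: the model is built on first use and reused afterwards. A minimal sketch of that pattern with the standard library is shown here; `get_model`, `answer`, and the placeholder model are hypothetical, and a Streamlit app would more likely use `st.cache_resource` for the same purpose.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model() -> dict:
    """Build the (placeholder) model once and reuse it on later calls."""
    # Stand-in for an expensive load such as a reranker or embedding model.
    return {"name": "placeholder-model"}


def answer(query: str) -> str:
    """Answer a query, paying the model load cost only on the first call."""
    model = get_model()  # cached after the first call
    return f"{model['name']}: {query}"


if __name__ == "__main__":
    answer("first query")   # triggers the load
    answer("second query")  # served from the cache
    print(get_model.cache_info())
```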

#### **Health Checks**

The system provides built-in health monitoring:
- Automatic environment detection
- Configuration validation
- Component status reporting
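
The three checks listed above could be combined into one report, roughly as follows. This is a sketch under assumptions, not the project's implementation: `detect_environment` and `health_report` are hypothetical names, the `SPACE_ID` environment variable is assumed to be the signal that the app runs inside a Space, and the presence of a `config/` directory stands in for real configuration validation.

```python
import os
from pathlib import Path


def detect_environment(env=None) -> str:
    """Guess the runtime: "hf_spaces" if SPACE_ID is set, else "local"."""
    env = os.environ if env is None else env
    # Assumption: SPACE_ID is present only inside a HuggingFace Space.
    return "hf_spaces" if env.get("SPACE_ID") else "local"


def health_report(root=".", env=None, components=None) -> dict:
    """Collect environment, configuration, and component status in one dict."""
    env = os.environ if env is None else env
    components = components or {}
    return {
        "environment": detect_environment(env),
        # Simplified validation: only checks that config/ exists.
        "config_present": (Path(root) / "config").is_dir(),
        "hf_token_set": bool(env.get("HF_TOKEN")),
        "components": {
            name: ("ok" if ready else "unavailable")
            for name, ready in components.items()
        },
    }


if __name__ == "__main__":
    print(health_report(components={"retriever": True, "reranker": False}))
```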

### 💡 Tips for Best Performance

1. **Use HF_TOKEN**: Enables full capabilities and better performance
2. **Monitor Logs**: Check the logs for initialization and query-processing messages
3. **Sample Queries**: Use the provided RISC-V technical queries for the demo
4. **Configuration**: The system auto-selects a configuration based on the detected environment

### 📈 Expected Demo Results

With proper setup, your demo will showcase:
- **Neural reranking** with cross-encoder models
- **Graph enhancement** for document relationships
- **Hybrid search** combining semantic and keyword matching
- **Real-time analytics** with performance metrics
- **Professional UI** with technical feature focus
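
To illustrate what combining semantic and keyword matching can look like, here is a small sketch that merges two ranked lists with reciprocal rank fusion. This guide does not say which fusion method the system uses, so the technique, the function name `reciprocal_rank_fusion`, and the constant `k=60` are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over the lists it
    appears in, with rank starting at 1. Ties are broken by document ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    semantic = ["a", "b", "c"]  # hypothetical dense-retrieval order
    keyword = ["b", "d", "a"]   # hypothetical keyword-search order
    print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept.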

### 🎯 Portfolio Impact

This deployment demonstrates:
- Advanced RAG system implementation
- Modular 6-component architecture
- Neural reranking and graph enhancement techniques
- Modern ML engineering practices

It showcases technical RAG implementation skills, with a focus on advanced features.