Arthur Passuello
# HuggingFace Spaces Deployment Guide
## Enhanced RISC-V RAG System
### 🚀 Quick Deployment Steps
1. **Create HuggingFace Space**
- Go to [HuggingFace Spaces](https://huggingface.co/spaces)
- Click "Create new Space"
- Choose **Streamlit** as the SDK
- Set hardware to **CPU Basic** (2 cores, 16GB RAM)
2. **Upload Files**
Upload all files from this directory to your Space:
```
app.py                   # Main entry point
streamlit_epic2_demo.py  # Enhanced RAG demo
requirements.txt         # Dependencies
config/                  # Configuration files
src/                     # Core system
data/                    # Sample documents
demo/                    # Demo utilities
```
3. **Set Environment Variables** (Optional)
In your Space settings, add the token as a **secret** rather than a public variable, so it stays hidden:
```
HF_TOKEN=your_huggingface_token_here
```
**Note**: The system works without HF_TOKEN but provides enhanced capabilities with it.
4. **Build & Deploy**
- HuggingFace Spaces will automatically build your app
- Monitor build logs for any issues
- App will be available at: `https://huggingface.co/spaces/your-username/your-space-name`
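If you prefer scripting the deployment to using the web UI, you can sketch the steps above with the `huggingface_hub` client. This is a minimal sketch that assumes `huggingface_hub` is installed and `HF_TOKEN` holds a token with write access. The repo id is a placeholder, and `missing_paths` and `deploy` are hypothetical helper names. The script first checks that every entry from step 2 exists. It then creates the Space with the Streamlit SDK and uploads the folder.

```python
import os
from pathlib import Path

# Files and folders listed in step 2 of this guide.
REQUIRED_PATHS = [
    "app.py",
    "streamlit_epic2_demo.py",
    "requirements.txt",
    "config",
    "src",
    "data",
    "demo",
]


def missing_paths(root: Path) -> list[str]:
    """Return the required entries that do not exist under `root`."""
    return [name for name in REQUIRED_PATHS if not (root / name).exists()]


def deploy(root: Path, repo_id: str) -> None:
    """Create the Space if needed, then upload every file under `root`.

    Hypothetical helper; `repo_id` is "your-username/your-space-name".
    """
    missing = missing_paths(root)
    if missing:
        raise FileNotFoundError(f"Missing before upload: {', '.join(missing)}")

    # Imported here so the preflight check above works without the package.
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ.get("HF_TOKEN"))
    api.create_repo(repo_id=repo_id, repo_type="space",
                    space_sdk="streamlit", exist_ok=True)
    api.upload_folder(folder_path=str(root), repo_id=repo_id, repo_type="space")
```

To use it, run `deploy(Path("."), "your-username/your-space-name")` from the project directory. The `huggingface_hub` import sits inside `deploy`, so the preflight check still runs on machines where the package is missing.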
### 🔧 System Capabilities
#### **With HF_TOKEN (Recommended)**
- ✅ Full advanced RAG capabilities
- ✅ Neural reranking with cross-encoder models
- ✅ Graph enhancement for document relationships
- ✅ Real-time analytics and performance monitoring
- ✅ API-based LLM integration (memory efficient)
#### **Without HF_TOKEN (Demo Mode)**
- ✅ System architecture demonstration
- ✅ Performance metrics display
- ✅ Technical documentation showcase
- ℹ️ Limited live query functionality
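The switch between these two modes can be illustrated as a check on `HF_TOKEN` at startup. This is a hypothetical sketch. The `RuntimeMode` type, its field names, and the mode labels are assumptions that mirror the two lists above; they are not the project's actual configuration keys.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeMode:
    name: str            # "full" or "demo" (hypothetical labels)
    full_pipeline: bool  # neural reranking, graph enhancement, API-based LLM
    live_queries: str    # "full" or "limited"


def select_mode(env: Optional[Mapping[str, str]] = None) -> RuntimeMode:
    """Return full mode when a non-empty HF_TOKEN is present, else demo mode."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if token:
        return RuntimeMode(name="full", full_pipeline=True, live_queries="full")
    # Without a token: architecture and metrics showcase, limited live querying.
    return RuntimeMode(name="demo", full_pipeline=False, live_queries="limited")
```

Passing `env` explicitly keeps the function easy to test; in the app it falls back to the real process environment.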
### 📊 Performance Expectations
- **Memory Usage**: < 16 GB (HF Spaces compatible)
- **Startup Time**: 30-60 seconds (model loading)
- **Query Response**: 1-3 seconds per query
- **Concurrent Users**: Supports multiple simultaneous users
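To check a deployment against these numbers, you can time a few queries yourself. This is a minimal sketch that assumes the app's query logic can be called as a plain function. The names `answer_query`, `measure_latencies`, and `summarize` are hypothetical, and the 3-second budget is the upper end of the range above. The sketch reports the first query separately because that query includes model loading.

```python
import statistics
import time
from typing import Callable, Iterable, Optional


def measure_latencies(answer_query: Callable[[str], object],
                      queries: Iterable[str]) -> list[float]:
    """Run each query once, in order, and return wall-clock seconds per query."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        answer_query(query)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: list[float], budget_s: float = 3.0) -> dict:
    """Report the cold first query separately from the warm ones."""
    if not latencies:
        raise ValueError("need at least one measurement")
    first, warm = latencies[0], latencies[1:]
    warm_median: Optional[float] = statistics.median(warm) if warm else None
    return {
        "first_query_s": first,
        "warm_median_s": warm_median,
        "warm_within_budget": all(t < budget_s for t in warm),
    }
```

For example, call `summarize(measure_latencies(answer_query, sample_queries))` with a handful of the provided RISC-V sample queries.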
### 🔍 Monitoring & Troubleshooting
#### **Common Issues**
1. **Build Fails**
- Check `requirements.txt` compatibility
- Ensure all files are uploaded
- Monitor build logs for specific errors
2. **High Memory Usage**
- The system is designed to stay under 16 GB
- Models are loaded lazily (on first use) to keep memory low
- If memory is still tight, consider upgrading the Space hardware to **CPU Upgrade** (8 vCPU, 32 GB RAM)
3. **Slow Response Times**
- First query may be slower (model loading)
- Subsequent queries should complete in under 3 seconds
- Check HF_TOKEN configuration for API access
#### **Health Check Endpoints**
The system provides built-in health monitoring:
- Automatic environment detection
- Configuration validation
- Component status reporting
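The three checks above could be combined into a single report along these lines. This sketch rests on several assumptions. It treats the `SPACE_ID` environment variable, assumed to be set by Hugging Face Spaces at runtime, as the environment signal. The config path, component names, and helper functions are hypothetical rather than the project's actual API.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def detect_environment(env: Optional[Mapping[str, str]] = None) -> str:
    """Assumption: SPACE_ID is present only when running on HF Spaces."""
    env = os.environ if env is None else env
    return "hf_spaces" if env.get("SPACE_ID") else "local"


def validate_config(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    if not path.is_file():
        return [f"config file not found: {path}"]
    if path.stat().st_size == 0:
        return [f"config file is empty: {path}"]
    return []


def component_status(checks: Mapping[str, Callable[[], bool]]) -> dict[str, str]:
    """Run each component probe; a probe that raises is reported, not fatal."""
    status = {}
    for name, probe in checks.items():
        try:
            status[name] = "ok" if probe() else "degraded"
        except Exception as exc:
            status[name] = f"error: {exc}"
    return status


def health_report(config_path: Path,
                  checks: Mapping[str, Callable[[], bool]]) -> dict:
    problems = validate_config(config_path)
    components = component_status(checks)
    return {
        "environment": detect_environment(),
        "config_problems": problems,
        "components": components,
        "healthy": not problems and all(v == "ok" for v in components.values()),
    }
```

Each probe is a zero-argument callable, for example a function that checks whether the retriever's index is loaded.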
### 💡 Tips for Best Performance
1. **Use HF_TOKEN**: Enables full capabilities and better performance
2. **Monitor Logs**: Check the Space logs for initialization and query-processing messages
3. **Sample Queries**: Use provided RISC-V technical queries for demo
4. **Configuration**: System auto-selects optimal configuration based on environment
### 📈 Expected Demo Results
With proper setup, your demo will showcase:
- **Neural reranking** with cross-encoder models
- **Graph enhancement** for document relationships
- **Hybrid search** combining semantic and keyword matching
- **Real-time analytics** with performance metrics
- **Professional UI** with technical feature focus
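The guide does not specify how semantic and keyword results are merged before reranking. As an illustration only, this sketch assumes reciprocal rank fusion (RRF, with the commonly used k = 60) for the hybrid step, followed by a reranking pass. `score_pair` stands in for a cross-encoder, such as sentence-transformers' `CrossEncoder.predict` applied to (query, passage) pairs. Graph enhancement is not modeled here.

```python
from collections import defaultdict
from typing import Callable, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of doc ids; each list adds 1 / (k + rank) per doc."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def rerank(query: str, doc_ids: Sequence[str], texts: dict[str, str],
           score_pair: Callable[[str, str], float], top_k: int = 5) -> list[str]:
    """Re-order fused candidates by a (query, passage) relevance score."""
    scored = [(score_pair(query, texts[d]), d) for d in doc_ids]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable for ties
    return [d for _, d in scored[:top_k]]
```

RRF uses only ranks, so it avoids having to calibrate semantic similarity scores against keyword (e.g. BM25) scores. The more expensive cross-encoder then runs only on the short fused candidate list.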
### 🎯 Portfolio Impact
This deployment demonstrates:
- Advanced RAG system implementation
- Modular 6-component architecture
- Neural reranking and graph enhancement techniques
- Modern ML engineering practices
Overall, it showcases hands-on RAG implementation skills, with a focus on advanced retrieval features.