# HuggingFace Spaces Deployment Guide

## Enhanced RISC-V RAG System

### Quick Deployment Steps

1. **Create HuggingFace Space**
   - Go to [HuggingFace Spaces](https://huggingface.co/spaces)
   - Click "Create new Space"
   - Choose **Streamlit** as the SDK
   - Set hardware to **CPU Basic** (2 vCPU, 16 GB RAM)
2. **Upload Files**

   Upload all files from this directory to your Space:

   ```
   app.py                    # Main entry point
   streamlit_epic2_demo.py   # Enhanced RAG demo
   requirements.txt          # Dependencies
   config/                   # Configuration files
   src/                      # Core system
   data/                     # Sample documents
   demo/                     # Demo utilities
   ```
3. **Set Environment Variables** (Optional)

   In your Space settings, add:

   ```
   HF_TOKEN=your_huggingface_token_here
   ```

   **Note**: The system runs without `HF_TOKEN` in a limited demo mode; setting the token enables the full feature set.
4. **Build & Deploy**
   - HuggingFace Spaces will automatically build your app
   - Monitor the build logs for any issues
   - The app will be available at: `https://huggingface.co/spaces/your-username/your-space-name`
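
The create-and-upload steps above can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the token has write access; the `missing_paths` and `deploy` helpers and the `REQUIRED_PATHS` list are illustrative names, not part of this project. The pre-flight check only confirms that the entries from the upload list in step 2 exist.

```python
from pathlib import Path

# Paths taken from the upload list in step 2.
REQUIRED_PATHS = [
    "app.py",
    "streamlit_epic2_demo.py",
    "requirements.txt",
    "config",
    "src",
    "data",
    "demo",
]


def missing_paths(project_dir: str) -> list[str]:
    """Return the required files/directories that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_PATHS if not (root / name).exists()]


def deploy(project_dir: str, repo_id: str, token: str) -> None:
    """Create the Space if it does not exist, then upload the project folder.

    Hypothetical helper: repo_id is "your-username/your-space-name".
    """
    missing = missing_paths(project_dir)
    if missing:
        raise FileNotFoundError(f"Missing before upload: {', '.join(missing)}")

    # Imported lazily so the pre-flight check works without the library installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(
        repo_id=repo_id,
        repo_type="space",
        space_sdk="streamlit",
        exist_ok=True,
    )
    api.upload_folder(folder_path=project_dir, repo_id=repo_id, repo_type="space")
```

Running `missing_paths(".")` from the project directory before uploading is a quick way to catch the "ensure all files are uploaded" class of build failures early.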
### System Capabilities

#### **With HF_TOKEN (Recommended)**

- Full advanced RAG capabilities
- Neural reranking with cross-encoder models
- Graph enhancement for document relationships
- Real-time analytics and performance monitoring
- API-based LLM integration (memory efficient)

#### **Without HF_TOKEN (Demo Mode)**

- System architecture demonstration
- Performance metrics display
- Technical documentation showcase
- Limited live query functionality
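
The split between the two modes can be pictured as a small startup check on the `HF_TOKEN` environment variable. This is a sketch only: the `RuntimeMode` class and `select_mode` function are hypothetical names used for illustration, and the actual selection logic in `app.py` may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeMode:
    name: str                # "full" or "demo"
    full_live_queries: bool  # False means live queries are limited


def select_mode(env=None) -> RuntimeMode:
    """Pick full mode when HF_TOKEN is set and non-blank, demo mode otherwise."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if token:
        return RuntimeMode(name="full", full_live_queries=True)
    return RuntimeMode(name="demo", full_live_queries=False)
```

Passing a plain dictionary as `env` makes the check easy to exercise locally without touching the real environment.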
### Performance Expectations

- **Memory Usage**: < 16 GB (HF Spaces compatible)
- **Startup Time**: 30-60 seconds (model loading)
- **Query Response**: 1-3 seconds per query
- **Concurrent Users**: Supports multiple simultaneous users
### Monitoring & Troubleshooting

#### **Common Issues**

1. **Build Fails**
   - Check `requirements.txt` compatibility
   - Ensure all files are uploaded
   - Monitor the build logs for specific errors
2. **High Memory Usage**
   - The system is optimized for < 16 GB usage
   - Models are loaded lazily, on first use
   - Consider a larger hardware tier (for example, CPU Upgrade) if needed
3. **Slow Response Times**
   - The first query may be slower (model loading)
   - Subsequent queries should take < 3 seconds
   - Check the `HF_TOKEN` configuration for API access
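
Lazy loading explains why the first query is slower than later ones: the model is loaded on first use and then reused. A minimal sketch of the pattern follows, using `functools.lru_cache` from the standard library; `_load_model` is a hypothetical stand-in for a real model load, and the project's own loader may be implemented differently.

```python
import time
from functools import lru_cache


def _load_model(name: str) -> dict:
    """Hypothetical stand-in for an expensive model load."""
    time.sleep(0.05)  # simulate load time
    return {"name": name}


@lru_cache(maxsize=None)
def get_model(name: str) -> dict:
    """Load the model on first request and return the cached object afterwards."""
    return _load_model(name)


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(get_model, "reranker")` twice shows the effect: the second call returns the same object without paying the load cost again. In a Streamlit app, `st.cache_resource` plays a similar role across reruns.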
#### **Health Check Endpoints**

The system provides built-in health monitoring:

- Automatic environment detection
- Configuration validation
- Component status reporting
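
To make the three checks concrete, here is a rough sketch of what such a report could look like. It assumes the directory layout from the upload list and relies on the `SPACE_ID` environment variable that HuggingFace sets inside a running Space; the `health_report` function and its output keys are illustrative, not the project's actual interface.

```python
import os
from pathlib import Path


def health_report(env=None, project_dir=".") -> dict:
    """Summarize environment, configuration, and component presence."""
    env = os.environ if env is None else env
    root = Path(project_dir)

    # Environment detection: SPACE_ID is set inside a running Space.
    on_spaces = bool(env.get("SPACE_ID"))

    # Configuration validation (assumed here to mean the config/ directory exists).
    config_ok = (root / "config").is_dir()

    # Component status: presence of the directories from the upload list.
    components = {name: (root / name).exists() for name in ("src", "data", "demo")}

    return {
        "environment": "hf_spaces" if on_spaces else "local",
        "hf_token_set": bool(env.get("HF_TOKEN")),
        "config_present": config_ok,
        "components": components,
        "healthy": config_ok and all(components.values()),
    }
```

Printing the result of `health_report()` at startup puts the same information into the Space logs, which helps when diagnosing a failed build or a demo-mode fallback.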
### Tips for Best Performance

1. **Use HF_TOKEN**: Enables full capabilities and better performance
2. **Monitor Logs**: Check for initialization and query-processing messages
3. **Sample Queries**: Use the provided RISC-V technical queries for the demo
4. **Configuration**: The system auto-selects a configuration based on the detected environment
### Expected Demo Results

With proper setup, your demo will showcase:

- **Neural reranking** with cross-encoder models
- **Graph enhancement** for document relationships
- **Hybrid search** combining semantic and keyword matching
- **Real-time analytics** with performance metrics
- **Professional UI** with technical feature focus
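
Hybrid search needs some way to merge a semantic ranking with a keyword ranking. This guide does not say which fusion method the system uses, so the sketch below shows reciprocal rank fusion (RRF), one common choice, purely as an illustration. The function name and the `k=60` default are assumptions, not values taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1);
    documents are returned by descending total score, ties broken by ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a semantic ranking `["a", "b", "c"]` with a keyword ranking `["b", "d", "a"]` places `"b"` first, because it ranks near the top of both lists, followed by `"a"`, `"d"`, and `"c"`.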
### Portfolio Impact

This deployment demonstrates:

- Advanced RAG system implementation
- Modular 6-component architecture
- Neural reranking and graph enhancement techniques
- Modern ML engineering practices

It showcases technical RAG implementation skills, with a focus on advanced features.