HuggingFace Spaces Deployment Guide

Enhanced RISC-V RAG System

🚀 Quick Deployment Steps

Create HuggingFace Space
- Go to HuggingFace Spaces (https://huggingface.co/spaces)
- Click "Create new Space"
- Choose Streamlit as the SDK
- Set hardware to CPU Basic (2 cores, 16GB RAM)

Upload Files
Upload all files from this directory to your Space:

app.py                    # Main entry point
streamlit_epic2_demo.py   # Enhanced RAG demo
requirements.txt          # Dependencies
config/                   # Configuration files
src/                      # Core system
data/                     # Sample documents
demo/                     # Demo utilities
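
Since a missing file is one of the build failures listed under troubleshooting, a quick local check before uploading can help. The following is a minimal sketch, not part of the project: the helper name `find_missing` is hypothetical, and it assumes the paths listed above sit directly under the directory you upload from.

```python
from pathlib import Path

# Entries taken from the upload list above; a trailing "/" marks a directory.
REQUIRED_PATHS = [
    "app.py",
    "streamlit_epic2_demo.py",
    "requirements.txt",
    "config/",
    "src/",
    "data/",
    "demo/",
]


def find_missing(root: str, required=REQUIRED_PATHS) -> list[str]:
    """Return the required entries that are absent under root (hypothetical helper)."""
    base = Path(root)
    missing = []
    for entry in required:
        path = base / entry
        # Directories must exist as directories, everything else as regular files.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    missing = find_missing(".")
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the project directory; an empty result means every listed entry is in place.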

Set Environment Variables (Optional)
In your Space settings, add the following (store the token as a secret rather than a public variable):
```
HF_TOKEN=your_huggingface_token_here
```
Note: The system works without HF_TOKEN, but setting it enables the full feature set.

Build & Deploy
- HuggingFace Spaces will automatically build your app
- Monitor build logs for any issues
- App will be available at: https://huggingface.co/spaces/your-username/your-space-name

🔧 System Capabilities

With HF_TOKEN (Recommended)

✅ Full advanced RAG capabilities
✅ Neural reranking with cross-encoder models
✅ Graph enhancement for document relationships
✅ Real-time analytics and performance monitoring
✅ API-based LLM integration (memory efficient)

Without HF_TOKEN (Demo Mode)

✅ System architecture demonstration
✅ Performance metrics display
✅ Technical documentation showcase
ℹ️ Limited live query functionality
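
The switch between the two modes can be pictured as a single check on the environment at startup. This is a rough sketch under stated assumptions: the function `select_mode` and the feature names are hypothetical labels for the capabilities listed above, not the project's actual identifiers.

```python
import os

# Hypothetical feature labels mirroring the two capability lists above.
FULL_FEATURES = ["neural_reranking", "graph_enhancement", "analytics", "api_llm"]
DEMO_FEATURES = ["architecture_demo", "metrics_display", "documentation"]


def select_mode(env=None) -> dict:
    """Pick full or demo mode depending on whether HF_TOKEN is set and non-empty."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if token:
        return {"mode": "full", "features": FULL_FEATURES}
    # No usable token: fall back to the demonstration-only feature set.
    return {"mode": "demo", "features": DEMO_FEATURES}


if __name__ == "__main__":
    print(select_mode()["mode"])
```

Treating a blank or whitespace-only token as missing avoids entering full mode with a value that cannot authenticate.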

📊 Performance Expectations

Memory Usage: < 16GB (HF Spaces compatible)
Startup Time: 30-60 seconds (model loading)
Query Response: 1-3 seconds per query
Concurrent Users: Supports multiple simultaneous users
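
To compare a running deployment against the query response figure above, per-query latency can be timed directly. The sketch below assumes a callable `run_query` that wraps whatever entry point the app exposes; that name and `time_queries` are hypothetical, and the 3-second budget is taken from the figure above.

```python
import time


def time_queries(run_query, queries, budget_s=3.0):
    """Time each query and record whether it finished within budget_s seconds."""
    results = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)  # the result is discarded; only latency is measured
        elapsed = time.perf_counter() - start
        results.append(
            {"query": query, "seconds": elapsed, "within_budget": elapsed <= budget_s}
        )
    return results


if __name__ == "__main__":
    # Stand-in for the real pipeline: sleeps briefly instead of answering.
    report = time_queries(lambda q: time.sleep(0.05), ["What is RISC-V?"])
    for row in report:
        print(f"{row['query']}: {row['seconds']:.2f}s within budget: {row['within_budget']}")
```

Expect the first measured query to be slower than the rest, since it includes model loading.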

🔍 Monitoring & Troubleshooting

Common Issues

Build Fails
- Check requirements.txt compatibility
- Ensure all files are uploaded
- Monitor build logs for specific errors
High Memory Usage
- System is optimized for <16GB usage
- Models load efficiently with lazy loading
- Consider upgrading to a larger hardware tier (for example, CPU Upgrade) if needed
Slow Response Times
- First query may be slower (model loading)
- Subsequent queries should be <3 seconds
- Check HF_TOKEN configuration for API access
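
The slower first query follows from lazy loading: a model is only loaded when something first asks for it, then reused. A minimal illustration of that pattern, with a placeholder in place of a real model load (the names `get_model` and `load_log` are hypothetical):

```python
from functools import lru_cache

load_log = []  # records each real load so the caching is observable


@lru_cache(maxsize=None)
def get_model(name: str):
    """Load a model on first request and reuse it afterwards (placeholder loader)."""
    load_log.append(name)
    # A real implementation would load weights here, which is the slow step.
    return {"name": name}


if __name__ == "__main__":
    get_model("reranker")  # first call pays the loading cost
    get_model("reranker")  # second call returns the cached object
    print(len(load_log))
```

Inside a Streamlit app, the `st.cache_resource` decorator serves a similar purpose and also shares the loaded object across user sessions.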

Health Check Endpoints

The system provides built-in health monitoring:

Automatic environment detection
Configuration validation
Component status reporting
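
The three checks above could be combined into one report along the following lines. This is a sketch, not the system's actual implementation: `health_report` and the required configuration keys are hypothetical, and it assumes the `SPACE_ID` environment variable is present when running on HuggingFace Spaces.

```python
import os

# Hypothetical keys; substitute whatever the real configuration requires.
REQUIRED_CONFIG_KEYS = ("retriever", "generator")


def health_report(env=None, config=None, components=None) -> dict:
    """Combine environment detection, config validation, and component status."""
    env = os.environ if env is None else env
    config = config or {}
    components = components or {}

    # Environment detection: assumes Spaces exposes SPACE_ID to the app.
    on_spaces = bool(env.get("SPACE_ID"))

    # Configuration validation: report required keys that are absent.
    missing_keys = [key for key in REQUIRED_CONFIG_KEYS if key not in config]

    # Component status: each component supplies a zero-argument check.
    statuses = {}
    for name, check in components.items():
        try:
            statuses[name] = "ok" if check() else "degraded"
        except Exception as exc:
            statuses[name] = f"error: {exc}"

    healthy = not missing_keys and all(s == "ok" for s in statuses.values())
    return {
        "on_spaces": on_spaces,
        "missing_config": missing_keys,
        "components": statuses,
        "healthy": healthy,
    }
```

A failing component check is caught and reported in the result, so one broken component does not prevent the rest of the report from being produced.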

💡 Tips for Best Performance

Use HF_TOKEN: Enables full capabilities and better performance
Monitor Logs: Check the logs for initialization and query-processing errors
Sample Queries: Use provided RISC-V technical queries for demo
Configuration: System auto-selects optimal configuration based on environment

📈 Expected Demo Results

With proper setup, your demo will showcase:

Neural reranking with cross-encoder models
Graph enhancement for document relationships
Hybrid search combining semantic and keyword matching
Real-time analytics with performance metrics
Professional UI with technical feature focus
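
To make the hybrid search item concrete: one common way to merge a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). This guide does not state which fusion method the system uses, so the sketch below is an illustration of the general idea only; the function name and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, with ranks
    starting at 1; documents ranked well by several lists rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    semantic = ["a", "b", "c"]  # e.g. from embedding similarity
    keyword = ["b", "d", "a"]   # e.g. from keyword matching
    print(reciprocal_rank_fusion([semantic, keyword]))
```

Because RRF works on ranks rather than raw scores, the two retrievers do not need comparable score scales.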

🎯 Portfolio Impact

This deployment demonstrates:

Advanced RAG system implementation
Modular 6-component architecture
Neural reranking and graph enhancement techniques
Modern ML engineering practices

Together, these showcase technical RAG implementation skills, with a focus on advanced features.