enhanced-rag-demo / DEPLOYMENT_GUIDE.md

Arthur Passuello

Cleaned up displayed content

1cdeab3 about 1 month ago

	# HuggingFace Spaces Deployment Guide
	## Enhanced RISC-V RAG System

	### 🚀 Quick Deployment Steps

	1. Create HuggingFace Space
	- Go to [HuggingFace Spaces](https://huggingface.co/spaces)
	- Click "Create new Space"
	- Choose Streamlit as the SDK
	- Set hardware to CPU Basic (2 cores, 16GB RAM)

	2. Upload Files
	Upload all files from this directory to your space:
	```
	app.py # Main entry point
	streamlit_epic2_demo.py # Enhanced RAG demo
	requirements.txt # Dependencies
	config/ # Configuration files
	src/ # Core system
	data/ # Sample documents
	demo/ # Demo utilities
	```
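
Before uploading, it can help to confirm that nothing from the listing above is missing locally. The following is a minimal sketch, not part of the project: the helper name `find_missing_entries` is hypothetical, and the entry list is simply copied from the listing.

```python
from pathlib import Path

# Entries copied from the upload listing; a trailing "/" marks a directory.
EXPECTED_ENTRIES = [
    "app.py",
    "streamlit_epic2_demo.py",
    "requirements.txt",
    "config/",
    "src/",
    "data/",
    "demo/",
]


def find_missing_entries(project_dir):
    """Return the expected files and directories that are absent from project_dir."""
    root = Path(project_dir)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories and files are checked differently so a stray file
        # named "config" does not count as the config/ directory.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    missing = find_missing_entries(".")
    if missing:
        print("Missing before upload:", ", ".join(missing))
    else:
        print("All expected files and directories are present.")
```

Run it from the project root; an empty result means every listed entry exists.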

	3. Set Environment Variables (Optional)
	In your Space settings, under "Variables and secrets", add the token as a secret so it is not exposed publicly:
	```
	HF_TOKEN=your_huggingface_token_here
	```

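	Note: Without HF_TOKEN the system still starts, but runs in demo mode with limited live query functionality. Setting the token enables the full feature set described under System Capabilities.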
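
Spaces exposes secrets to the app as environment variables, so the two modes can be told apart at startup. The sketch below shows one way to do that; `select_mode` and the mode names `"full"` and `"demo"` are illustrative assumptions, not the project's actual code.

```python
import os


def select_mode(env=None):
    """Return "full" when a non-empty HF_TOKEN is set, otherwise "demo".

    The function name and mode labels are hypothetical; only the
    HF_TOKEN variable name comes from this guide.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return "full" if token else "demo"


if __name__ == "__main__":
    print(f"Starting in {select_mode()} mode")
```

Passing the environment as an argument keeps the check easy to test without touching the real process environment.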

	4. Build & Deploy
	- HuggingFace Spaces will automatically build your app
	- Monitor build logs for any issues
	- App will be available at: `https://huggingface.co/spaces/your-username/your-space-name`

	### 🔧 System Capabilities

	#### With HF_TOKEN (Recommended)
	- ✅ Full advanced RAG capabilities
	- ✅ Neural reranking with cross-encoder models
	- ✅ Graph enhancement for document relationships
	- ✅ Real-time analytics and performance monitoring
	- ✅ API-based LLM integration (memory efficient)

	#### Without HF_TOKEN (Demo Mode)
	- ✅ System architecture demonstration
	- ✅ Performance metrics display
	- ✅ Technical documentation showcase
	- ℹ️ Limited live query functionality

	### 📊 Performance Expectations

	- Memory Usage: < 16GB (HF Spaces compatible)
	- Startup Time: 30-60 seconds (model loading)
	- Query Response: 1-3 seconds per query
	- Concurrent Users: Supports multiple simultaneous users
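
To check the query response figure on your own deployment, a query can be timed against the 3-second upper bound. This is a rough sketch under stated assumptions: `fake_answer_query` is a hypothetical stand-in for the real query pipeline, and `timed_call` and `within_budget` are illustrative helpers.

```python
import time

QUERY_BUDGET_SECONDS = 3.0  # upper bound quoted in this guide


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def within_budget(elapsed, budget=QUERY_BUDGET_SECONDS):
    """Return True when the measured time does not exceed the budget."""
    return elapsed <= budget


def fake_answer_query(question):
    """Hypothetical stand-in for the real query pipeline."""
    time.sleep(0.01)
    return f"answer to: {question}"


if __name__ == "__main__":
    answer, elapsed = timed_call(fake_answer_query, "What is RISC-V?")
    status = "within" if within_budget(elapsed) else "over"
    print(f"{elapsed:.3f}s ({status} the {QUERY_BUDGET_SECONDS:.0f}s budget)")
```

Time the second and later queries separately from the first, since the first one may include model loading.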

	### 🔍 Monitoring & Troubleshooting

	#### Common Issues

	1. Build Fails
	- Check `requirements.txt` compatibility
	- Ensure all files are uploaded
	- Monitor build logs for specific errors

	2. High Memory Usage
	- System is optimized for <16GB usage
	- Models load efficiently with lazy loading
	- Consider upgrading to a larger hardware tier such as CPU Upgrade if needed

	3. Slow Response Times
	- First query may be slower (model loading)
	- Subsequent queries should complete in under 3 seconds
	- Check HF_TOKEN configuration for API access
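
The lazy loading mentioned above explains both the memory behavior and the slower first query: a model is built only when first requested, then reused. The sketch below illustrates the general pattern with the standard library; `get_reranker` and its placeholder return value are hypothetical, and the project may implement this differently.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_reranker():
    """Build the (hypothetical) reranker on first use and reuse it afterwards."""
    # Stand-in for an expensive model load; a real app would
    # construct and return the actual model object here.
    return {"name": "placeholder-reranker"}


if __name__ == "__main__":
    get_reranker()  # first call pays the loading cost
    get_reranker()  # second call returns the cached object
    info = get_reranker.cache_info()
    print(f"loads={info.misses}, reuses={info.hits}")
```

In a Streamlit app, `st.cache_resource` serves a similar purpose by keeping one shared instance across reruns and sessions.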

	#### Health Check Endpoints

	The system provides built-in health monitoring:
	- Automatic environment detection
	- Configuration validation
	- Component status reporting
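
As an illustration of what such a check might look like, the sketch below assembles a small status report. It rests on several assumptions: the function names and report keys are hypothetical, the component check only tests that directories from the upload listing exist, and environment detection relies on the `SPACE_ID` variable that Spaces is expected to set in the container.

```python
import os
from pathlib import Path


def detect_environment(env):
    """Guess where the app runs, assuming Spaces sets SPACE_ID."""
    return "huggingface_spaces" if env.get("SPACE_ID") else "local"


def build_health_report(project_dir=".", env=None):
    """Return a dict with environment, token presence, and component status."""
    env = os.environ if env is None else env
    root = Path(project_dir)
    # Component status here means only "the directory exists".
    components = {
        name: (root / name).is_dir() for name in ("config", "src", "data")
    }
    return {
        "environment": detect_environment(env),
        "hf_token_set": bool(env.get("HF_TOKEN")),
        "components": components,
        "healthy": all(components.values()),
    }


if __name__ == "__main__":
    for key, value in build_health_report().items():
        print(f"{key}: {value}")
```

A report like this can be printed at startup so that the build logs show which environment and configuration were detected.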

	### 💡 Tips for Best Performance

	1. Use HF_TOKEN: Enables full capabilities and better performance
	2. Monitor Logs: Check for initialization and query processing
	3. Sample Queries: Use the provided RISC-V technical queries for the demo
	4. Configuration: System auto-selects optimal configuration based on environment

	### 📈 Expected Demo Results

	With proper setup, your demo will showcase:
	- Neural reranking with cross-encoder models
	- Graph enhancement for document relationships
	- Hybrid search combining semantic and keyword matching
	- Real-time analytics with performance metrics
	- Professional UI with technical feature focus
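
To make the hybrid search item concrete: one common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). This guide does not state which fusion method the system uses, so the sketch below is an illustration of the general idea only; the function name, the constant `k=60`, and the document IDs are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    with rank starting at 1; scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), then by ID for a stable tie-break.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    semantic = ["doc_a", "doc_b", "doc_c"]  # hypothetical dense-retrieval order
    keyword = ["doc_b", "doc_d", "doc_a"]   # hypothetical keyword-match order
    print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents that rank well in both lists rise to the top, which is the behavior a hybrid retriever aims for before any neural reranking step.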

	### 🎯 Portfolio Impact

	This deployment demonstrates:
	- Advanced RAG system implementation
	- Modular 6-component architecture
	- Neural reranking and graph enhancement techniques
	- Modern ML engineering practices

	Together, these showcase hands-on RAG implementation skills, with a focus on advanced features.