enhanced-rag-demo / DEPLOYMENT_GUIDE.md
Arthur Passuello

HuggingFace Spaces Deployment Guide

Enhanced RISC-V RAG System

🚀 Quick Deployment Steps

  1. Create HuggingFace Space

    • Go to HuggingFace Spaces
    • Click "Create new Space"
    • Choose Streamlit as SDK
    • Set hardware to CPU Basic (2 cores, 16GB RAM)
  2. Upload Files

    Upload all files from this directory to your Space:

    app.py                    # Main entry point
    streamlit_epic2_demo.py   # Enhanced RAG demo
    requirements.txt          # Dependencies
    config/                   # Configuration files
    src/                      # Core system
    data/                     # Sample documents
    demo/                     # Demo utilities
    
  3. Set Environment Variables (Optional)

    In your Space settings, add the following (as a secret, since the token is a credential):

    HF_TOKEN=your_huggingface_token_here
    

    Note: The system works without HF_TOKEN but provides enhanced capabilities with it.

  4. Build & Deploy

    • HuggingFace Spaces will automatically build your app
    • Monitor build logs for any issues
    • App will be available at: https://huggingface.co/spaces/your-username/your-space-name
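
The upload in step 2 can also be scripted. The sketch below is one possible approach, assuming the `huggingface_hub` package is installed, you are already authenticated, and the Space already exists. The helper names `missing_entries` and `upload_to_space` are illustrative and not part of this project.

```python
from pathlib import Path

# Entries listed in step 2 of this guide.
REQUIRED_ENTRIES = [
    "app.py",
    "streamlit_epic2_demo.py",
    "requirements.txt",
    "config",
    "src",
    "data",
    "demo",
]


def missing_entries(root: str, required=REQUIRED_ENTRIES) -> list[str]:
    """Return the required files or directories that are absent under root."""
    base = Path(root)
    return [name for name in required if not (base / name).exists()]


def upload_to_space(root: str, repo_id: str) -> None:
    """Upload the project directory to an existing Space (hypothetical helper)."""
    missing = missing_entries(root)
    if missing:
        raise FileNotFoundError(f"Missing before upload: {missing}")
    # Imported lazily so the pre-flight check works without the package.
    from huggingface_hub import HfApi

    HfApi().upload_folder(folder_path=root, repo_id=repo_id, repo_type="space")
```

Calling `upload_to_space(".", "your-username/your-space-name")` would push the current directory; the pre-flight check catches the "not all files uploaded" case before the build starts.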

🔧 System Capabilities

With HF_TOKEN (Recommended)

  • ✅ Full advanced RAG capabilities
  • ✅ Neural reranking with cross-encoder models
  • ✅ Graph enhancement for document relationships
  • ✅ Real-time analytics and performance monitoring
  • ✅ API-based LLM integration (memory efficient)

Without HF_TOKEN (Demo Mode)

  • ✅ System architecture demonstration
  • ✅ Performance metrics display
  • ✅ Technical documentation showcase
  • ℹ️ Limited live query functionality
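
The choice between the two modes above comes down to whether `HF_TOKEN` is present in the environment. A minimal sketch of that check follows; the function name `select_mode` and the mode labels `"full"` and `"demo"` are assumptions for illustration, not names taken from the project.

```python
import os


def select_mode(env=None) -> str:
    """Return "full" when a non-empty HF_TOKEN is set, otherwise "demo"."""
    env = os.environ if env is None else env
    # Treat a missing, empty, or whitespace-only token as "not configured".
    token = (env.get("HF_TOKEN") or "").strip()
    return "full" if token else "demo"
```

Passing the environment as an argument keeps the check easy to exercise locally, for example `select_mode({"HF_TOKEN": "your_huggingface_token_here"})`.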

📊 Performance Expectations

  • Memory Usage: < 16GB (HF Spaces compatible)
  • Startup Time: 30-60 seconds (model loading)
  • Query Response: 1-3 seconds per query
  • Concurrent Users: Supports multiple simultaneous users
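
To compare a running Space against the query-response target above, a small timing wrapper is enough. This is a sketch only: `timed_call` is a hypothetical helper, and the 3-second default budget simply mirrors the upper end of the range stated here.

```python
import time


def timed_call(fn, *args, budget_s: float = 3.0, **kwargs):
    """Run fn and return (result, elapsed_seconds, within_budget)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_s
```

Wrapping whatever function answers a query in `timed_call` gives a per-query latency figure that can be logged next to the build and startup logs.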

🔍 Monitoring & Troubleshooting

Common Issues

  1. Build Fails

    • Check requirements.txt compatibility
    • Ensure all files are uploaded
    • Monitor build logs for specific errors
  2. High Memory Usage

    • System is optimized for <16GB usage
    • Models load efficiently with lazy loading
    • Consider upgrading the Space hardware (for example, to CPU Upgrade) if needed
  3. Slow Response Times

    • First query may be slower (model loading)
    • Subsequent queries should be <3 seconds
    • Check HF_TOKEN configuration for API access

Health Check Endpoints

The system provides built-in health monitoring:

  • Automatic environment detection
  • Configuration validation
  • Component status reporting
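
The three checks above could be combined into a single status report along these lines. This is an illustrative structure under stated assumptions, not the project's actual health module: `health_report` and the report keys are hypothetical, and the environment check relies on the `SPACE_ID` variable that Hugging Face Spaces is understood to set inside the running container.

```python
import os


def health_report(env=None, components=None) -> dict:
    """Build a simple status report (illustrative structure, not the project's)."""
    env = os.environ if env is None else env
    components = components or {}
    statuses = {}
    for name, check in components.items():
        try:
            statuses[name] = "ok" if check() else "degraded"
        except Exception as exc:  # A failing check must not crash the report.
            statuses[name] = f"error: {exc}"
    return {
        # Assumption: SPACE_ID is present only when running on Spaces.
        "environment": "hf_spaces" if env.get("SPACE_ID") else "local",
        "hf_token_configured": bool(env.get("HF_TOKEN")),
        "components": statuses,
        "healthy": all(s == "ok" for s in statuses.values()),
    }
```

Each component check is a zero-argument callable, so a report for a retriever and a reranker might be built with `health_report(components={"retriever": check_retriever, "reranker": check_reranker})`, where both callables are supplied by the caller.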

💡 Tips for Best Performance

  1. Use HF_TOKEN: Enables full capabilities and better performance
  2. Monitor Logs: Check for initialization and query processing
  3. Sample Queries: Use provided RISC-V technical queries for demo
  4. Configuration: System auto-selects optimal configuration based on environment

📈 Expected Demo Results

With proper setup, your demo will showcase:

  • Neural reranking with cross-encoder models
  • Graph enhancement for document relationships
  • Hybrid search combining semantic and keyword matching
  • Real-time analytics with performance metrics
  • Professional UI with technical feature focus
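
As an illustration of the hybrid search item above, one common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). This guide does not state which fusion method the system uses, so treat the sketch below as an example of the general idea; the document ids and the constant `k = 60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse ranked lists of document ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); ties break on id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic = ["doc_a", "doc_b", "doc_c"]  # e.g. from embedding similarity
keyword = ["doc_b", "doc_d", "doc_a"]   # e.g. from keyword (BM25-style) search
fused = reciprocal_rank_fusion([semantic, keyword])
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise to the top, and a reranker can then reorder the fused candidates.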

🎯 Portfolio Impact

This deployment demonstrates:

  • Advanced RAG system implementation
  • Modular 6-component architecture
  • Neural reranking and graph enhancement techniques
  • Modern ML engineering practices

Overall, it showcases technical RAG implementation skills with a focus on advanced features.