---
title: Grok-4 GPQA Evaluation
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.31.0"
app_file: run_hf_space.py
pinned: false
---

# Grok-4 GPQA Evaluation Dashboard

Real-time evaluation of the Grok-4 model on the GPQA benchmark.

## ⚙️ Configuration Required

Set these secrets in your Space settings:

- **GROK_API_KEY**: Your Grok API key from x.ai
- **HF_TOKEN**: Your Hugging Face token (for GPQA dataset access)

## 📊 Features

- Real-time progress tracking
- Accuracy metrics and performance stats
- Detailed results export
- Support for the full GPQA dataset (448 questions)

## 🚀 Getting Started

1. Set the required secrets in the Space settings
2. Make sure your Hugging Face account has access to the GPQA dataset
3. The evaluation starts automatically
4. Monitor progress in the dashboard
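
If the evaluation does not start, a missing secret is a likely cause. The snippet below is a minimal sketch of a pre-flight check, not the logic in `run_hf_space.py`: it assumes the app reads `GROK_API_KEY` and `HF_TOKEN` from environment variables, which is how Spaces exposes secrets to a running app. The helper name `missing_secrets` and the placeholder key value are hypothetical.

```python
import os
from typing import Mapping, Sequence

# Secret names taken from the "Configuration Required" list in this README.
REQUIRED_SECRETS: Sequence[str] = ("GROK_API_KEY", "HF_TOKEN")


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required secret names that are unset or blank in `env`.

    Hypothetical helper for illustration; the real app may validate differently.
    """
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
# The key value is a placeholder, not a real credential.
example_env = {"GROK_API_KEY": "xai-placeholder"}
print(missing_secrets(example_env))  # ['HF_TOKEN']
```

Calling `missing_secrets()` with no argument checks the actual process environment, so it could run once at startup and report any missing names in the dashboard before the first API request is made.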