metadata
title: Grok-4 GPQA Evaluation
emoji: π§
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.31.0
app_file: run_hf_space.py
pinned: false
Grok-4 GPQA Evaluation Dashboard
Real-time evaluation of the Grok-4 model on the GPQA benchmark.
Configuration Required
Please set these secrets in your Space settings:
- GROK_API_KEY: Your Grok API key from x.ai
- HF_TOKEN: Your Hugging Face token (for GPQA dataset access)
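Space secrets are exposed to the running app as environment variables, so a startup check along the following lines can report missing configuration before any evaluation begins. This is a minimal sketch, not the code in run_hf_space.py: the helper name `missing_secrets` is hypothetical, and only the two secret names come from the list above.

```python
import os

# Secret names taken from the configuration list above.
REQUIRED_SECRETS = ("GROK_API_KEY", "HF_TOKEN")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

Calling `missing_secrets()` at startup and showing the returned names in the dashboard would give a clearer message than a failed API call later on.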
Features
- Real-time progress tracking
- Accuracy metrics and performance stats
- Detailed results export
- Support for full GPQA dataset (448 questions)
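The progress and accuracy figures listed above can be derived from three running counters. The sketch below is illustrative only: the `EvalStats` class and its field names are assumptions, not the dashboard's actual implementation, and it assumes answers are compared as multiple-choice letters, ignoring case and surrounding whitespace.

```python
from dataclasses import dataclass


@dataclass
class EvalStats:
    """Running counters for a multiple-choice evaluation (hypothetical helper)."""

    total: int = 448      # size of the full GPQA set, per the feature list
    answered: int = 0
    correct: int = 0

    def record(self, predicted: str, expected: str) -> None:
        """Count one answered question, comparing answer letters case-insensitively."""
        self.answered += 1
        if predicted.strip().upper() == expected.strip().upper():
            self.correct += 1

    @property
    def accuracy(self) -> float:
        """Fraction of answered questions that were correct (0.0 before any answer)."""
        return self.correct / self.answered if self.answered else 0.0

    @property
    def progress(self) -> float:
        """Fraction of the dataset answered so far."""
        return self.answered / self.total if self.total else 0.0
```

Accuracy is computed over answered questions rather than the full dataset, so the figure stays meaningful while a run is still in progress.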
Getting Started
- Set the required secrets in Space settings
- Make sure the Hugging Face account behind HF_TOKEN has access to the GPQA dataset
- The evaluation will start automatically
- Monitor progress in the dashboard