---
title: Grok-4 GPQA Evaluation
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.31.0"
app_file: run_hf_space.py
pinned: false
---
# Grok-4 GPQA Evaluation Dashboard
Real-time evaluation of the Grok-4 model on the GPQA benchmark.
## Configuration Required
Please set these secrets in your Space settings:
- **GROK_API_KEY**: Your Grok API key from x.ai
- **HF_TOKEN**: Your Hugging Face token (for GPQA dataset access)
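
Space secrets are exposed to the running app as environment variables, so a startup check can catch a missing one before any API call is made. The following is a minimal sketch, not the code in `run_hf_space.py`; the helper name `missing_secrets` is hypothetical.

```python
import os

# Secret names taken from the list above.
REQUIRED_SECRETS = ("GROK_API_KEY", "HF_TOKEN")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real secrets.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

Calling `missing_secrets()` at startup and stopping with a clear message when the returned list is non-empty avoids a less obvious failure later in the run.
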
## Features
- Real-time progress tracking
- Accuracy metrics and performance stats
- Detailed results export
- Support for full GPQA dataset (448 questions)
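
As an illustration of how progress and accuracy could be tracked together, here is a small sketch. It assumes answers are compared as case-insensitive letter choices; the `Progress` class and its field names are hypothetical and not taken from the Space's code.

```python
from dataclasses import dataclass

# Size of the full GPQA set, as stated in the feature list above.
TOTAL_QUESTIONS = 448


@dataclass
class Progress:
    answered: int = 0
    correct: int = 0

    def record(self, predicted: str, expected: str) -> None:
        """Count one answered question and whether it was correct."""
        self.answered += 1
        if predicted.strip().upper() == expected.strip().upper():
            self.correct += 1

    @property
    def accuracy(self) -> float:
        """Fraction correct among answered questions (0.0 before any answer)."""
        return self.correct / self.answered if self.answered else 0.0

    @property
    def fraction_done(self) -> float:
        """Fraction of the full dataset answered so far."""
        return self.answered / TOTAL_QUESTIONS
```

Reporting accuracy over answered questions rather than over all 448 keeps the number meaningful while a run is still in progress.
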
## Getting Started
1. Set the required secrets in Space settings
2. Make sure you have GPQA dataset access
3. The evaluation will start automatically
4. Monitor progress in the dashboard
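
Step 2 can be checked by hand before launching the Space. The sketch below tries to load the dataset with the token from `HF_TOKEN`. The repository id `Idavidrein/gpqa`, the config name `gpqa_main`, and the `train` split are assumptions about where the gated dataset lives, not values confirmed by this README. The helper `load_gpqa` is hypothetical, and it requires the `datasets` package.

```python
import os

# Assumptions: dataset location and config are illustrative, not confirmed here.
GPQA_REPO = "Idavidrein/gpqa"
GPQA_CONFIG = "gpqa_main"


def load_gpqa(token=None):
    """Load the GPQA questions, failing early if no token is available."""
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; GPQA dataset access needs a token.")
    # Imported lazily so the token check works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(GPQA_REPO, GPQA_CONFIG, split="train", token=token)
```

If the token is valid but access to the dataset has not been granted, the load is expected to fail with an authorization error from the Hub, which is the signal to request access on the dataset page.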