---
title: Grok-4 GPQA Evaluation
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.31.0"
app_file: run_hf_space.py
pinned: false
---
# Grok-4 GPQA Evaluation Dashboard
Real-time evaluation of the Grok-4 model on the GPQA benchmark.
## ⚙️ Configuration Required
Please set these secrets in your Space settings:
- **GROK_API_KEY**: Your Grok API key from x.ai
- **HF_TOKEN**: Your Hugging Face token (for GPQA dataset access)
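
Space secrets are exposed to the running app as environment variables, so a startup check can catch a missing key before any API call is made. The following is a minimal sketch; the helper name `missing_secrets` is hypothetical and is not taken from `run_hf_space.py`.

```python
import os

# Secret names as listed above; the helper itself is illustrative only.
REQUIRED_SECRETS = ("GROK_API_KEY", "HF_TOKEN")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

Calling `missing_secrets()` with no argument inspects the real environment; an app could show the returned names in the dashboard instead of starting the evaluation.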
## 📊 Features
- Real-time progress tracking
- Accuracy metrics and performance stats
- Detailed results export
- Support for the full GPQA dataset (448 questions)
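
To illustrate how progress and accuracy can be tracked together, here is a small sketch of a running tally. The class `RunningAccuracy` and its fields are assumptions made for illustration; the Space's actual bookkeeping may differ.

```python
from dataclasses import dataclass


@dataclass
class RunningAccuracy:
    """Hypothetical tally of graded answers over a fixed question count."""

    total: int = 448  # size of the full GPQA set named above
    answered: int = 0
    correct: int = 0

    def record(self, predicted: str, expected: str) -> None:
        # Compare answer letters case-insensitively, ignoring surrounding spaces.
        self.answered += 1
        if predicted.strip().upper() == expected.strip().upper():
            self.correct += 1

    @property
    def accuracy(self) -> float:
        # Accuracy over questions answered so far; 0.0 before the first answer.
        return self.correct / self.answered if self.answered else 0.0

    @property
    def progress(self) -> float:
        # Fraction of the dataset processed so far.
        return self.answered / self.total
```

Reporting accuracy over answered questions rather than over the full set keeps the number meaningful while a run is still in progress.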
## 🚀 Getting Started
1. Set the required secrets in Space settings
2. Make sure the account behind your HF_TOKEN has access to the GPQA dataset
3. The evaluation will start automatically
4. Monitor progress in the dashboard