---
title: Grok-4 GPQA Evaluation
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.31.0"
app_file: run_hf_space.py
pinned: false
---
# Grok-4 GPQA Evaluation Dashboard
Real-time evaluation of the Grok-4 model on the GPQA benchmark.
## ⚙️ Configuration Required
Please set these secrets in your Space settings:
- **GROK_API_KEY**: Your Grok API key from x.ai
- **HF_TOKEN**: Your Hugging Face token (for GPQA dataset access)
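
Hugging Face Spaces expose secrets to the running app as environment variables, so a startup check along these lines can fail fast when one is missing. This is a minimal sketch: `load_secrets` and `REQUIRED_SECRETS` are hypothetical names for illustration and are not taken from `run_hf_space.py`.

```python
import os

# Secret names come from the list above; the helper itself is a hypothetical illustration.
REQUIRED_SECRETS = ("GROK_API_KEY", "HF_TOKEN")


def load_secrets(env=None):
    """Return the required secrets as a dict, or raise if any is missing or empty."""
    # Default to the process environment, where Spaces place configured secrets.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing Space secrets: " + ", ".join(missing)
            + ". Set them in the Space settings."
        )
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Calling `load_secrets()` with no arguments reads from `os.environ`; passing a plain dict makes the check easy to exercise locally.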
## 📊 Features
- Real-time progress tracking
- Accuracy metrics and performance stats
- Detailed results export
- Support for the full GPQA dataset (448 questions)
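
As a rough illustration of how progress and accuracy could be derived from per-question results, and how those results might be exported, consider the sketch below. The `QuestionResult` record, `summarize`, and `export_results` are assumed names and shapes, not the dashboard's actual code; only the total of 448 questions comes from this README.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QuestionResult:
    """Hypothetical per-question record; field names are assumptions."""
    question_id: str
    predicted: str  # answer letter chosen by the model, e.g. "A"
    correct: str    # ground-truth answer letter


def summarize(results, total_questions=448):
    """Compute progress and accuracy over the questions answered so far."""
    answered = len(results)
    n_correct = sum(r.predicted == r.correct for r in results)
    return {
        "answered": answered,
        "total": total_questions,
        "progress": answered / total_questions if total_questions else 0.0,
        "accuracy": n_correct / answered if answered else 0.0,
    }


def export_results(results, path):
    """Write the summary and the per-question records to a JSON file."""
    payload = {
        "summary": summarize(results),
        "results": [asdict(r) for r in results],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
```

Accuracy here is computed over answered questions only, so it stays meaningful while the run is still in progress; dividing by the full 448 instead would understate it until the evaluation completes.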
## 🚀 Getting Started
1. Set the required secrets in the Space settings
2. Make sure the account that owns your `HF_TOKEN` has access to the GPQA dataset
3. The evaluation will start automatically
4. Monitor progress in the dashboard