---
title: Grok-4 GPQA Evaluation
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.31.0"
app_file: run_hf_space.py
pinned: false
---

# Grok-4 GPQA Evaluation Dashboard

Real-time evaluation of the Grok-4 model on the GPQA benchmark.

## ⚙️ Configuration Required

Please set these secrets in your Space settings:
- **GROK_API_KEY**: Your Grok API key from x.ai
- **HF_TOKEN**: Your Hugging Face token (for GPQA dataset access)
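Space secrets are exposed to the running app as environment variables. Below is a minimal sketch of a startup check that fails early with a readable message when a secret is missing. It assumes the app reads the secrets under exactly the names listed above. The helper `load_required_secrets` is hypothetical and is not taken from `run_hf_space.py`.

```python
import os


def load_required_secrets(names=("GROK_API_KEY", "HF_TOKEN"), env=None):
    """Return the requested secrets, or raise if any are missing or empty.

    Hypothetical helper: assumes Space secrets arrive as environment
    variables with exactly these names.
    """
    env = os.environ if env is None else env
    # An empty string counts as missing, since it cannot authenticate anything.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing Space secrets: " + ", ".join(missing)
            + ". Set them in your Space settings and restart the Space."
        )
    return {name: env[name] for name in names}
```

Passing `env` explicitly makes the check easy to exercise without touching the real environment.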

## 📊 Features

- Real-time progress tracking
- Accuracy metrics and performance stats
- Detailed results export
- Support for the full GPQA main set (448 questions)
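The progress and accuracy figures can be derived from per-question results roughly as follows. This is a sketch only. The `QuestionResult` record layout and the `summarize` function are hypothetical and are not the dashboard's actual code. Accuracy is computed over the questions answered so far, and an unparsable model answer (empty `predicted`) counts as wrong.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QuestionResult:
    """One graded GPQA question (hypothetical record layout)."""
    question_id: str
    predicted: str    # letter the model chose, e.g. "A"; "" if unparsable
    correct: str      # gold answer letter
    latency_s: float  # wall-clock seconds for the API call


def summarize(results, total_questions=448):
    """Running stats for the dashboard; 448 = size of the GPQA main set."""
    answered = len(results)
    n_correct = sum(r.predicted == r.correct for r in results)
    return {
        "answered": answered,
        "progress": answered / total_questions,
        "accuracy": n_correct / answered if answered else 0.0,
        "mean_latency_s": mean(r.latency_s for r in results) if results else 0.0,
    }
```

Recomputing the summary after each question keeps the displayed numbers consistent with whatever has been exported so far.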

## 🚀 Getting Started

1. Set the required secrets in Space settings
2. Make sure your Hugging Face account has been granted access to the gated GPQA dataset
3. The evaluation starts automatically once the Space is running
4. Monitor progress in the dashboard