---
title: Grok-4 GPQA Evaluation
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.31.0"
app_file: run_hf_space.py
pinned: false
---

# Grok-4 GPQA Evaluation Dashboard

Real-time evaluation of the Grok-4 model on the GPQA benchmark.

## ⚙️ Configuration Required

Please set these secrets in your Space settings:
- **GROK_API_KEY**: Your Grok API key from x.ai
- **HF_TOKEN**: Your Hugging Face token (for GPQA dataset access)
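
Spaces expose secrets to the app as environment variables, so a startup check along the following lines can report a missing secret before the evaluation begins. This is a minimal sketch, not the code in `run_hf_space.py`; the helper name `missing_secrets` and the constant `REQUIRED_SECRETS` are hypothetical.

```python
import os

# Secret names taken from the list above.
REQUIRED_SECRETS = ("GROK_API_KEY", "HF_TOKEN")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the
    check easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

Calling `missing_secrets()` with no arguments inspects the running Space's environment; an empty list means both secrets are present.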

## 📊 Features

- Real-time progress tracking
- Accuracy metrics and performance stats
- Detailed results export
- Support for the full GPQA dataset (448 questions)
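
As an illustration of what the accuracy metric and results export could look like, here is a small sketch. The record layout (`question_id`, `predicted`, `expected`) and the function names `summarize` and `to_csv` are assumptions made for this example and may differ from what the dashboard actually uses.

```python
import csv
import io

# Assumed column layout for one evaluated question (hypothetical).
FIELDS = ["question_id", "predicted", "expected"]


def summarize(results):
    """Compute total, correct count, and accuracy for a list of result dicts."""
    total = len(results)
    correct = sum(1 for r in results if r["predicted"] == r["expected"])
    accuracy = correct / total if total else 0.0
    return {"total": total, "correct": correct, "accuracy": accuracy}


def to_csv(results):
    """Serialize results to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample = [
    {"question_id": "q1", "predicted": "A", "expected": "A"},
    {"question_id": "q2", "predicted": "C", "expected": "B"},
    {"question_id": "q3", "predicted": "D", "expected": "D"},
    {"question_id": "q4", "predicted": "B", "expected": "B"},
]
stats = summarize(sample)
```

Returning the CSV as a string rather than writing a file keeps the sketch independent of how the Space serves downloads.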

## 🚀 Getting Started

1. Set the required secrets in Space settings
2. Make sure the account that owns your `HF_TOKEN` has access to the GPQA dataset
3. The evaluation will start automatically
4. Monitor progress in the dashboard