metadata
title: Grok-4 GPQA Evaluation
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.31.0
app_file: run_hf_space.py
pinned: false

Grok-4 GPQA Evaluation Dashboard

Real-time evaluation of the Grok-4 model on the GPQA benchmark.

⚙️ Configuration Required

Please set these secrets in your Space settings:

  • GROK_API_KEY: Your Grok API key from x.ai
  • HF_TOKEN: Your Hugging Face token (for GPQA dataset access)
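
Spaces expose secrets to the running app as environment variables, so a startup check for the two secrets above could look like the minimal sketch below. The names `REQUIRED_SECRETS` and `load_secrets` are hypothetical and are not taken from this Space's code.

```python
import os
from typing import Mapping, Optional

# Secret names as listed above; everything else here is illustrative.
REQUIRED_SECRETS = ("GROK_API_KEY", "HF_TOKEN")


def load_secrets(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the required secrets, or raise if any is missing or blank."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing Space secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Failing fast with the names of the missing secrets makes a misconfigured Space easier to diagnose than an authentication error partway through a run.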

📊 Features

  • Real-time progress tracking
  • Accuracy metrics and performance stats
  • Detailed results export
  • Support for full GPQA dataset (448 questions)
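
As an illustration of how progress tracking and accuracy metrics like those listed above can be maintained while answers arrive, here is a small sketch. `RunningStats` is a hypothetical helper, not the dashboard's actual implementation, and it assumes answers are compared as single choice letters.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running counters for a multiple-choice evaluation (illustrative only)."""

    total: int          # number of questions planned for the run
    answered: int = 0   # questions answered so far
    correct: int = 0    # answers that matched the expected choice

    def record(self, predicted: str, expected: str) -> None:
        # Compare choice letters, ignoring case and surrounding whitespace.
        self.answered += 1
        if predicted.strip().upper() == expected.strip().upper():
            self.correct += 1

    @property
    def accuracy(self) -> float:
        # Accuracy over the questions answered so far.
        return self.correct / self.answered if self.answered else 0.0

    @property
    def progress(self) -> float:
        # Fraction of the planned run that has completed.
        return self.answered / self.total if self.total else 0.0
```

For a full run, `RunningStats(total=448)` would be updated with `record(...)` after each model response, and `accuracy` and `progress` read whenever the display refreshes.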

🚀 Getting Started

  1. Set the required secrets in Space settings
  2. Make sure the account behind HF_TOKEN has access to the GPQA dataset
  3. The evaluation will start automatically
  4. Monitor progress in the dashboard