Leaderboard deployment 2025-07-16 18:05:41
6441bc6

metadata
title: FutureBench Leaderboard
emoji: 🔮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false

FutureBench Leaderboard App

A minimal Gradio application for viewing FutureBench prediction data. This app downloads datasets from HuggingFace on startup and provides a web interface to explore the data.

Features

  • 📊 Data Summary: View dataset statistics and information
  • 🔍 Sample Data: Browse sample prediction records
  • 📋 About: Learn about the FutureBench system
  • 🔄 Auto-refresh: Download latest data on startup
  • 📅 Date Range Slider: Filter the leaderboard by a custom date span
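
As a rough illustration of the date range filter, the sketch below keeps only rows whose date falls inside a chosen span. The `filter_by_date` helper, the `date` field name, and the sample rows are all hypothetical and made up for this example; the real app may store and filter its data differently.

```python
from datetime import date


def filter_by_date(rows, start, end, key="date"):
    """Keep rows whose ISO date string (row[key]) lies in [start, end], inclusive."""
    return [row for row in rows if start <= date.fromisoformat(row[key]) <= end]


# Made-up rows for illustration only; not real FutureBench data.
rows = [
    {"model": "model-a", "date": "2025-06-30"},
    {"model": "model-b", "date": "2025-07-10"},
    {"model": "model-c", "date": "2025-07-20"},
]

# Select everything between 1 July and 16 July 2025 (both ends included).
selected = filter_by_date(rows, date(2025, 7, 1), date(2025, 7, 16))
```

In a Gradio interface, the two bounds would typically come from the slider component's current value.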

Setup

  1. Install dependencies:
pip install -r requirements.txt
  2. (Optional) Set your HuggingFace token for private repositories:
export HF_TOKEN=your_token_here
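
On the Python side, the token can be read from the environment before any download is attempted. This is a minimal sketch, assuming the app reads `HF_TOKEN` directly; `get_hf_token` is a hypothetical helper name, not necessarily what `app.py` uses.

```python
import os


def get_hf_token(env=os.environ):
    """Return the token from HF_TOKEN, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None
```

Returning `None` when the variable is missing lets public repositories be downloaded anonymously, so the token stays optional.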

Running the App

Launch the Gradio application:

python app.py

The app will:

  1. Download datasets from HuggingFace repositories on startup
  2. Process the data and create summaries
  3. Launch a web interface at http://localhost:7860
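
Step 2 above could look roughly like the following. This is only a sketch under the assumption that the downloaded data is available as a list of dictionaries; `summarize` is a hypothetical helper, and the actual logic in `process_data/` may differ.

```python
from collections import Counter


def summarize(records):
    """Summarize a list of dict records: row count, column names, missing values."""
    # Collect every key that appears in at least one record.
    columns = sorted({key for record in records for key in record})
    missing = Counter()
    for record in records:
        for column in columns:
            # A value counts as missing if the key is absent or set to None.
            if record.get(column) is None:
                missing[column] += 1
    return {
        "num_records": len(records),
        "columns": columns,
        "missing": {column: missing[column] for column in columns},
    }
```

A summary of this shape is easy to render in a web interface, for example as a JSON view or a small table.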

Data Sources

The app downloads data from these HuggingFace repositories:

  • futurebench/requests - Evaluation queue
  • futurebench/results - Evaluation results
  • futurebench/data - Main prediction dataset
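
One plausible way to fetch these repositories is `snapshot_download` from the `huggingface_hub` package. The sketch below assumes all three are dataset repositories and that a local `hf_cache` directory is acceptable; `download_kwargs`, `download_all`, and the directory layout are hypothetical and not taken from `app.py`.

```python
import os

# Repository IDs as listed above, with their roles.
DATASET_REPOS = {
    "futurebench/requests": "Evaluation queue",
    "futurebench/results": "Evaluation results",
    "futurebench/data": "Main prediction dataset",
}


def download_kwargs(repo_id, cache_root="hf_cache", token=None):
    """Build the keyword arguments for one snapshot_download call."""
    return {
        "repo_id": repo_id,
        "repo_type": "dataset",  # assumption: the repositories are datasets
        "local_dir": os.path.join(cache_root, repo_id.replace("/", "__")),
        "token": token,
    }


def download_all(cache_root="hf_cache"):
    """Download every repository and return a mapping of repo ID to local path."""
    # Imported lazily so this module loads even without the third-party package.
    from huggingface_hub import snapshot_download

    token = os.environ.get("HF_TOKEN") or None
    return {
        repo_id: snapshot_download(**download_kwargs(repo_id, cache_root, token))
        for repo_id in DATASET_REPOS
    }
```

Calling `download_all()` at startup would refresh the local copies; it needs network access and, for private repositories, a valid `HF_TOKEN`.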

Structure

  • app.py - Main Gradio application
  • process_data/ - Data processing utilities
  • requirements.txt - Python dependencies
  • README.md - This file

Next Steps

This is a minimal version focusing on data download and display. Future enhancements will include:

  • Full leaderboard with model rankings
  • Interactive filtering and sorting
  • Detailed performance metrics
  • Model comparison tools