FutureBench Dataset Processing

This directory contains tools for processing FutureBench datasets: downloading the published dataset from HuggingFace, and transforming your own database into the same standard format.

Option 1: Download from HuggingFace (Original)

Use this to download the existing FutureBench dataset:

python download_data.py
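The body of download_data.py is not shown here; as a rough sketch, downloading a full dataset repository with huggingface_hub looks like the following (the repository ID is a placeholder, not the real FutureBench repo):

```python
# Illustrative sketch of a HuggingFace dataset download, analogous to
# what download_data.py does. The repo ID below is a placeholder.
from huggingface_hub import snapshot_download


def download_dataset(repo_id: str, local_dir: str = "data_cache") -> str:
    """Download every file of a HuggingFace dataset repo into a local folder."""
    return snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)


# Example (requires network access and a real repo ID):
# path = download_dataset("your-org/futurebench-data")
```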

Option 2: Transform Your Own Database

Use this to transform your production database into HuggingFace format:

Setup

  1. Install dependencies:
pip install pandas sqlalchemy huggingface_hub
  2. Set up your HuggingFace token:
export HF_TOKEN="your_huggingface_token_here"
  3. Configure your settings: edit config_db.py to match your needs:
  • Update HF_CONFIG with your HuggingFace repository names
  • Adjust PROCESSING_CONFIG for data filtering preferences
  • Note: the database connection uses the same setup as the main FutureBench app
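This README does not show the contents of config_db.py; a plausible minimal sketch, with placeholder repository names and filter values (the exact keys in the real file may differ):

```python
# Hypothetical sketch of config_db.py; real keys and values may differ.
HF_CONFIG = {
    "dataset_repo": "your-org/futurebench-data",  # placeholder repository name
    "private": False,                             # whether the HF repo is private
}

PROCESSING_CONFIG = {
    "min_predictions_per_event": 1,    # placeholder filtering threshold
    "include_unresolved_events": True, # keep events whose result is still open
}
```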

Usage

# Transform your database and upload to HuggingFace
python db_to_hf.py

# Or run locally without uploading
HF_TOKEN="" python db_to_hf.py
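Setting HF_TOKEN to an empty string works because the script can check the environment before uploading. A minimal sketch of that gating logic (the function names here are illustrative, not the script's actual API):

```python
import os


def should_upload() -> bool:
    """Upload only when a non-empty HF_TOKEN is present in the environment."""
    return bool(os.environ.get("HF_TOKEN"))


def run(write_local_csv, upload_to_hf) -> str:
    """Write the transformed data locally, then optionally push to HuggingFace."""
    write_local_csv()  # always produce the local output
    if should_upload():
        upload_to_hf()
        return "uploaded"
    return "local-only"
```

With HF_TOKEN unset or empty, `run(...)` stops after the local write and returns "local-only".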

Database Schema

The script uses the same database schema as the main FutureBench application:

  • EventBase model for events
  • Prediction model for predictions
  • Uses SQLAlchemy ORM (same as convert_to_csv.py)

No additional database configuration is needed; the script reuses the existing FutureBench database connection.
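The actual EventBase and Prediction models live in the main FutureBench application; as a simplified sketch, SQLAlchemy models with columns matching this dataset's fields might look like the following (column details are guessed from the output format, not taken from the real models):

```python
# Simplified, illustrative SQLAlchemy models; the real FutureBench models
# live in the main application and may have additional columns.
from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class EventBase(Base):
    __tablename__ = "events"
    event_id = Column(Integer, primary_key=True)
    question = Column(String, nullable=False)
    event_type = Column(String)              # e.g. "polymarket", "soccer"
    result = Column(String, nullable=True)   # actual outcome, once resolved
    open_to_bet_until = Column(DateTime)


class Prediction(Base):
    __tablename__ = "predictions"
    id = Column(Integer, primary_key=True)
    event_id = Column(Integer, ForeignKey("events.event_id"))
    algorithm_name = Column(String)
    actual_prediction = Column(String)
    prediction_created_at = Column(DateTime)


# Smoke test against an in-memory SQLite database:
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(EventBase(event_id=1, question="Will X happen?", event_type="polymarket"))
    session.add(Prediction(event_id=1, algorithm_name="model-a", actual_prediction="yes"))
    session.commit()
    rows = session.query(Prediction).filter_by(event_id=1).all()
```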

Output Format

The script produces data in the same format as the original FutureBench dataset:

  • event_id, question, event_type, algorithm_name, actual_prediction, result, open_to_bet_until, prediction_created_at
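The transformation amounts to joining events with their predictions and selecting those columns. A pandas sketch with made-up sample rows (column names come from the list above; the row values are invented):

```python
import pandas as pd

# Toy stand-ins for rows pulled from the database via the ORM.
events = pd.DataFrame([
    {"event_id": 1, "question": "Will X happen?", "event_type": "polymarket",
     "result": "yes", "open_to_bet_until": "2025-07-01"},
])
predictions = pd.DataFrame([
    {"event_id": 1, "algorithm_name": "model-a", "actual_prediction": "yes",
     "prediction_created_at": "2025-06-28"},
])

COLUMNS = ["event_id", "question", "event_type", "algorithm_name",
           "actual_prediction", "result", "open_to_bet_until",
           "prediction_created_at"]

# One output row per (event, prediction) pair, in the published column order.
dataset = events.merge(predictions, on="event_id", how="inner")[COLUMNS]
```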

Automation

You can run this as a scheduled job:

# Add to crontab to run daily at 2 AM
0 2 * * * cd /path/to/your/project && python leaderboard/process_data/db_to_hf.py

Files

  • download_data.py - Downloads data from HuggingFace repositories
  • db_to_hf.py - Transforms your database to HuggingFace format
  • config_db.py - Configuration for database connection and HF settings
  • config.py - HuggingFace repository configuration
  • requirements.txt - Python dependencies

Data Structure

The main dataset contains:

  • event_id: Unique identifier for each event
  • question: The prediction question
  • event_type: Type of event (polymarket, soccer, etc.)
  • answer_options: Possible answers in JSON format
  • result: Actual outcome (if resolved)
  • algorithm_name: AI model that made the prediction
  • actual_prediction: The prediction made
  • open_to_bet_until: Prediction window deadline
  • prediction_created_at: When prediction was made
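For concreteness, one row of the main dataset might look like the following (all values are invented examples, not real data):

```python
import json

# One invented example row illustrating the schema above.
sample_row = {
    "event_id": 1234,
    "question": "Will team A win the match on 2025-07-20?",
    "event_type": "soccer",
    "answer_options": '["team_a", "team_b", "draw"]',  # JSON-encoded string
    "result": "team_a",                                # None until resolved
    "algorithm_name": "model-a",
    "actual_prediction": "team_a",
    "open_to_bet_until": "2025-07-20T14:00:00",
    "prediction_created_at": "2025-07-18T09:30:00",
}

# answer_options decodes from JSON into a list of possible answers.
options = json.loads(sample_row["answer_options"])
```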

Output

The script generates:

  • Downloaded datasets in local cache folders
  • evaluation_queue.csv with unique events for processing
  • Console output with data statistics and summary