---
title: FutureBench Leaderboard
emoji: 🔮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

# FutureBench Leaderboard App

A minimal Gradio application for viewing FutureBench prediction data. On startup, the app downloads the datasets from Hugging Face and serves a web interface for exploring them.

## Features

- 📊 **Data Summary**: View dataset statistics and information
- 🔍 **Sample Data**: Browse sample prediction records
- 📋 **About**: Learn about the FutureBench system
- 🔄 **Auto-refresh**: Download latest data on startup
- 📅 **Date Range Slider**: Filter the leaderboard by a custom date span
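
As an illustration of what the date range filter does, here is a minimal sketch in plain Python. The record fields (`model`, `event_date`, `correct`) and the helper `filter_by_date_range` are hypothetical and are not taken from `app.py`; the real FutureBench schema may differ.

```python
from datetime import date

# Hypothetical records; the real FutureBench schema may use different field names.
SAMPLE_ROWS = [
    {"model": "model-a", "event_date": "2025-06-01", "correct": True},
    {"model": "model-b", "event_date": "2025-06-15", "correct": False},
    {"model": "model-a", "event_date": "2025-07-03", "correct": True},
]


def filter_by_date_range(rows, start, end, date_key="event_date"):
    """Keep rows whose ISO-formatted date falls within [start, end], inclusive."""
    return [
        row
        for row in rows
        if start <= date.fromisoformat(row[date_key]) <= end
    ]


# Select only the June 2025 records.
june = filter_by_date_range(SAMPLE_ROWS, date(2025, 6, 1), date(2025, 6, 30))
print([row["model"] for row in june])  # prints ['model-a', 'model-b']
```

In a Gradio UI, the two slider endpoints would be converted to dates and passed as `start` and `end`.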

## Setup

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. (Optional) Set your Hugging Face token to access private repositories:
```bash
export HF_TOKEN=your_token_here
```
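
On the Python side, the token can be read from the environment. The helper below is a sketch and its name `get_hf_token` is hypothetical; how `app.py` actually reads the variable is not shown in this README.

```python
import os


def get_hf_token(env=os.environ):
    """Return the HF_TOKEN value, or None when it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None
```

The result can be passed as the `token` argument of `huggingface_hub` download functions. A value of `None` leaves authentication to the library's defaults, which is sufficient for public repositories.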

## Running the App

Launch the Gradio application:

```bash
python app.py
```

The app will:
1. Download datasets from Hugging Face repositories on startup
2. Process the data and create summaries
3. Launch a web interface at `http://localhost:7860`
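
The processing and launch steps above could be sketched roughly as follows. This is not the code in `app.py`: the functions `summarize`, `summary_markdown`, and `build_app`, and the `model` field, are assumptions made for illustration. Gradio is imported lazily so the summary logic can be used without it.

```python
from collections import Counter


def summarize(rows):
    """Build a small summary: total row count and rows per model."""
    per_model = Counter(row.get("model", "unknown") for row in rows)
    return {"total_rows": len(rows), "rows_per_model": dict(per_model)}


def summary_markdown(summary):
    """Render the summary dictionary as a Markdown snippet."""
    lines = [f"**Total rows:** {summary['total_rows']}", ""]
    lines += [
        f"- `{model}`: {count}"
        for model, count in sorted(summary["rows_per_model"].items())
    ]
    return "\n".join(lines)


def build_app(rows):
    """Wrap the summary in a Gradio Blocks UI (gradio is imported lazily)."""
    import gradio as gr

    with gr.Blocks(title="FutureBench Leaderboard") as demo:
        gr.Markdown("# FutureBench Leaderboard")
        gr.Markdown(summary_markdown(summarize(rows)))
    return demo


# Usage (not executed here): build_app(rows).launch(server_port=7860)
```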

## Data Sources

The app downloads data from these Hugging Face repositories:
- `futurebench/requests` - Evaluation queue
- `futurebench/results` - Evaluation results
- `futurebench/data` - Main prediction dataset
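
A download step for these repositories might look like the sketch below, which uses `snapshot_download` from `huggingface_hub`. Treating all three as dataset repositories, the `hf_cache` folder, and the helper names `local_dir_for` and `download_all` are assumptions; the app's own download code may differ.

```python
from pathlib import Path

# Repository IDs as listed above; treating them as dataset repos is an assumption.
REPOS = ["futurebench/requests", "futurebench/results", "futurebench/data"]


def local_dir_for(repo_id, cache_root="hf_cache"):
    """Map 'org/name' to a local folder such as hf_cache/org__name."""
    return Path(cache_root) / repo_id.replace("/", "__")


def download_all(token=None, cache_root="hf_cache"):
    """Download every repository snapshot and return {repo_id: local_path}."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = {}
    for repo_id in REPOS:
        paths[repo_id] = snapshot_download(
            repo_id=repo_id,
            repo_type="dataset",
            local_dir=local_dir_for(repo_id, cache_root),
            token=token,
        )
    return paths
```

Calling `download_all()` requires network access, and a token if any of the repositories is private.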

## Structure

- `app.py` - Main Gradio application
- `process_data/` - Data processing utilities
- `requirements.txt` - Python dependencies
- `README.md` - This file

## Next Steps

This is a minimal version focusing on data download and display. Future enhancements will include:
- Full leaderboard with model rankings
- Interactive filtering and sorting
- Detailed performance metrics
- Model comparison tools