Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
File size: 3,897 Bytes
0061e14 7d20cd0 0061e14 7d20cd0 0061e14 7d20cd0 0061e14 7d20cd0 0061e14 31d76d1 5f0a178 31d76d1 5f0a178 31d76d1 8aa6e0d 31d76d1 0061e14 daa3ab0 3239883 333d45b 0061e14 9eba8d6 0135bb2 9eba8d6 0135bb2 9eba8d6 0061e14 9eba8d6 0061e14 9eba8d6 04c3caf 0061e14 0135bb2 04c3caf 0135bb2 04c3caf 5848897 0135bb2 9eba8d6 0061e14 9eba8d6 0061e14 9eba8d6 daa3ab0 0061e14 333d45b 0061e14 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 |
from dataclasses import dataclass
from enum import Enum
@dataclass
class Task:
benchmark: str
metric: str
col_name: str
# Select your tasks here
# ---------------------------------------------------
class Tasks(Enum):
# task_key in the json file, metric_key in the json file, name to display in the leaderboard
task0 = Task("FormulaOne", "success_rate", "Success Rate (%)")
# task1 = Task("logiqa", "acc_norm", "LogiQA")
NUM_FEWSHOT = 0 # Change with your few shot
# ---------------------------------------------------
# Your leaderboard name
# TITLE = """<h1 align="center" id="space-title">AAI FormulaOne Leaderboard</h1>"""
TITLE = """
<h1 id="space-title" style="
text-align: center;
font-family: 'Segoe UI', 'Helvetica Neue', sans-serif;
font-weight: 300;
letter-spacing: 0.05em;
color: white;
text-transform: none;
margin-top: 2rem;
font-size: 2.6rem;
">
FormulaOne Leaderboard
</h1>
"""
# What does your leaderboard evaluate?
INTRODUCTION_TEXT = """
Welcome to the official leaderboard for the paper:
**FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming** <br>
*Gal Beniamini, Yuval Dor, Alon Vinnikov, Shir Granot Peled, Or Weinstein, Or Sharir, Noam Wies, Tomer Nussbaum, Ido Ben Shaul, Tomer Zekharya, Yoav Levine, Shai Shalev-Shwartz, Amnon Shashua* <br>
**AAI, July 2025**
FormulaOne is a new benchmark designed to challenge frontier AI models. The benchmark is constructed from a vast and conceptually diverse family of dynamic programming problems derived from Monadic Second-Order (MSO) logic on graphs, a framework with profound connections to theoretical computer science.
"""
# Which evaluations are you running? how can people reproduce what you have?
LLM_BENCHMARKS_TEXT = f"""
## How it works
## Reproducibility
To reproduce our results, here is the commands you can run:
"""
EVALUATION_QUEUE_TEXT = """
## π§ͺ Submitting to the FormulaOne Leaderboard
This leaderboard evaluates systems on the FormulaOne core dataset. Submissions consist of a .jsonl file with solution code for each problem.
### π I. Format Your Submission File
Your submission must be a .jsonl file with one entry per problem:
```json
{"problem_id": "1", "solution": "<your Python code here>"}
{"problem_id": "2", "solution": "<your Python code here>"}
...
```
- problem_id: Must match the official list of FormulaOne core problems.
- solution: A Python code implementing the required callback functions.
π Full list of problem_ids:
View the [FormulaOne core dataset](https://github.com/double-ai/formulaone-dataset-release/dataset/formulaone) for the complete list of problem IDs.
β οΈ Validation Rules:
Submissions must:
- Contain exactly two columns: ["problem_id", "solution"]
- Include all required problems (no missing/unknown IDs)
- Provide solutions as Python strings
- Avoid duplicates
### π€ II. Submit via the UI below
- Upload your `.jsonl` file.
- Fill in the following fields:
- **System Name**
- **Organization**
- **System Type**
- Click **Submit**.
### β±οΈ After Submission
Submissions are validated and evaluated within ~24 hours. Results will appear on the leaderboard once processed.
"""
CITATION_BUTTON_LABEL = """π How to cite FormulaOne"""
CITATION_BUTTON_TEXT = r"""
@misc{beniamini2025formulaonemeasuringdepthalgorithmic,
title={FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming},
author={Gal Beniamini and Yuval Dor and Alon Vinnikov and Shir Granot Peled and Or Weinstein and Or Sharir and Noam Wies and Tomer Nussbaum and Ido Ben Shaul and Tomer Zekharya and Yoav Levine and Shai Shalev-Shwartz and Amnon Shashua},
year={2025},
eprint={2507.13337},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2507.13337},
}
"""
|