
from dataclasses import dataclass
from enum import Enum


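@dataclass
class Task:
    benchmark: str  # task key used in the results data
    metric: str  # metric name reported for this task
    col_name: str  # column name displayed on the leaderboard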


# Define our evaluation tasks
# ---------------------------------------------------
class Tasks(Enum):
    # task_key in the data, metric name, display name
    news = Task("news", "acc", "News")
    polymarket = Task("polymarket", "acc", "PolyMarket")


# Your leaderboard name
TITLE = """<h1 align="center" id="space-title" style="font-size: 4.375rem; font-weight: bold; margin-bottom: 1rem;">🔮 FutureBench Leaderboard</h1>"""

# What does your leaderboard evaluate?
INTRODUCTION_TEXT = """<div class="section-card">
<h3 class="section-header"><span class="section-icon">🎯</span> About FutureBench</h3>
FutureBench is a benchmark for evaluating how well AI models predict future events.
This leaderboard shows how accurately different AI models forecast real-world outcomes
across domains including news events, sports, and prediction markets.
<br><br>
📝 <a href="https://www.together.ai/blog/futurebench" target="_blank" style="color: #007acc; text-decoration: none;">Read our blog post</a> for more details about FutureBench.
</div>"""

# Additional information about the benchmark
ABOUT_TEXT = """
<div class="section-card fade-in-up">
<h2 class="section-header"><span class="section-icon">⚙️</span> How it works</h2>

FutureBench evaluates AI models on their ability to predict future events by:

- Ingesting real-world events from multiple sources (news, sports, prediction markets)
- Collecting each model's predictions before the events resolve
- Measuring accuracy once the outcomes are known
- Ranking models by their predictive accuracy
</div>

<div class="section-card fade-in-up stagger-1">
<h2 class="section-header"><span class="section-icon">📊</span> Event Types</h2>

- News Events: predictions about political developments, economic changes, and other current events
- PolyMarket: predictions on real-world events traded on the Polymarket prediction market
</div>

<div class="section-card fade-in-up stagger-2">
<h2 class="section-header"><span class="section-icon">📈</span> Metrics</h2>

Models are evaluated on accuracy: the percentage of their predictions that turn out to be correct.
The Average score summarizes overall performance by averaging accuracy across all event types.
</div>

<div class="section-card fade-in-up stagger-3">
<h2 class="section-header"><span class="section-icon">🔒</span> Data Integrity</h2>

All predictions are collected before the corresponding events resolve, so no model can have seen the outcome it is predicting.
The leaderboard is updated as new events resolve and model scores are recomputed.
</div>
"""
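
# ---------------------------------------------------
# Illustrative sketch only: these helpers are hypothetical and are NOT part of
# the FutureBench codebase. They show one way the "Metrics" description in
# ABOUT_TEXT could be computed. Assumptions (not confirmed by the source):
#   - each resolved prediction is recorded as a bool (True = correct),
#   - predictions are grouped by task key (e.g. "news", "polymarket"),
#   - the Average column is the unweighted mean of the per-task accuracies.


def task_accuracy(outcomes: list[bool]) -> float:
    """Hypothetical helper: percentage (0-100) of correct predictions for one task."""
    if not outcomes:
        raise ValueError("no resolved predictions for this task")
    return 100.0 * sum(outcomes) / len(outcomes)


def average_score(outcomes_by_task: dict[str, list[bool]]) -> float:
    """Hypothetical helper: unweighted mean of the per-task accuracies."""
    if not outcomes_by_task:
        raise ValueError("no tasks to average")
    accuracies = [task_accuracy(outcomes) for outcomes in outcomes_by_task.values()]
    return sum(accuracies) / len(accuracies)


# Example: 3 of 4 news predictions correct (75.0) and 1 of 2 PolyMarket
# predictions correct (50.0) give an Average of 62.5. Because the mean is
# unweighted, each event type counts equally regardless of how many events it has.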