BenchHub

non-profit

Recent Activity

amphora submitted a paper 4 days ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

EunsuKim updated a dataset 9 days ago

BenchHub/BenchHub-Ko

amphora submitted a paper 3 months ago

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

BenchHub's Spaces (1)

BenchHub

Customize and evaluate LLMs using BenchHub