OpenEvals
AI & ML interests
LLM evaluation
Hi! Welcome to the org page of the Evaluation team at Hugging Face. We want to support the community in building and sharing quality evaluations, for reproducible and fair model comparisons, to cut through release hype and better understand actual model capabilities.
We're behind the:
- lighteval LLM evaluation suite, fast and filled with the SOTA benchmarks you might want (a minimal usage sketch follows this list)
- evaluation guidebook, your reference for LLM evals
- leaderboards on the hub initiative, to encourage people to build more leaderboards in the open for more reproducible evaluation. You'll find documentation here to build your own, and you can look for the best leaderboard for your use case here!
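If you want to give lighteval a spin, the sketch below launches an evaluation from Python by shelling out to its CLI. The backend name, model argument string, and task string are assumptions based on recent lighteval releases, and the exact syntax varies between versions, so check `lighteval --help` and the lighteval documentation for your install.

```python
# Minimal sketch: run a lighteval evaluation by invoking its CLI from Python.
# The backend, model argument string, and task string below are assumptions
# based on recent lighteval releases; adjust them to your installed version.
import subprocess

subprocess.run(
    [
        "lighteval",
        "accelerate",                         # evaluation backend (assumed)
        "model_name=openai-community/gpt2",   # model arguments (assumed format)
        "leaderboard|truthfulqa:mc|0|0",      # suite|task|few-shot|truncation (assumed format)
    ],
    check=True,  # raise if the evaluation command fails
)
```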
Our archived projects:
- Open LLM Leaderboard (over 11K models evaluated since 2023)
We're not behind the evaluate metrics guide, but if you want to understand metrics better, we really recommend checking it out!
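As a taste of what computing a metric looks like in practice, here is a small sketch using the evaluate library; the choice of the accuracy metric and the toy predictions/references are ours, purely for illustration.

```python
# Small sketch: loading and computing a metric with the evaluate library.
# The "accuracy" metric and the toy labels below are illustrative only.
import evaluate

accuracy = evaluate.load("accuracy")
result = accuracy.compute(
    predictions=[0, 1, 1, 0],  # model outputs (toy data)
    references=[0, 1, 0, 0],   # gold labels (toy data)
)
print(result)  # {'accuracy': 0.75}
```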
Papers
- GAIA: a benchmark for General AI Assistants (Paper • 2311.12983 • Published • 241)
- Zephyr: Direct Distillation of LM Alignment (Paper • 2310.16944 • Published • 122)
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (Paper • 2502.02737 • Published • 249)
- Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation (Paper • 2412.03304 • Published • 21)
Pinned spaces
- Find a leaderboard (118 likes): Explore and discover all leaderboards from the HF community
- YourBench (42 likes): Generate custom evaluations from your data easily!
- Example Leaderboard Template (16 likes): Duplicate this leaderboard to initialize your own!
- Run your LLM evaluations on the hub: Generate a command to run model evaluations
Spaces (7)
- Benchmark Finder: A space to view and inspect all the tasks in lighteval
- Find a leaderboard: Explore and discover all leaderboards from the HF community
- Aa Omniscience: Display and inspect log files
- InferenceProviderTestingBackend: Launch and monitor model evaluation jobs
- Evals: Run your LLM evaluations on the hub (generate a command to run model evaluations)