---
title: README
emoji: 💻
colorFrom: indigo
colorTo: indigo
sdk: static
pinned: false
---

# EvoEval: Evolving Coding Benchmarks via LLM

**EvoEval**<sup>1</sup> is a holistic benchmark suite created by _evolving_ **HumanEval** problems:

- 🔥 Contains **828** new problems across **5** 🌟 semantic-altering and **2** ⭐ semantic-preserving benchmarks
- 🎮 Allows evaluation and comparison across different **dimensions** and problem **types** (e.g., _Difficult_, _Creative_ or _Tool Use_ problems). See our [**visualization tool**](https://evo-eval.github.io/visualization.html) for ready-to-use comparisons
- 🏆 Complete with a [**leaderboard**](https://evo-eval.github.io/leaderboard.html), **ground-truth solutions**, **robust test cases** and **evaluation scripts** that fit easily into your evaluation pipeline (see the quick-start sketch below)
- 🤖 Includes generated code samples from **>50** different LLMs to save you time when running experiments

<sup>1</sup> coincidentally pronounced like 😈 EvilEval

- GitHub: [evo-eval/evoeval](https://github.com/evo-eval/evoeval)
- Webpage: [evo-eval.github.io](https://evo-eval.github.io/)
- Leaderboard: [evo-eval.github.io/leaderboard.html](https://evo-eval.github.io/leaderboard.html)
- Visualization: [evo-eval.github.io/visualization.html](https://evo-eval.github.io/visualization.html)
- Paper: [arXiv](https://arxiv.org/abs/2403.19114)
- PyPI: [evoeval](https://pypi.org/project/evoeval/)
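
To get started, install the package from PyPI. The snippet below is a minimal sketch, assuming an EvalPlus-style `get_evo_eval` loader and a `"prompt"` field on each problem; see the GitHub repository for the authoritative usage.

```python
# pip install evoeval
#
# Minimal sketch: the loader name, benchmark name, and "prompt" field
# are assumed from EvalPlus-style conventions; verify against the repo.
from evoeval.data import get_evo_eval

# Load one of the semantic-altering benchmarks by name.
problems = get_evo_eval("EvoEval_difficult")

# Problems are keyed by task ID; each entry carries the prompt that
# your model should complete into a full function.
for task_id, problem in problems.items():
    print(task_id)
    print(problem["prompt"])
    break  # peek at the first problem only
```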