---
title: README
emoji: 😻
colorFrom: indigo
colorTo: indigo
sdk: static
pinned: false
---

# EvoEval: Evolving Coding Benchmarks via LLM

**EvoEval**<sup>1</sup> is a holistic benchmark suite created by _evolving_ **HumanEval** problems:

- 🔥 Contains **828** new problems across **5** 🌠 semantic-altering and **2** ⭐ semantic-preserving benchmarks
- 🔮 Allows evaluation and comparison across different **dimensions** and problem **types** (e.g., _Difficult_, _Creative_, or _Tool Use_ problems). See our [**visualization tool**](https://evo-eval.github.io/visualization.html) for ready-to-use comparisons
- 🏆 Complete with a [**leaderboard**](https://evo-eval.github.io/leaderboard.html), **groundtruth solutions**, **robust testcases**, and **evaluation scripts** to easily fit into your evaluation pipeline (a usage sketch follows the links below)
- 🤖 Generated LLM code samples from **>50** different models to save you time in running experiments

<sup>1</sup> coincidentally pronounced similarly to 😈 EvilEval

- GitHub: [evo-eval/evoeval](https://github.com/evo-eval/evoeval)
- Webpage: [evo-eval.github.io](https://evo-eval.github.io/)
- Leaderboard: [evo-eval.github.io/leaderboard.html](https://evo-eval.github.io/leaderboard.html)
- Visualization: [evo-eval.github.io/visualization.html](https://evo-eval.github.io/visualization.html)
- Paper: [arXiv](https://arxiv.org/abs/2403.19114)
- PyPI: [evoeval](https://pypi.org/project/evoeval/)
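
As a rough illustration of how the benchmark can slot into an evaluation pipeline, the sketch below loads one EvoEval benchmark through the `evoeval` PyPI package and iterates over its problems. The `get_evo_eval` loader and the `EvoEval_difficult` benchmark name follow the GitHub README; treat the exact problem fields (`task_id`, `prompt`) and the sample format as assumptions and check the repository for the authoritative API.

```python
# Minimal sketch, assuming `pip install evoeval`. The loader name and problem
# fields mirror the evo-eval/evoeval GitHub README and may differ by version.
from evoeval.data import get_evo_eval


def generate_one_completion(prompt: str) -> str:
    # Placeholder for your own model call (hypothetical helper, not part of evoeval).
    raise NotImplementedError


# Load one of the EvoEval benchmarks by name (here: the Difficult split).
problems = get_evo_eval("EvoEval_difficult")

# Collect one generated solution per task for later evaluation.
samples = [
    {"task_id": task_id, "solution": generate_one_completion(problem["prompt"])}
    for task_id, problem in problems.items()
]
```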