---
title: README
emoji: 😻
colorFrom: indigo
colorTo: indigo
sdk: static
pinned: false
---

# EvoEval: Evolving Coding Benchmarks via LLM 

**EvoEval**<sup>1</sup> is a holistic benchmark suite created by _evolving_ **HumanEval** problems:
- 🔥 Contains **828** new problems across **5** 🌠 semantic-altering and **2** ⭐ semantic-preserving benchmarks
- 🔮 Allows evaluation/comparison across different **dimensions** and problem **types** (e.g., _Difficult_, _Creative_, or _Tool Use_ problems). See our [**visualization tool**](https://evo-eval.github.io/visualization.html) for ready-to-use comparisons
- ๐Ÿ† Complete with [**leaderboard**](https://evo-eval.github.io/leaderboard.html), **groundtruth solutions**, **robust testcases** and **evaluation scripts** to easily fit into your evaluation pipeline
- 🤖 Provides generated code samples from **>50** different LLMs to save you time running experiments

<sup>1</sup> Coincidentally pronounced much like 😈 EvilEval.
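
As a quick orientation, here is a minimal sketch of how the benchmark might slot into an evaluation pipeline. It assumes the PyPI `evoeval` package exposes an evalplus-style `get_evo_eval` loader that returns a dict of problems keyed by task ID; the benchmark name, the `my_model_generate` helper, and the JSONL field names are illustrative assumptions, not the package's confirmed API.

```python
# Hypothetical usage sketch -- names follow the evalplus-style conventions
# EvoEval builds on; verify the exact API against the GitHub repo.
import json

from evoeval.data import get_evo_eval  # assumed evalplus-style loader


def my_model_generate(prompt: str) -> str:
    """Placeholder (hypothetical) for your model's generation call."""
    return prompt + "    pass\n"  # trivially returns a stub completion


benchmark = "EvoEval_difficult"        # one of the evolved benchmarks
problems = get_evo_eval(benchmark)     # assumed: dict keyed by task_id

# Generate one completion per problem, then dump samples as JSONL for
# the repo's evaluation scripts to score.
with open("samples.jsonl", "w") as f:
    for task_id, problem in problems.items():
        completion = my_model_generate(problem["prompt"])
        f.write(json.dumps({"task_id": task_id, "solution": completion}) + "\n")
```

The resulting `samples.jsonl` would then be passed to the evaluation scripts in the GitHub repo to compute pass@k against the robust test cases.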

- GitHub: [evo-eval/evoeval](https://github.com/evo-eval/evoeval)
- Webpage: [evo-eval.github.io](https://evo-eval.github.io/)
- Leaderboard: [evo-eval.github.io/leaderboard.html](https://evo-eval.github.io/leaderboard.html)
- Visualization: [evo-eval.github.io/visualization.html](https://evo-eval.github.io/visualization.html)
- Paper: [arXiv](https://arxiv.org/abs/2403.19114)
- PyPI: [evoeval](https://pypi.org/project/evoeval/)