---
title: README
emoji: 💻
colorFrom: indigo
colorTo: indigo
sdk: static
pinned: false
---

# EvoEval: Evolving Coding Benchmarks via LLM

**EvoEval**<sup>1</sup> is a holistic benchmark suite created by _evolving_ **HumanEval** problems:

- 🔥 Contains **828** new problems across **5** 🌟 semantic-altering and **2** ⭐ semantic-preserving benchmarks
- 🎮 Allows evaluation and comparison across different **dimensions** and problem **types** (e.g., _Difficult_, _Creative_ or _Tool Use_ problems). See our [**visualization tool**](https://evo-eval.github.io/visualization.html) for ready-to-use comparisons
- 🏆 Complete with a [**leaderboard**](https://evo-eval.github.io/leaderboard.html), **ground-truth solutions**, **robust test cases** and **evaluation scripts** that fit easily into your evaluation pipeline (see the quick-start sketch below)
- 🤖 Includes generated code samples from **>50** different LLMs to save you time when running experiments

<sup>1</sup> coincidentally pronounced like 😈 EvilEval

- GitHub: [evo-eval/evoeval](https://github.com/evo-eval/evoeval)
- Webpage: [evo-eval.github.io](https://evo-eval.github.io/)
- Leaderboard: [evo-eval.github.io/leaderboard.html](https://evo-eval.github.io/leaderboard.html)
- Visualization: [evo-eval.github.io/visualization.html](https://evo-eval.github.io/visualization.html)
- Paper: [arXiv](https://arxiv.org/abs/2403.19114)
- PyPI: [evoeval](https://pypi.org/project/evoeval/)
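
To get started, install the package from PyPI. The snippet below is a minimal sketch, assuming an EvalPlus-style `get_evo_eval` loader and a `"prompt"` field on each problem; see the GitHub repository for the authoritative usage.

```python
# pip install evoeval
#
# Minimal sketch: the loader name, benchmark name, and "prompt" field
# are assumed from EvalPlus-style conventions; verify against the repo.
from evoeval.data import get_evo_eval

# Load one of the semantic-altering benchmarks by name.
problems = get_evo_eval("EvoEval_difficult")

# Problems are keyed by task ID; each entry carries the prompt that
# your model should complete into a full function.
for task_id, problem in problems.items():
    print(task_id)
    print(problem["prompt"])
    break  # peek at the first problem only
```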