| --- |
| title: README |
| emoji: π₯ |
| colorFrom: blue |
| colorTo: purple |
| sdk: static |
| pinned: false |
| --- |
| |
| # EvalPlus: Rigorous Evaluation of LLMs for Code Generation |
|
|
| ## About |
|
|
| EvalPlus evaluates LLM-generated code along two dimensions: |
|
|
| * Code Correctness: HumanEval+ and MBPP+ |
| * Code Efficiency: EvalPerf |
|
|
| ## Resources |
|
|
| * **GitHub Repo**: [evalplus/evalplus](https://github.com/evalplus/evalplus) |
| * **Leaderboard**: [evalplus.github.io](https://evalplus.github.io/leaderboard.html) |
| * **NeurIPS Paper**: [OpenReview](https://openreview.net/pdf?id=1qvx610Cu7) |
| * **Python Package**: [PyPI](https://pypi.org/project/evalplus/) |
|
|
| ## Citations |
|
|
| ```bibtex |
| @inproceedings{evalplus, |
| title = {Is Your Code Generated by Chat{GPT} Really Correct? Rigorous Evaluation of Large Language Models for Code Generation}, |
| author = {Liu, Jiawei and Xia, Chunqiu Steven and Wang, Yuyao and Zhang, Lingming}, |
| booktitle = {Thirty-seventh Conference on Neural Information Processing Systems}, |
| year = {2023}, |
| url = {https://openreview.net/forum?id=1qvx610Cu7}, |
| } |
| |
| @inproceedings{evalperf, |
| title = {Evaluating Language Models for Efficient Code Generation}, |
| author = {Liu, Jiawei and Xie, Songrun and Wang, Junhao and Wei, Yuxiang and Ding, Yifeng and Zhang, Lingming}, |
| booktitle = {First Conference on Language Modeling}, |
| year = {2024}, |
| url = {https://openreview.net/forum?id=IBCBMeAhmC}, |
| } |
| ``` |
|
|