---
title: README
emoji: 🦀
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---
## Models and data associated with the research project *Autonomous Evaluation and Refinement of Digital Agents*

### [Paper](https://arxiv.org/abs/2404.06474) | [Code](https://github.com/Berkeley-NLP/Agent-Eval-Refine)

We design and use model-based evaluators both to evaluate digital agents and to autonomously refine their performance. Experiments show that domain-general automated evaluators can significantly improve the performance of digital agents without any extra supervision.

[Jiayi Pan](https://www.jiayipan.me/), [Yichi Zhang](https://sled.eecs.umich.edu/author/yichi-zhang/), [Nicholas Tomlin](https://people.eecs.berkeley.edu/~nicholas_tomlin/), [Yifei Zhou](https://yifeizhou02.github.io/), [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/), [Alane Suhr](https://www.alanesuhr.com/)

UC Berkeley, University of Michigan