---
title: README
emoji: 🦀
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---

Paper (stay tuned for the Monday release!) | Project Page | Code

Models and data associated with the research project "Autonomous Evaluation and Refinement of Digital Agents".

TLDR: We explore the design and use of model-based evaluators to both evaluate and autonomously refine the performance of digital agents. Experiments show that domain-general automated evaluators can significantly improve the performance of digital agents, without any extra supervision.

Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

UC Berkeley, University of Michigan