|
<div align="center"> |
|
|
|
 |
|
|
|
🛠️ [Setup](#%EF%B8%8F-setup) -

🏋 [Usage](#-usage) -

💻 [Demo](#-demo) -

🌐 [Ecosystem](#-ecosystem) -

🔬 [AgentLab](https://github.com/ServiceNow/AgentLab) -

👫 [Contributors](#-contributors) -

📄 [Paper](https://arxiv.org/abs/2412.05467) -

📝 [Citation](#-citing-this-work)
|
|
|
[PyPI](https://pypi.org/project/browsergym/)

[License](http://www.apache.org/licenses/LICENSE-2.0)

[PyPI - Downloads](https://pypistats.org/packages/browsergym-core)

[GitHub star history](https://star-history.com/#ServiceNow/BrowserGym)

[Code format](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml)

[Unit tests](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml)
|
|
|
```sh
|
pip install browsergym |
|
``` |
|
|
|
</div> |
|
|
|
> [!WARNING] |
|
> BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research. |
|
> It is not meant to be a consumer product. Use with caution! |
|
|
|
> [!TIP] |
|
> 🔬 Check out [AgentLab](https://github.com/ServiceNow/AgentLab) ✨!
|
> A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks. |
|
|
|
https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85 |
|
|
|
_Example of a GPT-4V agent executing open-ended tasks (top row, interacting via chat), as well as WebArena and WorkArena tasks (bottom row)._
|
|
|
BrowserGym includes the following benchmarks by default: |
|
- [MiniWoB](https://miniwob.farama.org/) |
|
- [WebArena](https://webarena.dev/) |
|
- [VisualWebArena](https://jykoh.com/vwa) |
|
- [WorkArena](https://github.com/ServiceNow/WorkArena) |
|
- [AssistantBench](https://github.com/oriyor/assistantbench) |
|
- [WebLINX](https://github.com/McGill-NLP/weblinx) (static benchmark) |
|
|
|
Designing new web benchmarks with BrowserGym is easy: it simply requires inheriting from the [`AbstractBrowserTask`](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/task.py#L7C7-L7C26) class.
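
As a rough, hypothetical sketch, a custom task could look like the code below. Method names and signatures are paraphrased from the abstract class, and the chat message format is an assumption; the linked `task.py` is the authoritative reference.

```python
import playwright.sync_api

from browsergym.core.task import AbstractBrowserTask


class MyCustomTask(AbstractBrowserTask):
    @classmethod
    def get_task_id(cls) -> str:
        return "mycustomtask"

    def setup(self, page: playwright.sync_api.Page) -> tuple[str, dict]:
        # bring the page to the task's starting state, then return (goal, info)
        page.goto("https://example.com")  # illustrative start page
        return "Report the page title to the user.", {}

    def validate(
        self, page: playwright.sync_api.Page, chat_messages: list[dict]
    ) -> tuple[float, bool, str, dict]:
        # return (reward, done, message_to_user, info); the message dict keys
        # below ("role", "message") are assumptions
        success = any(
            msg.get("role") == "assistant" and "Example Domain" in msg.get("message", "")
            for msg in chat_messages
        )
        if success:
            return 1.0, True, "Correct!", {}
        return 0.0, False, "", {}

    def teardown(self) -> None:
        # clean up anything created in setup()
        pass
```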
|
|
|
## 🛠️ Setup
|
|
|
To use BrowserGym, install one of the following packages:
|
```sh |
|
pip install browsergym # (recommended) everything below |
|
pip install browsergym-experiments # experiment utilities (agent, loop, benchmarks) + everything below |
|
pip install browsergym-core # core functionalities only (no benchmark, just the openended task) |
|
pip install browsergym-miniwob # core + miniwob |
|
pip install browsergym-webarena # core + webarena |
|
pip install browsergym-visualwebarena # core + visualwebarena |
|
pip install browsergym-workarena # core + workarena |
|
pip install browsergym-assistantbench # core + assistantbench |
|
pip install weblinx-browsergym # core + weblinx |
|
``` |
|
|
|
Then, set up Playwright by running:
|
```sh |
|
playwright install chromium |
|
``` |
|
|
|
Finally, each benchmark comes with its own specific setup that requires additional steps (a MiniWoB example is sketched after this list):
|
- for MiniWoB++, see [miniwob/README.md](browsergym/miniwob/README.md) |
|
- for WebArena, see [webarena/README.md](browsergym/webarena/README.md) |
|
- for VisualWebArena, see [visualwebarena/README.md](browsergym/visualwebarena/README.md) |
|
- for WorkArena, see [WorkArena](https://github.com/ServiceNow/WorkArena) |
|
- for AssistantBench, see [assistantbench/README.md](browsergym/assistantbench/README.md) |
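
For instance, the MiniWoB(++) setup essentially amounts to obtaining the task HTML pages and pointing BrowserGym at them via an environment variable. The commands below are an illustrative sketch only; the linked README is authoritative.

```sh
# illustrative sketch; follow miniwob/README.md for the exact steps
git clone https://github.com/Farama-Foundation/miniwob-plusplus.git
export MINIWOB_URL="file://$PWD/miniwob-plusplus/miniwob/html/miniwob/"
```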
|
|
|
### 🏗️ Development setup
|
|
|
To install browsergym locally for development, use the following commands: |
|
```sh |
|
git clone git@github.com:ServiceNow/BrowserGym.git |
|
cd BrowserGym |
|
make install |
|
``` |
|
|
|
Contributions are welcome!
|
|
|
## 🏋 Usage
|
|
|
Boilerplate code to run an agent on an interactive, open-ended task: |
|
```python |
|
import gymnasium as gym |
|
import browsergym.core # register the openended task as a gym environment |
|
|
|
# start an openended environment |
|
env = gym.make( |
|
"browsergym/openended", |
|
task_kwargs={"start_url": "https://www.google.com/"}, # starting URL |
|
wait_for_user_message=True, # wait for a user message after each agent message sent to the chat |
|
) |
|
# run the environment <> agent loop until termination |
|
obs, info = env.reset() |
|
while True: |
|
action = ... # implement your agent here |
|
obs, reward, terminated, truncated, info = env.step(action) |
|
if terminated or truncated: |
|
break |
|
# release the environment |
|
env.close() |
|
``` |
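
To make the loop concrete, here is a hypothetical stand-in for the `action = ...` step. It assumes the default high-level action set, in which actions are plain strings such as `noop()` or `send_msg_to_user(...)`, and an observation dictionary exposing the chat under `obs["chat_messages"]`; verify both assumptions against the `browsergym.core` documentation.

```python
# hypothetical agent stub for the loop above (action names and observation
# keys are assumptions; inspect `obs` and the default action set for the real ones)
def dummy_agent(obs) -> str:
    user_msgs = [m for m in obs["chat_messages"] if m["role"] == "user"]
    if user_msgs:
        # answer the latest user message in the chat
        return "send_msg_to_user('Sorry, I am only a placeholder agent.')"
    # otherwise, do nothing and wait
    return "noop()"
```

In the loop above, you would then write `action = dummy_agent(obs)`.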
|
|
|
MiniWoB |
|
```python |
|
import gymnasium as gym |
|
import browsergym.miniwob # register miniwob tasks as gym environments |
|
|
|
# start a miniwob task |
|
env = gym.make("browsergym/miniwob.choose-list") |
|
... |
|
|
|
# list all the available miniwob tasks |
|
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")] |
|
print("\n".join(env_ids)) |
|
``` |
|
|
|
WorkArena |
|
```python |
|
import gymnasium as gym |
|
import browsergym.workarena # register workarena tasks as gym environments |
|
|
|
# start a workarena task |
|
env = gym.make("browsergym/workarena.servicenow.order-ipad-pro") |
|
... |
|
|
|
# list all the available workarena tasks |
|
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")] |
|
print("\n".join(env_ids)) |
|
``` |
|
|
|
WebArena |
|
```python |
|
import gymnasium as gym |
|
import browsergym.webarena # register webarena tasks as gym environments |
|
|
|
# start a webarena task |
|
env = gym.make("browsergym/webarena.310") |
|
... |
|
|
|
# list all the available webarena tasks |
|
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")] |
|
print("\n".join(env_ids)) |
|
``` |
|
|
|
VisualWebArena |
|
```python |
|
import gymnasium as gym |
|
import browsergym.visualwebarena  # register visualwebarena tasks as gym environments
|
|
|
# start a visualwebarena task |
|
env = gym.make("browsergym/visualwebarena.721") |
|
... |
|
|
|
# list all the available visualwebarena tasks |
|
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")] |
|
print("\n".join(env_ids)) |
|
``` |
|
|
|
AssistantBench |
|
```python |
|
import gymnasium as gym |
|
import browsergym.assistantbench  # register assistantbench tasks as gym environments
|
|
|
# start an assistantbench task |
|
env = gym.make("browsergym/assistantbench.validation.3") |
|
... |
|
|
|
# list all the available assistantbench tasks |
|
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/assistantbench")]
|
print("\n".join(env_ids)) |
|
``` |
|
|
|
## 💻 Demo
|
|
|
If you want to experiment with a demo agent in BrowserGym, follow these steps:
|
```sh |
|
# conda setup |
|
conda env create -f demo_agent/environment.yml |
|
conda activate demo_agent |
|
|
|
# or pip setup |
|
pip install -r demo_agent/requirements.txt |
|
|
|
# then download the browser for playwright |
|
playwright install chromium |
|
``` |
|
|
|
Our demo agent uses `openai` as a backend; be sure to set your `OPENAI_API_KEY`.
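
For example, in a POSIX shell:

```sh
export OPENAI_API_KEY="..."  # your OpenAI API key
```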
|
|
|
Launch the demo agent as follows:
|
```sh |
|
# openended (interactive chat mode) |
|
python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com |
|
|
|
# miniwob |
|
python demo_agent/run_demo.py --task_name miniwob.click-test |
|
|
|
# workarena |
|
python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop |
|
|
|
# webarena |
|
python demo_agent/run_demo.py --task_name webarena.4 |
|
|
|
# visualwebarena |
|
python demo_agent/run_demo.py --task_name visualwebarena.398 |
|
``` |
|
|
|
You can customize your experience by changing the `model_name` to your preferred LLM (it uses `gpt-4o-mini` by default), adding screenshots for your VLMs with `use_screenshot`, and much more! |
|
|
|
```sh
|
python demo_agent/run_demo.py --help |
|
``` |
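
For instance, a customized run could look like the following; the exact flag spellings are assumptions inferred from the option names above, so check `--help` for the authoritative list.

```sh
# illustrative; confirm flag names and syntax with --help
python demo_agent/run_demo.py \
    --task_name miniwob.click-test \
    --model_name gpt-4o \
    --use_screenshot
```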
|
|
|
## 🌐 Ecosystem
|
|
|
- [AgentLab](https://github.com/ServiceNow/AgentLab): Seamlessly run agents on benchmarks, collect and analyse traces. |
|
- [WorkArena(++)](https://github.com/ServiceNow/WorkArena): A benchmark for web agents on the ServiceNow platform. |
|
- [WebArena](https://github.com/web-arena-x/webarena): A benchmark of realistic web tasks on self-hosted domains. |
|
- [VisualWebArena](https://github.com/web-arena-x/visualwebarena): A benchmark of realistic visual web tasks on self-hosted domains. |
|
- [MiniWoB(++)](https://miniwob.farama.org/): A collection of over 100 web tasks on synthetic web pages. |
|
- [WebLINX](https://github.com/McGill-NLP/weblinx): A dataset of real-world web interaction traces. |
|
- [AssistantBench](https://github.com/oriyor/assistantbench): A benchmark of realistic and time-consuming tasks on the open web. |
|
- [DoomArena](https://github.com/ServiceNow/DoomArena): A framework for AI agent security testing that supports injecting attacks into web pages from BrowserGym environments.
|
|
|
## 👫 Contributors
|
|
|
[](https://github.com/ServiceNow/BrowserGym/graphs/contributors) |
|
|
|
## 📝 Citing This Work
|
|
|
Please use the following BibTeX to cite our work: |
|
```tex |
|
@inproceedings{workarena2024, |
|
title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?}, |
|
author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre}, |
|
booktitle = {Proceedings of the 41st International Conference on Machine Learning}, |
|
pages = {11642--11662}, |
|
year = {2024}, |
|
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, |
|
volume = {235}, |
|
series = {Proceedings of Machine Learning Research}, |
|
month = {21--27 Jul}, |
|
publisher = {PMLR}, |
|
url = {https://proceedings.mlr.press/v235/drouin24a.html}, |
|
} |
|
``` |
|
|