| <div align="center"> | |
|  | |
| 🛠️ [Setup](#%EF%B8%8F-setup) - | |
| 🏋 [Usage](#-usage) - | |
| 💻 [Demo](#-demo) - | |
| 🌐 [Ecosystem](#-ecosystem) - | |
| 🚀 [AgentLab](https://github.com/ServiceNow/AgentLab) - | |
| 🌟 [Contributors](#-contributors) - | |
| 📄 [Paper](https://arxiv.org/abs/2412.05467) - | |
| 📝 [Citation](#-citing-this-work) | |
| [PyPI package](https://pypi.org/project/browsergym/) | |
| [License: Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) | |
| [PyPI downloads](https://pypistats.org/packages/browsergym-core) | |
| [Star history](https://star-history.com/#ServiceNow/BrowserGym) | |
| [Code format workflow](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml) | |
| [Unit tests workflow](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml) | |
| ```sh | |
| pip install browsergym | |
| ``` | |
| </div> | |
| > [!WARNING] | |
| > BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research. | |
| > It is not meant to be a consumer product. Use with caution! | |
| > [!TIP] | |
| > 🚀 Check out [AgentLab](https://github.com/ServiceNow/AgentLab) ✨! | |
| > A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks. | |
| https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85 | |
| _Example of a GPT-4V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row)._ | |
| BrowserGym includes the following benchmarks by default: | |
| - [MiniWoB](https://miniwob.farama.org/) | |
| - [WebArena](https://webarena.dev/) | |
| - [VisualWebArena](https://jykoh.com/vwa) | |
| - [WorkArena](https://github.com/ServiceNow/WorkArena) | |
| - [AssistantBench](https://github.com/oriyor/assistantbench) | |
| - [WebLINX](https://github.com/McGill-NLP/weblinx) (static benchmark) | |
| Designing new web benchmarks with BrowserGym is easy: it only requires inheriting from the [`AbstractBrowserTask`](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/task.py#L7C7-L7C26) class. | |
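As a rough illustration of what such a subclass might look like, the sketch below defines a hypothetical `OpenUrlTask`. The method names (`get_task_id`, `setup`, `validate`, `teardown`) and their return shapes are assumptions based on the linked `task.py`, so check them against the current source before relying on them. A stand-in base class is used when `browsergym` is not installed, only so the snippet stays runnable.

```python
from abc import ABC, abstractmethod

try:
    from browsergym.core.task import AbstractBrowserTask
except ImportError:
    # Stand-in mirroring the ASSUMED interface, so this sketch runs without browsergym.
    class AbstractBrowserTask(ABC):
        def __init__(self, seed: int) -> None:
            self.seed = seed

        @classmethod
        @abstractmethod
        def get_task_id(cls) -> str: ...

        @abstractmethod
        def setup(self, page): ...

        @abstractmethod
        def validate(self, page, chat_messages): ...

        @abstractmethod
        def teardown(self) -> None: ...


class OpenUrlTask(AbstractBrowserTask):
    """Hypothetical task: the agent must end up on a given URL."""

    def __init__(self, seed: int, target_url: str = "https://example.com/") -> None:
        super().__init__(seed)
        self.target_url = target_url

    @classmethod
    def get_task_id(cls) -> str:
        return "open-url"

    def setup(self, page):
        # Start from a blank page and return (goal, info).
        page.goto("about:blank")
        return f"Navigate to {self.target_url}", {}

    def validate(self, page, chat_messages):
        # Return (reward, done, message, info); success means the page is on the target URL.
        done = page.url == self.target_url
        return (1.0 if done else 0.0), done, "", {}

    def teardown(self) -> None:
        # Nothing to clean up in this toy task.
        pass
```

Here `page` is expected to behave like a Playwright page (`goto`, `url`). Registering the task as a Gymnasium environment is a separate step that this sketch does not cover.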
| ## 🛠️ Setup | |
| To use browsergym, install one of the following packages: | |
| ```sh | |
| pip install browsergym # (recommended) everything below | |
| pip install browsergym-experiments # experiment utilities (agent, loop, benchmarks) + everything below | |
| pip install browsergym-core # core functionalities only (no benchmark, just the openended task) | |
| pip install browsergym-miniwob # core + miniwob | |
| pip install browsergym-webarena # core + webarena | |
| pip install browsergym-visualwebarena # core + visualwebarena | |
| pip install browsergym-workarena # core + workarena | |
| pip install browsergym-assistantbench # core + assistantbench | |
| pip install weblinx-browsergym # core + weblinx | |
| ``` | |
| Then set up Playwright by running: | |
| ```sh | |
| playwright install chromium | |
| ``` | |
| Finally, each benchmark has its own setup, which requires additional steps: | |
| - for MiniWoB++, see [miniwob/README.md](browsergym/miniwob/README.md) | |
| - for WebArena, see [webarena/README.md](browsergym/webarena/README.md) | |
| - for VisualWebArena, see [visualwebarena/README.md](browsergym/visualwebarena/README.md) | |
| - for WorkArena, see [WorkArena](https://github.com/ServiceNow/WorkArena) | |
| - for AssistantBench, see [assistantbench/README.md](browsergym/assistantbench/README.md) | |
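To confirm which of the packages listed above ended up in your environment, a small check along these lines may help. It is a sketch using only the standard library; the `installed_versions` helper is an illustrative name, not part of BrowserGym.

```python
from importlib import metadata

# Distribution names taken from the install commands above.
BROWSERGYM_PACKAGES = [
    "browsergym",
    "browsergym-experiments",
    "browsergym-core",
    "browsergym-miniwob",
    "browsergym-webarena",
    "browsergym-visualwebarena",
    "browsergym-workarena",
    "browsergym-assistantbench",
    "weblinx-browsergym",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(BROWSERGYM_PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```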
| ### 🏗️ Development setup | |
| To install browsergym locally for development, use the following commands: | |
| ```sh | |
| git clone git@github.com:ServiceNow/BrowserGym.git | |
| cd BrowserGym | |
| make install | |
| ``` | |
| Contributions are welcome! | |
| ## 🏋 Usage | |
| Boilerplate code to run an agent on an interactive, open-ended task: | |
| ```python | |
| import gymnasium as gym | |
| import browsergym.core # register the openended task as a gym environment | |
| # start an openended environment | |
| env = gym.make( | |
|     "browsergym/openended", | |
|     task_kwargs={"start_url": "https://www.google.com/"},  # starting URL | |
|     wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat | |
| ) | |
| # run the environment <> agent loop until termination | |
| obs, info = env.reset() | |
| while True: | |
|     action = ...  # implement your agent here | |
|     obs, reward, terminated, truncated, info = env.step(action) | |
|     if terminated or truncated: | |
|         break | |
| # release the environment | |
| env.close() | |
| ``` | |
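The loop above runs until the environment terminates. One possible way to package it, sketched below, is a helper that caps the number of steps and always releases the environment. `run_episode`, the `agent` callable and `max_steps` are illustrative names, not part of BrowserGym; the helper only assumes the Gymnasium-style `reset`/`step`/`close` interface used above.

```python
from typing import Any, Callable


def run_episode(env: Any, agent: Callable[[dict], str], max_steps: int = 100) -> tuple[float, int]:
    """Run one episode and return (total_reward, steps_taken).

    `agent` maps an observation to an action; `max_steps` is a safety cap
    so that a stuck agent cannot loop forever.
    """
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    try:
        while steps < max_steps:
            action = agent(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            steps += 1
            if terminated or truncated:
                break
    finally:
        # Release the environment even if the agent raises.
        env.close()
    return total_reward, steps
```

With the environment created above, this would be called as `run_episode(env, my_agent)`, where `my_agent` is whatever function you write to turn an observation into an action.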
| MiniWoB | |
| ```python | |
| import gymnasium as gym | |
| import browsergym.miniwob # register miniwob tasks as gym environments | |
| # start a miniwob task | |
| env = gym.make("browsergym/miniwob.choose-list") | |
| ... | |
| # list all the available miniwob tasks | |
| env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")] | |
| print("\n".join(env_ids)) | |
| ``` | |
| WorkArena | |
| ```python | |
| import gymnasium as gym | |
| import browsergym.workarena # register workarena tasks as gym environments | |
| # start a workarena task | |
| env = gym.make("browsergym/workarena.servicenow.order-ipad-pro") | |
| ... | |
| # list all the available workarena tasks | |
| env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")] | |
| print("\n".join(env_ids)) | |
| ``` | |
| WebArena | |
| ```python | |
| import gymnasium as gym | |
| import browsergym.webarena # register webarena tasks as gym environments | |
| # start a webarena task | |
| env = gym.make("browsergym/webarena.310") | |
| ... | |
| # list all the available webarena tasks | |
| env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")] | |
| print("\n".join(env_ids)) | |
| ``` | |
| VisualWebArena | |
| ```python | |
| import gymnasium as gym | |
| import browsergym.visualwebarena # register visualwebarena tasks as gym environments | |
| # start a visualwebarena task | |
| env = gym.make("browsergym/visualwebarena.721") | |
| ... | |
| # list all the available visualwebarena tasks | |
| env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")] | |
| print("\n".join(env_ids)) | |
| ``` | |
| AssistantBench | |
| ```python | |
| import gymnasium as gym | |
| import browsergym.assistantbench # register assistantbench tasks as gym environments | |
| # start an assistantbench task | |
| env = gym.make("browsergym/assistantbench.validation.3") | |
| ... | |
| # list all the available assistantbench tasks | |
| env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/assistantbench")] | |
| print("\n".join(env_ids)) | |
| ``` | |
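The per-benchmark listings above all filter the registry by prefix. If you import several benchmarks at once, a helper like the following can group the IDs in one pass. It is a sketch: `group_by_benchmark` is an illustrative name, and it assumes task IDs follow the `browsergym/<benchmark>.<task>` pattern seen in the examples.

```python
from collections import defaultdict


def group_by_benchmark(env_ids, prefix="browsergym/"):
    """Group environment IDs by the benchmark name that follows the prefix."""
    groups = defaultdict(list)
    for env_id in env_ids:
        if not env_id.startswith(prefix):
            continue  # skip environments registered by other packages
        task_name = env_id[len(prefix):]
        benchmark = task_name.split(".", 1)[0]  # "miniwob.choose-list" -> "miniwob"
        groups[benchmark].append(env_id)
    return dict(groups)
```

For example, `group_by_benchmark(gym.envs.registry.keys())` would return a dictionary keyed by benchmark name once the relevant `browsergym.*` modules have been imported.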
| ## 💻 Demo | |
| If you want to experiment with a demo agent in BrowserGym, follow these steps: | |
| ```sh | |
| # conda setup | |
| conda env create -f demo_agent/environment.yml | |
| conda activate demo_agent | |
| # or pip setup | |
| pip install -r demo_agent/requirements.txt | |
| # then download the browser for playwright | |
| playwright install chromium | |
| ``` | |
| Our demo agent uses `openai` as a backend, so be sure to set your `OPENAI_API_KEY`. | |
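If you drive the demo from your own script, it can help to fail early when the key is missing rather than partway through a run. A minimal sketch, where `require_env` is an illustrative helper and not part of the demo agent:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before launching the demo agent.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of a script surfaces the problem before any browser is started.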
| Launch the demo agent as follows: | |
| ```sh | |
| # openended (interactive chat mode) | |
| python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com | |
| # miniwob | |
| python demo_agent/run_demo.py --task_name miniwob.click-test | |
| # workarena | |
| python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop | |
| # webarena | |
| python demo_agent/run_demo.py --task_name webarena.4 | |
| # visualwebarena | |
| python demo_agent/run_demo.py --task_name visualwebarena.398 | |
| ``` | |
| You can customize your experience by changing the `model_name` to your preferred LLM (it uses `gpt-4o-mini` by default), adding screenshots for your VLMs with `use_screenshot`, and much more! | |
| ```sh | |
| python demo_agent/run_demo.py --help | |
| ``` | |
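To launch several demo runs from Python, the commands above can be assembled programmatically. The sketch below builds the argument list only. `demo_command` is an illustrative helper; `--task_name` and `--start_url` come from the examples above, while the `--model_name` spelling is an assumption inferred from the option name, so confirm it with `--help`.

```python
import shlex
import sys


def demo_command(task_name, start_url=None, model_name=None):
    """Build the argument list for demo_agent/run_demo.py (run from the repository root)."""
    cmd = [sys.executable, "demo_agent/run_demo.py", "--task_name", task_name]
    if start_url is not None:
        cmd += ["--start_url", start_url]
    if model_name is not None:
        cmd += ["--model_name", model_name]  # ASSUMED flag spelling; check --help
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(demo_command("miniwob.click-test")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once you are happy with it.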
| ## 🌐 Ecosystem | |
| - [AgentLab](https://github.com/ServiceNow/AgentLab): Seamlessly run agents on benchmarks, collect and analyse traces. | |
| - [WorkArena(++)](https://github.com/ServiceNow/WorkArena): A benchmark for web agents on the ServiceNow platform. | |
| - [WebArena](https://github.com/web-arena-x/webarena): A benchmark of realistic web tasks on self-hosted domains. | |
| - [VisualWebArena](https://github.com/web-arena-x/visualwebarena): A benchmark of realistic visual web tasks on self-hosted domains. | |
| - [MiniWoB(++)](https://miniwob.farama.org/): A collection of over 100 web tasks on synthetic web pages. | |
| - [WebLINX](https://github.com/McGill-NLP/weblinx): A dataset of real-world web interaction traces. | |
| - [AssistantBench](https://github.com/oriyor/assistantbench): A benchmark of realistic and time-consuming tasks on the open web. | |
| - [DoomArena](https://github.com/ServiceNow/DoomArena): A framework for AI agent security testing that supports injecting attacks into web pages from BrowserGym environments. | |
| ## 🌟 Contributors | |
| [Contributors graph](https://github.com/ServiceNow/BrowserGym/graphs/contributors) | |
| ## 📝 Citing This Work | |
| Please use the following BibTeX to cite our work: | |
| ```tex | |
| @inproceedings{workarena2024, | |
| title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?}, | |
| author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre}, | |
| booktitle = {Proceedings of the 41st International Conference on Machine Learning}, | |
| pages = {11642--11662}, | |
| year = {2024}, | |
| editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, | |
| volume = {235}, | |
| series = {Proceedings of Machine Learning Research}, | |
| month = {21--27 Jul}, | |
| publisher = {PMLR}, | |
| url = {https://proceedings.mlr.press/v235/drouin24a.html}, | |
| } | |
| ``` | |