	<div align="center">

	![BrowserGym banner](https://github.com/user-attachments/assets/4853f210-43ac-4107-a0d2-95c9c614dbe7)

	🛠️ [Setup](#%EF%B8%8F-setup) -
	🏋 [Usage](#-usage) -
	💻 [Demo](#-demo) -
	🌐 [Ecosystem](#-ecosystem) -
	🚀 [AgentLab](https://github.com/ServiceNow/AgentLab) -
	🌟 [Contributors](#-contributors) -
	📄 [Paper](https://arxiv.org/abs/2412.05467) -
	📝 [Citation](#-citing-this-work)

	[![pypi](https://badge.fury.io/py/browsergym.svg)](https://pypi.org/project/browsergym/)
	[![PyPI - License](https://img.shields.io/pypi/l/browsergym?style=flat-square)](http://www.apache.org/licenses/LICENSE-2.0)
	[![PyPI - Downloads](https://img.shields.io/pypi/dm/browsergym-core?style=flat-square)](https://pypistats.org/packages/browsergym-core)
	[![GitHub star chart](https://img.shields.io/github/stars/ServiceNow/BrowserGym?style=flat-square)](https://star-history.com/#ServiceNow/BrowserGym)
	[![Code Format](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml/badge.svg)](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml)
	[![Tests](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml)

	```sh
	pip install browsergym
	```

	</div>

	> [!WARNING]
	> BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research.
	> It is not meant to be a consumer product. Use with caution!

	> [!TIP]
	> 🚀 Check out [AgentLab](https://github.com/ServiceNow/AgentLab)✨ !
	> A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.

	https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85

	_Example of a GPT-4V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row)._

	BrowserGym includes the following benchmarks by default:
	- [MiniWoB](https://miniwob.farama.org/)
	- [WebArena](https://webarena.dev/)
	- [VisualWebArena](https://jykoh.com/vwa)
	- [WorkArena](https://github.com/ServiceNow/WorkArena)
	- [AssistantBench](https://github.com/oriyor/assistantbench)
	- [WebLINX](https://github.com/McGill-NLP/weblinx) (static benchmark)

	Designing new web benchmarks with BrowserGym is easy: simply inherit from the [`AbstractBrowserTask`](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/task.py#L7C7-L7C26) class.
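As an illustration of what inheriting from `AbstractBrowserTask` might look like, here is a minimal sketch of a toy task. The task itself (`VisitExampleTask`, its ID and its target URL) is hypothetical, and the method names and return shapes (`setup`, `validate`, `teardown`, `get_task_id`) are assumptions based on our reading of the base class; check them against `task.py` before relying on them. A small stand-in base class is used when BrowserGym is not installed, so the sketch runs on its own.

```python
from typing import Tuple

try:
    # Real base class, available once `browsergym-core` is installed.
    from browsergym.core.task import AbstractBrowserTask
except ImportError:
    # Stand-in so this sketch can run without BrowserGym installed.
    class AbstractBrowserTask:
        def __init__(self, seed: int) -> None:
            self.seed = seed


class VisitExampleTask(AbstractBrowserTask):
    """Hypothetical task: the agent must end up on example.com."""

    TARGET = "https://example.com/"

    @classmethod
    def get_task_id(cls) -> str:
        return "visit-example"

    def setup(self, page) -> Tuple[str, dict]:
        # `page` is assumed to be a Playwright page; start from a blank tab.
        page.goto("about:blank")
        goal = f"Navigate to {self.TARGET}"
        return goal, {}

    def validate(self, page, chat_messages) -> Tuple[float, bool, str, dict]:
        # Assumed return shape: (reward, done, message_to_user, info).
        done = page.url.rstrip("/") == self.TARGET.rstrip("/")
        return (1.0 if done else 0.0), done, "", {}

    def teardown(self) -> None:
        # Nothing to clean up for this toy task.
        pass
```

Exposing such a class as a Gymnasium environment is a separate registration step that is not shown here.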

	## 🛠️ Setup

	To use browsergym, install one of the following packages:
	```sh
	pip install browsergym # (recommended) everything below
	pip install browsergym-experiments # experiment utilities (agent, loop, benchmarks) + everything below
	pip install browsergym-core # core functionalities only (no benchmark, just the openended task)
	pip install browsergym-miniwob # core + miniwob
	pip install browsergym-webarena # core + webarena
	pip install browsergym-visualwebarena # core + visualwebarena
	pip install browsergym-workarena # core + workarena
	pip install browsergym-assistantbench # core + assistantbench
	pip install weblinx-browsergym # core + weblinx
	```
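To check which of these packages ended up in the current environment, a small helper along the following lines can be used. This is a sketch using only the standard library; the import names in `BENCHMARK_MODULES` are assumed to mirror the pip package names above (only `browsergym.core`, `browsergym.miniwob`, `browsergym.workarena` and `browsergym.webarena` appear as imports elsewhere in this README).

```python
import importlib.util

# Import names assumed to match the pip packages listed above.
BENCHMARK_MODULES = {
    "core": "browsergym.core",
    "experiments": "browsergym.experiments",
    "miniwob": "browsergym.miniwob",
    "webarena": "browsergym.webarena",
    "visualwebarena": "browsergym.visualwebarena",
    "workarena": "browsergym.workarena",
    "assistantbench": "browsergym.assistantbench",
}


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package such as `browsergym` is missing.
        return False


def installed_benchmarks() -> dict:
    """Map each short name to whether its module is importable."""
    return {name: is_installed(mod) for name, mod in BENCHMARK_MODULES.items()}


if __name__ == "__main__":
    for name, ok in installed_benchmarks().items():
        print(f"{name:15s} {'installed' if ok else 'missing'}")
```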

	Then set up Playwright by running:
	```sh
	playwright install chromium
	```

	Finally, each benchmark comes with its own setup, which requires additional steps:
	- for MiniWoB++, see [miniwob/README.md](browsergym/miniwob/README.md)
	- for WebArena, see [webarena/README.md](browsergym/webarena/README.md)
	- for VisualWebArena, see [visualwebarena/README.md](browsergym/visualwebarena/README.md)
	- for WorkArena, see [WorkArena](https://github.com/ServiceNow/WorkArena)
	- for AssistantBench, see [assistantbench/README.md](browsergym/assistantbench/README.md)

	### 🏗️ Development setup

	To install browsergym locally for development, use the following commands:
	```sh
	git clone git@github.com:ServiceNow/BrowserGym.git
	cd BrowserGym
	make install
	```

	Contributions are welcome! 😊

	## 🏋 Usage

	Boilerplate code to run an agent on an interactive, open-ended task:
	```python
	import gymnasium as gym
	import browsergym.core # register the openended task as a gym environment

	# start an openended environment
	env = gym.make(
	    "browsergym/openended",
	    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
	    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
	)
	# run the environment <> agent loop until termination
	obs, info = env.reset()
	while True:
	    action = ...  # implement your agent here
	    obs, reward, terminated, truncated, info = env.step(action)
	    if terminated or truncated:
	        break
	# release the environment
	env.close()
	```
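The `action = ...` placeholder above is where an agent plugs in. As a rough sketch, the loop can be factored into a function that takes any Gymnasium-style environment and a policy callable. The toy policy below is hypothetical: it assumes that observations carry a `chat_messages` list of dicts with `role` and `message` keys, and that actions are strings such as `send_msg_to_user("...")` and `noop(1000)`. Verify both against the action set and observation space you actually use.

```python
import json


def choose_action(obs: dict) -> str:
    """Toy policy: echo the last user chat message, otherwise wait."""
    messages = obs.get("chat_messages", [])  # assumed observation key
    if messages and messages[-1].get("role") == "user":
        reply = "You said: " + str(messages[-1].get("message", ""))
        # json.dumps yields a quoted, escaped string literal for the call.
        return f"send_msg_to_user({json.dumps(reply)})"
    return "noop(1000)"  # assumed action: wait 1000 ms


def run_episode(env, policy, max_steps: int = 20) -> float:
    """Run one episode on a Gymnasium-style env and return the summed reward."""
    obs, info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += float(reward)
        if terminated or truncated:
            break
    return total
```

With the environment created as above, this would be called as `run_episode(env, choose_action)` followed by `env.close()`. The `max_steps` cap is a design choice to keep a misbehaving agent from looping forever.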

	MiniWoB
	```python
	import gymnasium as gym
	import browsergym.miniwob # register miniwob tasks as gym environments

	# start a miniwob task
	env = gym.make("browsergym/miniwob.choose-list")
	...

	# list all the available miniwob tasks
	env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")]
	print("\n".join(env_ids))
	```

	WorkArena
	```python
	import gymnasium as gym
	import browsergym.workarena # register workarena tasks as gym environments

	# start a workarena task
	env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
	...

	# list all the available workarena tasks
	env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
	print("\n".join(env_ids))
	```

	WebArena
	```python
	import gymnasium as gym
	import browsergym.webarena # register webarena tasks as gym environments

	# start a webarena task
	env = gym.make("browsergym/webarena.310")
	...

	# list all the available webarena tasks
	env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")]
	print("\n".join(env_ids))
	```

	VisualWebArena
	```python
	import gymnasium as gym
	import browsergym.visualwebarena # register visualwebarena tasks as gym environments

	# start a visualwebarena task
	env = gym.make("browsergym/visualwebarena.721")
	...

	# list all the available visualwebarena tasks
	env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")]
	print("\n".join(env_ids))
	```

	AssistantBench
	```python
	import gymnasium as gym
	import browsergym.assistantbench # register assistantbench tasks as gym environments

	# start an assistantbench task
	env = gym.make("browsergym/assistantbench.validation.3")
	...

	# list all the available assistantbench tasks
	env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/assistantbench")]
	print("\n".join(env_ids))
	```
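The benchmark snippets above all repeat the same registry filter. A small helper, sketched here with a hypothetical name (`list_env_ids`), makes the pattern reusable. It takes any iterable of environment IDs, so it can be tried without a benchmark installed, and it matches on `browsergym/<benchmark>.` including the trailing dot so that only that benchmark's tasks are returned.

```python
from typing import Iterable, List


def list_env_ids(registry_ids: Iterable[str], benchmark: str) -> List[str]:
    """Return the sorted environment IDs that belong to one benchmark."""
    prefix = f"browsergym/{benchmark}."
    return sorted(env_id for env_id in registry_ids if env_id.startswith(prefix))


# Usage with Gymnasium, once the benchmark module has been imported:
#   import gymnasium as gym
#   import browsergym.miniwob
#   print("\n".join(list_env_ids(gym.envs.registry.keys(), "miniwob")))

# Stand-alone demonstration on a hand-written list of IDs from this README.
sample = [
    "browsergym/miniwob.choose-list",
    "browsergym/webarena.310",
    "browsergym/visualwebarena.721",
    "browsergym/miniwob.click-test",
]
print(list_env_ids(sample, "miniwob"))
```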

	## 💻 Demo

	If you want to experiment with a demo agent in BrowserGym, follow these steps:
	```sh
	# conda setup
	conda env create -f demo_agent/environment.yml
	conda activate demo_agent

	# or pip setup
	pip install -r demo_agent/requirements.txt

	# then download the browser for playwright
	playwright install chromium
	```

	Our demo agent uses `openai` as a backend, so be sure to set your `OPENAI_API_KEY` environment variable.
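A quick way to confirm that the key is visible to Python before launching the demo is sketched below. The helper name `has_openai_key` is hypothetical and only checks that the variable is present and non-empty, not that the key is valid.

```python
import os
from typing import Mapping


def has_openai_key(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-blank value."""
    return bool(environ.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if not has_openai_key():
        print("OPENAI_API_KEY is not set; calls to the OpenAI backend will likely fail.")
```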

	Launch the demo agent as follows:
	```sh
	# openended (interactive chat mode)
	python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com

	# miniwob
	python demo_agent/run_demo.py --task_name miniwob.click-test

	# workarena
	python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop

	# webarena
	python demo_agent/run_demo.py --task_name webarena.4

	# visualwebarena
	python demo_agent/run_demo.py --task_name visualwebarena.398
	```
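To run several of these tasks in sequence, the commands above can be assembled from Python. The sketch below uses only the `--task_name` and `--start_url` flags shown in this README; the helper name `demo_command` and the `TASKS` list are illustrative.

```python
import sys
from typing import List, Optional


def demo_command(task_name: str, start_url: Optional[str] = None) -> List[str]:
    """Build the argv list for one run of the demo agent."""
    cmd = [sys.executable, "demo_agent/run_demo.py", "--task_name", task_name]
    if start_url is not None:
        cmd += ["--start_url", start_url]
    return cmd


TASKS = ["miniwob.click-test", "webarena.4"]
for task in TASKS:
    # Print the command; to launch it, pass the list to
    # subprocess.run(demo_command(task), check=True) from the repository root.
    print(" ".join(demo_command(task)))
```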

	You can customize your experience by changing the `model_name` to your preferred LLM (it uses `gpt-4o-mini` by default), adding screenshots for your VLMs with `use_screenshot`, and much more!

	```sh
	python demo_agent/run_demo.py --help
	```

	## 🌐 Ecosystem

	- [AgentLab](https://github.com/ServiceNow/AgentLab): Seamlessly run agents on benchmarks, collect and analyse traces.
	- [WorkArena(++)](https://github.com/ServiceNow/WorkArena): A benchmark for web agents on the ServiceNow platform.
	- [WebArena](https://github.com/web-arena-x/webarena): A benchmark of realistic web tasks on self-hosted domains.
	- [VisualWebArena](https://github.com/web-arena-x/visualwebarena): A benchmark of realistic visual web tasks on self-hosted domains.
	- [MiniWoB(++)](https://miniwob.farama.org/): A collection of over 100 web tasks on synthetic web pages.
	- [WebLINX](https://github.com/McGill-NLP/weblinx): A dataset of real-world web interaction traces.
	- [AssistantBench](https://github.com/oriyor/assistantbench): A benchmark of realistic and time-consuming tasks on the open web.
	- [DoomArena](https://github.com/ServiceNow/DoomArena): A framework for AI agent security testing that supports injecting attacks into web pages from BrowserGym environments.

	## 🌟 Contributors

	[![BrowserGym contributors](https://contrib.rocks/image?repo=ServiceNow/BrowserGym&max=2000)](https://github.com/ServiceNow/BrowserGym/graphs/contributors)

	## 📝 Citing This Work

	Please use the following BibTeX to cite our work:
	```tex
	@inproceedings{workarena2024,
	  title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
	  author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
	  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
	  pages = {11642--11662},
	  year = {2024},
	  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
	  volume = {235},
	  series = {Proceedings of Machine Learning Research},
	  month = {21--27 Jul},
	  publisher = {PMLR},
	  url = {https://proceedings.mlr.press/v235/drouin24a.html},
	}
	```