sergiopaniego (HF Staff) committed
Commit c7a11c8 · verified · 1 Parent(s): 5c55f8a

Upload folder using huggingface_hub

Dockerfile ADDED
@@ -0,0 +1,85 @@
1
+ # Use public Python base image for HuggingFace compatibility
2
+ FROM python:3.11-slim
3
+
4
+ # Set working directory
5
+ WORKDIR /app/env
6
+
7
+ # Install system dependencies for Playwright and browsers
8
+ RUN apt-get update && apt-get install -y --no-install-recommends \
9
+ # Playwright browser dependencies
10
+ libnss3 \
11
+ libnspr4 \
12
+ libatk1.0-0 \
13
+ libatk-bridge2.0-0 \
14
+ libcups2 \
15
+ libdrm2 \
16
+ libdbus-1-3 \
17
+ libxkbcommon0 \
18
+ libatspi2.0-0 \
19
+ libxcomposite1 \
20
+ libxdamage1 \
21
+ libxfixes3 \
22
+ libxrandr2 \
23
+ libgbm1 \
24
+ libpango-1.0-0 \
25
+ libcairo2 \
26
+ libasound2 \
27
+ libxshmfence1 \
28
+ fonts-unifont \
29
+ fonts-noto-color-emoji \
30
+ # Additional dependencies
31
+ git \
32
+ wget \
33
+ curl \
34
+ && rm -rf /var/lib/apt/lists/*
35
+
36
+ # Copy the environment files into the image
37
+ # Build context should be envs/browsergym_env/ (not server/ or repo root)
38
+ COPY . .
39
+
40
+ # Make start script executable
41
+ RUN chmod +x /app/env/server/start.sh
42
+
43
+ # Install Python dependencies using pip install -e . (from pyproject.toml)
44
+ RUN pip install --no-cache-dir -e .
45
+
46
+ # Install Playwright browsers (Chromium by default)
47
+ # Use python -m since playwright command might not be in PATH
48
+ RUN python -m playwright install chromium
49
+
50
+ # Install MiniWoB++ tasks
51
+ RUN git clone --depth 1 https://github.com/Farama-Foundation/miniwob-plusplus.git /app/miniwob-plusplus
52
+
53
+ # Set environment variables
54
+ ENV PYTHONUNBUFFERED=1
55
+ ENV BROWSERGYM_BENCHMARK=miniwob
56
+ ENV BROWSERGYM_TASK_NAME="click-test"
57
+ ENV BROWSERGYM_HEADLESS=true
58
+ ENV BROWSERGYM_VIEWPORT_WIDTH=1280
59
+ ENV BROWSERGYM_VIEWPORT_HEIGHT=720
60
+ ENV BROWSERGYM_TIMEOUT=10000
61
+ ENV BROWSERGYM_PORT=8000
62
+ ENV MINIWOB_HTML_DIR=/app/miniwob-plusplus/miniwob/html
63
+ ENV MINIWOB_HTTP_PORT=8888
64
+ ENV MINIWOB_URL=http://127.0.0.1:8888/miniwob/
65
+ ENV ENABLE_WEB_INTERFACE=true
66
+
67
+ # For WebArena tasks, these should be set by the user when running the container:
68
+ # ENV SHOPPING=
69
+ # ENV SHOPPING_ADMIN=
70
+ # ENV REDDIT=
71
+ # ENV GITLAB=
72
+ # ENV MAP=
73
+ # ENV WIKIPEDIA=
74
+ # ENV HOMEPAGE=
75
+
76
+ # Expose ports
77
+ EXPOSE 8000
78
+ EXPOSE 8888
79
+
80
+ # Health check
81
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
82
+ CMD curl -f http://localhost:8000/health || exit 1
83
+
84
+ # Run the server using the start script
85
+ CMD ["/app/env/server/start.sh"]
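The HEALTHCHECK in this Dockerfile polls `http://localhost:8000/health` with curl. A client that starts the container itself may want to wait on the same endpoint before sending requests. The following is a minimal sketch, not part of the commit: `http_probe` and `wait_until_healthy` are hypothetical helper names, and the only details taken from the Dockerfile are the `/health` path, port 8000, the 3-second timeout, and the retry count.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    retries: int = 3,
    interval: float = 30.0,
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe `url` up to `retries` times, sleeping `interval` seconds between attempts."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

The `probe` and `sleep` parameters are injectable so the retry logic can be exercised without a running container.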
README.md CHANGED
@@ -1,10 +1,563 @@
1
  ---
2
- title: Browsergym Env
3
- emoji: 🌖
4
- colorFrom: pink
5
- colorTo: blue
6
  sdk: docker
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: BrowserGym Environment Server
3
+ emoji: 🌐
4
+ colorFrom: blue
5
+ colorTo: purple
6
  sdk: docker
7
  pinned: false
8
+ app_port: 8000
9
+ base_path: /web
10
+ tags:
11
+ - openenv
12
+ - browsergym
13
+ - web-automation
14
+ - reinforcement-learning
15
  ---
16
 
17
+ # BrowserGym Environment
18
+
19
+ BrowserGym is a unified framework for web-based agent tasks that provides access to multiple benchmarks under a single Gymnasium-compatible API. This integration brings the complete training-to-evaluation pipeline for web agents into OpenEnv.
20
+
21
+ ## Why BrowserGym?
22
+
23
+ BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.
24
+
25
+ **What are these benchmarks?**
26
+
27
+ - **MiniWoB++ (Training)**: 100+ synthetic web tasks like "click this button", "fill out this form", "select from dropdown". Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Perfect for learning basic web navigation skills. **No external setup needed** - tasks run in isolated browser sessions.
28
+
29
+ - **WebArena (Evaluation)**: 812 tasks on real websites (e-commerce, forums, GitLab, Wikipedia). Tasks like "find the cheapest laptop and add to cart" or "create a merge request for bug #123". Multistep, requires reasoning, sparse rewards. Tests if your agent can handle actual websites. **Requires running 7 backend services** (shopping site, GitLab instance, etc.).
30
+
31
+ - **VisualWebArena**: Similar to WebArena but requires visual understanding - agents need to interpret images, identify UI elements visually, handle multimodal content.
32
+
33
+ - **WorkArena**: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.
34
+
35
+ **The training → evaluation pipeline:**
36
+ 1. Train on MiniWoB (simple, controlled, fast iterations)
37
+ 2. Evaluate on WebArena (complex, realistic, measures real-world capability)
38
+
39
+ **Key advantage**: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works.
40
+
41
+ ## Quick Start - Training (MiniWoB)
42
+
43
+ ### No Setup Required! 🎉
44
+
45
+ ```python
46
+ from browsergym_env import BrowserGymEnv, BrowserGymAction
47
+
48
+ # Create environment for MiniWoB training task
49
+ env = BrowserGymEnv.from_docker_image(
50
+ "ghcr.io/openenv/browsergym-env:latest",
51
+ environment={
52
+ "BROWSERGYM_BENCHMARK": "miniwob",
53
+ "BROWSERGYM_TASK_NAME": "click-test", # or "click-button", "click-dialog", etc.
54
+ }
55
+ )
56
+
57
+ # Train your agent!
58
+ for episode in range(1000):
59
+     result = env.reset()
60
+     print(f"Goal: {result.observation.goal}")
61
+
62
+     done = False
63
+     while not done:
64
+         # Your agent decides what to do
65
+         action_str = agent.get_action(result.observation.text)
66
+         action = BrowserGymAction(action_str=action_str)
67
+
68
+         result = env.step(action)
69
+         done = result.done
70
+
71
+     print(f"Reward: {result.reward}")
72
+
73
+ env.close()
74
+ ```
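The quick-start loop calls `agent.get_action(...)` without defining `agent`. A trivial baseline could look like the sketch below. It is hypothetical throughout: the class name `FirstButtonAgent` is invented here, the regular expression assumes interactive elements appear in the page text as `[<id>] <role> '<name>'`, and `noop()` is assumed to be an accepted do-nothing action. None of this is confirmed by the commit.

```python
import re

# Assumed line format for elements in the page text: "[<id>] <role> '<name>'"
_ELEMENT = re.compile(r"\[(?P<bid>\w+)\]\s+(?P<role>\w+)\s+'(?P<name>[^']*)'")


class FirstButtonAgent:
    """Hypothetical baseline: click the first button found in the page text."""

    def get_action(self, page_text: str) -> str:
        for match in _ELEMENT.finditer(page_text):
            if match.group("role") == "button":
                return f"click('{match.group('bid')}')"
        # Assumed no-op action when no button is visible
        return "noop()"
```

With `agent = FirstButtonAgent()`, the loop in the README runs end to end on a task such as `click-test`, provided the assumptions about the text format hold.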
75
+
76
+ ### Available Tasks by Benchmark
77
+
78
+ #### MiniWoB++ Tasks (Training - 100+ tasks)
79
+
80
+ MiniWoB tasks are organized by difficulty and type. Here are the main categories:
81
+
82
+ **Click Tasks** (Basic interaction)
83
+
84
+ | Task Name | Description | Difficulty |
85
+ |-----------|-------------|------------|
86
+ | `click-test` | Click a single button | ⭐ Easy |
87
+ | `click-button` | Click button with specific text | ⭐ Easy |
88
+ | `click-button-sequence` | Click buttons in order | ⭐⭐ Medium |
89
+ | `click-checkboxes` | Select specific checkboxes | ⭐⭐ Medium |
90
+ | `click-checkboxes-soft` | Select checkboxes (multiple valid) | ⭐⭐ Medium |
91
+ | `click-checkboxes-large` | Many checkboxes to select from | ⭐⭐ Medium |
92
+ | `click-checkboxes-transfer` | Transfer learning variation | ⭐⭐ Medium |
93
+ | `click-dialog` | Click correct button in dialog | ⭐ Easy |
94
+ | `click-dialog-2` | More complex dialog | ⭐⭐ Medium |
95
+ | `click-link` | Click on a link | ⭐ Easy |
96
+ | `click-option` | Select from dropdown | ⭐⭐ Medium |
97
+ | `click-pie` | Click on pie chart slice | ⭐⭐ Medium |
98
+ | `click-scroll-list` | Click item in scrollable list | ⭐⭐⭐ Hard |
99
+ | `click-shades` | Click on specific color shade | ⭐⭐ Medium |
100
+ | `click-shape` | Click on specific shape | ⭐⭐ Medium |
101
+ | `click-tab` | Switch between tabs | ⭐⭐ Medium |
102
+ | `click-tab-2` | More complex tab switching | ⭐⭐⭐ Hard |
103
+ | `click-widget` | Click on UI widget | ⭐⭐ Medium |
104
+
105
+ **Text Entry Tasks** (Typing and forms)
106
+
107
+ | Task Name | Description | Difficulty |
108
+ |-----------|-------------|------------|
109
+ | `enter-text` | Type text into input field | ⭐ Easy |
110
+ | `enter-text-dynamic` | Dynamic text entry | ⭐⭐ Medium |
111
+ | `enter-text-2` | Multiple text fields | ⭐⭐ Medium |
112
+ | `enter-password` | Fill password field | ⭐ Easy |
113
+ | `enter-date` | Enter a date | ⭐⭐ Medium |
114
+ | `enter-time` | Enter a time | ⭐⭐ Medium |
115
+ | `login-user` | Complete login form | ⭐⭐ Medium |
116
+ | `login-user-popup` | Login via popup | ⭐⭐⭐ Hard |
117
+
118
+ **Navigation Tasks** (Multi-step interaction)
119
+
120
+ | Task Name | Description | Difficulty |
121
+ |-----------|-------------|------------|
122
+ | `navigate-tree` | Navigate through tree structure | ⭐⭐⭐ Hard |
123
+ | `search-engine` | Use search interface | ⭐⭐ Medium |
124
+ | `use-autocomplete` | Interact with autocomplete | ⭐⭐⭐ Hard |
125
+ | `book-flight` | Book a flight (complex form) | ⭐⭐⭐⭐ Very Hard |
126
+ | `choose-date` | Pick date from calendar | ⭐⭐⭐ Hard |
127
+ | `choose-date-easy` | Simplified date picker | ⭐⭐ Medium |
128
+ | `choose-date-medium` | Medium difficulty date picker | ⭐⭐⭐ Hard |
129
+ | `choose-list` | Select from long list | ⭐⭐ Medium |
130
+
131
+ **Visual/Spatial Tasks** (Requires visual understanding)
132
+
133
+ | Task Name | Description | Difficulty |
134
+ |-----------|-------------|------------|
135
+ | `count-sides` | Count sides of shape | ⭐⭐ Medium |
136
+ | `count-shape` | Count specific shapes | ⭐⭐ Medium |
137
+ | `find-word` | Find word in text | ⭐⭐ Medium |
138
+ | `focus-text` | Focus on text element | ⭐ Easy |
139
+ | `focus-text-2` | More complex focus task | ⭐⭐ Medium |
140
+ | `grid-coordinate` | Click grid coordinate | ⭐⭐ Medium |
141
+ | `guess-number` | Guess a number game | ⭐⭐⭐ Hard |
142
+ | `identify-shape` | Identify shape type | ⭐⭐ Medium |
143
+ | `read-table` | Extract info from table | ⭐⭐⭐ Hard |
144
+ | `read-table-2` | More complex table reading | ⭐⭐⭐ Hard |
145
+
146
+ **Email/Social Tasks** (Realistic scenarios)
147
+
148
+ | Task Name | Description | Difficulty |
149
+ |-----------|-------------|------------|
150
+ | `email-inbox` | Manage email inbox | ⭐⭐⭐⭐ Very Hard |
151
+ | `email-inbox-forward` | Forward emails | ⭐⭐⭐⭐ Very Hard |
152
+ | `email-inbox-nl` | Natural language email task | ⭐⭐⭐⭐ Very Hard |
153
+ | `email-inbox-star-reply` | Star and reply to emails | ⭐⭐⭐⭐ Very Hard |
154
+ | `social-media` | Social media interaction | ⭐⭐⭐⭐ Very Hard |
155
+ | `social-media-some` | Partial social media task | ⭐⭐⭐ Hard |
156
+
157
+ **Total:** 100+ tasks across all categories
158
+
159
+ **Usage:**
160
+ ```python
161
+ # Easy task for quick testing
162
+ env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-test"})
163
+
164
+ # Medium difficulty for training
165
+ env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-checkboxes"})
166
+
167
+ # Hard task for evaluation
168
+ env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "email-inbox"})
169
+ ```
170
+
171
+ #### WebArena Tasks (Evaluation - 812 tasks)
172
+
173
+ WebArena tasks are organized by website and difficulty. Tasks are numbered 0-811.
174
+
175
+ **By Website:**
176
+
177
+ | Website | Task Count | Description | Example Tasks |
178
+ |---------|------------|-------------|---------------|
179
+ | Shopping | ~200 | E-commerce site | Search products, add to cart, checkout |
180
+ | Shopping Admin | ~150 | Admin panel | Manage products, orders, customers |
181
+ | Reddit | ~150 | Forum/social | Post, comment, search discussions |
182
+ | GitLab | ~200 | Code repository | Create issues, merge requests, review code |
183
+ | Wikipedia | ~100 | Knowledge base | Search, read, extract information |
184
+ | Map | ~12 | Location service | Find places, get directions |
185
+
186
+ **By Difficulty:**
187
+
188
+ | Difficulty | Task Count | Steps Required | Example |
189
+ |------------|------------|----------------|---------|
190
+ | Easy | ~200 | 1-5 steps | "Find the price of product X" |
191
+ | Medium | ~400 | 5-15 steps | "Add cheapest laptop to cart" |
192
+ | Hard | ~212 | 15+ steps | "Create merge request for bug fix" |
193
+
194
+ **Usage:**
195
+
196
+ ```python
197
+ # Task 0 (usually easy)
198
+ env = BrowserGymEnv(environment={
199
+ "BROWSERGYM_BENCHMARK": "webarena",
200
+ "BROWSERGYM_TASK_NAME": "0",
201
+ "SHOPPING": "http://your-server:7770",
202
+ # ... other URLs
203
+ })
204
+
205
+ # Task 156 (GitLab merge request)
206
+ env = BrowserGymEnv(environment={
207
+ "BROWSERGYM_BENCHMARK": "webarena",
208
+ "BROWSERGYM_TASK_NAME": "156",
209
+ # ... URLs
210
+ })
211
+ ```
212
+
213
+ **Note:** WebArena tasks require the full backend infrastructure. See [WebArena setup guide](https://github.com/web-arena-x/webarena/tree/main/environment_docker).
214
+
215
+ #### VisualWebArena Tasks (910 tasks)
216
+
217
+ Similar to WebArena but requires visual understanding. Tasks involve:
218
+ - Image-based reasoning
219
+ - Visual element identification
220
+ - Multimodal interaction (text + images)
221
+
222
+ #### WorkArena Tasks
223
+
224
+ Enterprise software automation tasks:
225
+ - CRM operations
226
+ - Project management
227
+ - Business workflows
228
+
229
+ **Full task lists:**
230
+ - [MiniWoB++ tasks](https://github.com/Farama-Foundation/miniwob-plusplus/tree/master/miniwob/environment)
231
+ - [WebArena tasks](https://github.com/web-arena-x/webarena/blob/main/config_files/)
232
+ - [BrowserGym documentation](https://github.com/ServiceNow/BrowserGym)
233
+
234
+ ## Evaluation (WebArena)
235
+
236
+ ### Prerequisites
237
+
238
+ WebArena requires setting up backend infrastructure. See the [WebArena documentation](https://github.com/web-arena-x/webarena/tree/main/environment_docker).
239
+
240
+ ### Usage
241
+
242
+ ```python
243
+ from envs.browsergym_env import BrowserGymEnv, BrowserGymAction
244
+
245
+ # Create environment for WebArena evaluation
246
+ env = BrowserGymEnv.from_docker_image(
247
+ "ghcr.io/openenv/browsergym-env:latest",
248
+ environment={
249
+ "BROWSERGYM_BENCHMARK": "webarena",
250
+ "BROWSERGYM_TASK_NAME": "0", # Task ID
251
+ # WebArena backend URLs (required)
252
+ "SHOPPING": "http://your-server:7770",
253
+ "SHOPPING_ADMIN": "http://your-server:7780/admin",
254
+ "REDDIT": "http://your-server:9999",
255
+ "GITLAB": "http://your-server:8023",
256
+ "MAP": "http://your-server:3000",
257
+ "WIKIPEDIA": "http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
258
+ "HOMEPAGE": "http://your-server:4399",
259
+ }
260
+ )
261
+
262
+ # Evaluate your trained agent
263
+ result = env.reset()
264
+ while not result.done:
265
+     action_str = agent.get_action(result.observation)
266
+     action = BrowserGymAction(action_str=action_str)
267
+     result = env.step(action)
268
+
269
+ print(f"Success: {result.reward}")
270
+ env.close()
271
+ ```
272
+
273
+ ## Building the Docker Image
274
+
275
+ ### Prerequisites
276
+
277
+ 1. **Base Image**: Build the OpenEnv base image first:
278
+
279
+ ```bash
280
+ # From the OpenEnv repository root
281
+ docker build -t openenv-base:latest -f src/openenv/core/containers/images/Dockerfile .
282
+ ```
283
+
284
+ ### Build the BrowserGym Environment
285
+
286
+ ```bash
287
+ # From the browsergym_env directory
288
+ cd envs/browsergym_env
289
+ docker build -t browsergym-env:latest -f server/Dockerfile .
290
+ ```
291
+
292
+ ### Run the Server
293
+
294
+ #### For MiniWoB (Training):
295
+
296
+ ```bash
297
+ docker run -p 8000:8000 \
298
+ -e BROWSERGYM_BENCHMARK="miniwob" \
299
+ -e BROWSERGYM_TASK_NAME="click-test" \
300
+ browsergym-env:latest
301
+ ```
302
+
303
+ #### For WebArena (Evaluation):
304
+
305
+ ```bash
306
+ docker run -p 8000:8000 \
307
+ -e BROWSERGYM_BENCHMARK="webarena" \
308
+ -e BROWSERGYM_TASK_NAME="0" \
309
+ -e SHOPPING="http://your-server:7770" \
310
+ -e SHOPPING_ADMIN="http://your-server:7780/admin" \
311
+ -e REDDIT="http://your-server:9999" \
312
+ -e GITLAB="http://your-server:8023" \
313
+ -e MAP="http://your-server:3000" \
314
+ -e WIKIPEDIA="http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing" \
315
+ -e HOMEPAGE="http://your-server:4399" \
316
+ browsergym-env:latest
317
+ ```
318
+
319
+ ## Environment Details
320
+
321
+ ### Action
322
+
323
+ Actions in BrowserGym are strings written as function calls, which the environment parses into browser operations:
324
+
325
+ ```python
326
+ from envs.browsergym_env import BrowserGymAction
327
+
328
+ # Click actions
329
+ action = BrowserGymAction(action_str="click('Submit button')")
330
+ action = BrowserGymAction(action_str="click('element_id_123')")
331
+
332
+ # Type actions
333
+ action = BrowserGymAction(action_str="fill('username', 'john@example.com')")
334
+ action = BrowserGymAction(action_str="fill('password', 'secret123')")
335
+
336
+ # Navigate actions
337
+ action = BrowserGymAction(action_str="goto('https://example.com')")
338
+
339
+ # Keyboard actions
340
+ action = BrowserGymAction(action_str="press('Enter')")
341
+ action = BrowserGymAction(action_str="press('Tab')")
342
+
343
+ # Scroll actions
344
+ action = BrowserGymAction(action_str="scroll('down')")
345
+ ```
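Because every action is a string in function-call form, hand-built f-strings can break when an argument contains a quote character. One way to avoid that is to render each argument with `repr`, as in this sketch. `make_action_str` is a hypothetical helper, not part of the package, and it assumes the server accepts Python-style string literals with either quote style.

```python
def make_action_str(name: str, *args: object) -> str:
    """Render `name(arg1, arg2, ...)` with each argument as a Python literal."""
    return f"{name}({', '.join(repr(arg) for arg in args)})"


# Produces the same strings as the hand-written examples above
click = make_action_str("click", "Submit button")
fill = make_action_str("fill", "username", "john@example.com")
```

The result can be passed directly as `BrowserGymAction(action_str=...)`.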
346
+
347
+ ### Observation
348
+
349
+ Observations contain multiple modalities:
350
+
351
+ ```python
352
+ result = env.step(action)
353
+ obs = result.observation
354
+
355
+ # Text observations
356
+ print(obs.text) # Primary text representation (AXTree or DOM)
357
+ print(obs.axtree_txt) # Accessibility tree
358
+ print(obs.pruned_html) # Pruned HTML (interactive elements only)
359
+
360
+ # Page metadata
361
+ print(obs.url) # Current URL
362
+ print(obs.goal) # Task goal/instruction
363
+
364
+ # Visual (if enabled)
365
+ if obs.screenshot is not None:
366
+     print(obs.screenshot.shape)  # [height, width, channels]
367
+
368
+ # Error handling
369
+ if obs.last_action_error:
370
+     print(f"Action failed: {obs.error}")
371
+
372
+ # Episode status
373
+ print(obs.done) # True if episode ended
374
+ print(obs.reward) # Reward for the step
375
+
376
+ # Access full BrowserGym data (includes timestamps, etc.)
377
+ print(obs.metadata["browsergym_obs"]) # Full observation dict from BrowserGym
378
+ print(obs.metadata["browsergym_info"]) # Full info dict (timestamps, page state, etc.)
379
+ ```
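An agent usually needs a single text input per step, so the observation fields listed above have to be combined somehow. The sketch below shows one possible policy: prefer `text`, fall back to `axtree_txt` and then `pruned_html`, truncate, and prepend the error from the previous action. `ObsStub` is a stand-in defined only for this example (it is not the real `BrowserGymObservation`), and `build_prompt_text` and `max_chars` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ObsStub:
    """Stand-in carrying the text fields shown above; not the real observation class."""

    text: str = ""
    axtree_txt: str = ""
    pruned_html: str = ""
    error: str = ""
    last_action_error: bool = False


def build_prompt_text(obs, max_chars: int = 4000) -> str:
    """Pick the first non-empty text modality and surface the last action error."""
    page = obs.text or obs.axtree_txt or obs.pruned_html
    page = page[:max_chars]
    if obs.last_action_error:
        return f"Previous action failed: {obs.error}\n{page}"
    return page
```

The function only reads attributes, so it should work unchanged on any object that exposes the same field names.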
380
+
381
+ #### Advanced: Accessing Raw BrowserGym Data
382
+
383
+ For VisualWebArena or custom training, you may need additional data like timestamps or browser state. The full BrowserGym observation and info dicts are preserved in `metadata`:
384
+
385
+ ```python
386
+ result = env.step(action)
387
+
388
+ # Access timestamps (if available)
389
+ info = result.observation.metadata["browsergym_info"]
390
+ if "timestamp" in info:
391
+     print(f"Action timestamp: {info['timestamp']}")
392
+
393
+ # Access additional observation fields
394
+ obs_dict = result.observation.metadata["browsergym_obs"]
395
+ if "dom_object" in obs_dict:
396
+     dom = obs_dict["dom_object"]
397
+     # Work with raw DOM object
398
+
399
+ # Access page performance data
400
+ if "performance" in info:
401
+     print(f"Page load time: {info['performance']}")
402
+ ```
403
+
404
+ ### State
405
+
406
+ The environment state tracks progress:
407
+
408
+ ```python
409
+ state = env.state()
410
+
411
+ print(f"Benchmark: {state.benchmark}") # 'miniwob', 'webarena', etc.
412
+ print(f"Task: {state.task_name}") # Task name/ID
413
+ print(f"Episode: {state.episode_id}") # Unique episode ID
414
+ print(f"Steps: {state.step_count}") # Number of steps taken
415
+ print(f"Total Reward: {state.cum_reward}") # Cumulative reward
416
+ print(f"Goal: {state.goal}") # Task instruction
417
+ print(f"URL: {state.current_url}") # Current page URL
418
+ ```
419
+
420
+ ## Configuration
421
+
422
+ Environment variables:
423
+
424
+ ### Common Settings
425
+ - `BROWSERGYM_BENCHMARK`: Benchmark to use (`miniwob`, `webarena`, `visualwebarena`, `workarena`)
426
+ - `BROWSERGYM_TASK_NAME`: Specific task name (optional, will use first available if not set)
427
+ - `BROWSERGYM_HEADLESS`: Run browser in headless mode (default: `true`)
428
+ - `BROWSERGYM_VIEWPORT_WIDTH`: Browser viewport width (default: `1280`)
429
+ - `BROWSERGYM_VIEWPORT_HEIGHT`: Browser viewport height (default: `720`)
430
+ - `BROWSERGYM_TIMEOUT`: Action timeout in milliseconds (default: `10000`)
431
+
432
+ ### WebArena-Specific (only needed for WebArena benchmark)
433
+ - `SHOPPING`: Shopping website URL
434
+ - `SHOPPING_ADMIN`: Shopping admin panel URL
435
+ - `REDDIT`: Reddit-like forum URL
436
+ - `GITLAB`: GitLab instance URL
437
+ - `MAP`: Map service URL
438
+ - `WIKIPEDIA`: Wikipedia instance URL
439
+ - `HOMEPAGE`: Homepage URL
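Since the WebArena URLs are only needed for that benchmark, it can help to check a configuration before starting a container. This is a minimal sketch under stated assumptions: `missing_webarena_vars` is a hypothetical helper, the default benchmark of `miniwob` mirrors the `ENV` default in the Dockerfile, and the check is applied only to `webarena` because the commit does not say which variables the other benchmarks require.

```python
from typing import List, Mapping

# Variable names taken from the WebArena-specific list above
WEBARENA_URL_VARS = (
    "SHOPPING",
    "SHOPPING_ADMIN",
    "REDDIT",
    "GITLAB",
    "MAP",
    "WIKIPEDIA",
    "HOMEPAGE",
)


def missing_webarena_vars(env: Mapping[str, str]) -> List[str]:
    """Return the WebArena URL variables that are unset or empty in `env`."""
    if env.get("BROWSERGYM_BENCHMARK", "miniwob") != "webarena":
        return []
    return [name for name in WEBARENA_URL_VARS if not env.get(name)]
```

Passing `os.environ` or the `environment` dict given to `from_docker_image` both work, since the function only needs a mapping.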
440
+
441
+ ## Supported Benchmarks
442
+
443
+ ### 1. MiniWoB++ (Training) ✅ Recommended for Training
444
+
445
+ - **100+ tasks** ranging from simple (click buttons) to complex (form filling, navigation)
446
+ - **Fast**: Instant resets, quick episodes
447
+ - **Randomized**: Task variations for generalization
448
+ - **No setup**: Works out-of-the-box
449
+ - **Dense rewards**: Immediate feedback for learning
450
+
451
+ **Use Case**: Train agents on fundamental web navigation skills
452
+
453
+ ### 2. WebArena (Evaluation) 📊 Benchmark
454
+
455
+ - **812 realistic tasks** across 6 websites
456
+ - **Complex**: Multi-step reasoning, real web interfaces
457
+ - **Requires setup**: Need to run 7 backend services
458
+ - **Sparse rewards**: Binary success/failure
459
+ - **Evaluation-focused**: Test real-world performance
460
+
461
+ **Use Case**: Evaluate agents on realistic web tasks
462
+
463
+ ### 3. VisualWebArena (Evaluation) 👁️ Visual Benchmark
464
+
465
+ - **910 tasks** requiring visual understanding
466
+ - **Multimodal**: Both text and visual observations
467
+ - **Requires setup**: Similar to WebArena
468
+ - **Challenging**: Requires visual reasoning
469
+
470
+ **Use Case**: Test visual web navigation capabilities
471
+
472
+ ### 4. WorkArena (Evaluation) 💼 Enterprise Benchmark
473
+
474
+ - **Enterprise tasks**: CRM, project management, etc.
475
+ - **Realistic workflows**: Real enterprise software
476
+ - **Requires setup**: Enterprise software instances
477
+
478
+ **Use Case**: Evaluate on business automation tasks
479
+
480
+ ## Typical Training Pipeline
481
+
482
+ ```python
483
+ from envs.browsergym_env import BrowserGymEnv, BrowserGymAction
484
+
485
+ # Stage 1: Train on MiniWoB (simple tasks, fast)
486
+ train_env = BrowserGymEnv.from_docker_image(
487
+ "browsergym-env:latest",
488
+ environment={
489
+ "BROWSERGYM_BENCHMARK": "miniwob",
490
+ "BROWSERGYM_TASK_NAME": "click-button",
491
+ }
492
+ )
493
+
494
+ # Train your agent (RL, imitation learning, etc.)
495
+ agent.train(train_env, num_episodes=10000)
496
+ train_env.close()
497
+
498
+ # Stage 2: Evaluate on WebArena (complex tasks, realistic)
499
+ eval_env = BrowserGymEnv.from_docker_image(
500
+ "browsergym-env:latest",
501
+ environment={
502
+ "BROWSERGYM_BENCHMARK": "webarena",
503
+ "BROWSERGYM_TASK_NAME": "0",
504
+ # ... WebArena URLs
505
+ }
506
+ )
507
+
508
+ # Test performance
509
+ success_rate = agent.evaluate(eval_env, num_tasks=812)
510
+ print(f"WebArena Success Rate: {success_rate:.2%}")
511
+ eval_env.close()
512
+ ```
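In the pipeline above, `eval_env` is configured with a single `BROWSERGYM_TASK_NAME`, so evaluating many WebArena tasks presumably means creating one configuration per task ID. The sketch below generates those configurations and computes a success rate from final rewards. Both helper names are hypothetical, and counting any reward greater than zero as a success is an assumption based on the README's description of binary rewards.

```python
from typing import Dict, Iterable, List


def webarena_task_envs(
    task_ids: Iterable[int], backend_urls: Dict[str, str]
) -> List[Dict[str, str]]:
    """Build one `environment` dict per WebArena task ID."""
    return [
        {
            "BROWSERGYM_BENCHMARK": "webarena",
            "BROWSERGYM_TASK_NAME": str(task_id),
            **backend_urls,
        }
        for task_id in task_ids
    ]


def success_rate(final_rewards: Iterable[float]) -> float:
    """Fraction of episodes whose final reward is positive (assumed to mean success)."""
    rewards = list(final_rewards)
    if not rewards:
        return 0.0
    return sum(1 for reward in rewards if reward > 0) / len(rewards)
```

Each generated dict can be passed as the `environment` argument shown in the pipeline, one environment at a time.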
513
+
514
+ ## Development & Testing
515
+
516
+ ### Running Tests
517
+
518
+ ```bash
519
+ # From the OpenEnv repository root
520
+ pytest tests/envs/test_browsergym_env.py
521
+ ```
522
+
523
+ ### Local Development
524
+
525
+ ```bash
526
+ # Install in development mode
527
+ cd /path/to/OpenEnv
528
+ pip install -e .
529
+
530
+ # Install BrowserGym
531
+ pip install browsergym browsergym-miniwob browsergym-webarena
532
+
533
+ # Run the server locally
534
+ cd envs/browsergym_env/server
535
+ export BROWSERGYM_BENCHMARK=miniwob
536
+ export BROWSERGYM_TASK_NAME=click-test
537
+ python app.py
538
+ ```
539
+
540
+ ## Project Structure
541
+
542
+ ```
543
+ browsergym_env/
544
+ ├── __init__.py                   # Module exports
545
+ ├── models.py                     # Action, Observation, State Pydantic models
546
+ ├── client.py                     # EnvClient implementation
547
+ ├── README.md                     # This file
548
+ └── server/
549
+     ├── __init__.py
550
+     ├── app.py                    # FastAPI application
551
+     ├── browsergym_environment.py # Environment implementation
552
+     ├── Dockerfile                # Container specification
553
+     └── requirements.txt          # Python dependencies
554
+ ```
555
+
556
+ ## References
557
+
558
+ - [BrowserGym GitHub](https://github.com/ServiceNow/BrowserGym)
559
+ - [MiniWoB++ Paper](https://arxiv.org/abs/1802.08802)
560
+ - [WebArena Paper](https://arxiv.org/abs/2307.13854)
561
+ - [WebArena Website](https://webarena.dev/)
562
+ - [VisualWebArena Website](https://jykoh.com/vwa)
563
+ - [OpenEnv Documentation](https://github.com/meta-pytorch/OpenEnv)
__init__.py ADDED
@@ -0,0 +1,72 @@
1
+ """BrowserGym Environment for OpenEnv.
2
+
3
+ BrowserGym is a unified framework for web-based agent tasks that provides
4
+ access to multiple benchmarks under a single Gymnasium-compatible API.
5
+
6
+ Included Benchmarks:
7
+ - **MiniWoB++**: 100+ simple web tasks for training (no external infrastructure!)
8
+ - **WebArena**: 812 realistic evaluation tasks (requires backend setup)
9
+ - **VisualWebArena**: Visual web navigation tasks
10
+ - **WorkArena**: Enterprise task automation
11
+
12
+ Key Features:
13
+ - Unified API across all benchmarks
14
+ - Gymnasium-compatible interface
15
+ - Support for multiple observation types (text, visual, DOM)
16
+ - Action spaces for natural language commands
17
+ - Perfect for training (MiniWoB) and evaluation (WebArena)
18
+
19
+ Training Example (MiniWoB - works immediately):
20
+ ```python
21
+ from envs.browsergym_env import BrowserGymEnv, BrowserGymAction
22
+
23
+ # Create training environment - no backend setup needed!
24
+ env = BrowserGymEnv.from_docker_image(
25
+ "browsergym-env:latest",
26
+ environment={
27
+ "BROWSERGYM_BENCHMARK": "miniwob",
28
+ "BROWSERGYM_TASK_NAME": "click-test",
29
+ }
30
+ )
31
+
32
+ # Train your agent
33
+ for episode in range(1000):
34
+     result = env.reset()
35
+     while not result.done:
36
+         action = agent.get_action(result.observation)
37
+         result = env.step(action)
38
+
39
+ env.close()
40
+ ```
41
+
42
+ Evaluation Example (WebArena - requires backend):
43
+ ```python
44
+ from envs.browsergym_env import BrowserGymEnv, BrowserGymAction
45
+
46
+ # Create evaluation environment
47
+ env = BrowserGymEnv.from_docker_image(
48
+ "browsergym-env:latest",
49
+ environment={
50
+ "BROWSERGYM_BENCHMARK": "webarena",
51
+ "BROWSERGYM_TASK_NAME": "0",
52
+ "SHOPPING": "http://your-server:7770",
53
+ # ... other backend URLs
54
+ }
55
+ )
56
+
57
+ # Evaluate your trained agent
58
+ result = env.reset()
59
+ # ... run evaluation
60
+ env.close()
61
+ ```
62
+ """
63
+
64
+ from .client import BrowserGymEnv
65
+ from .models import BrowserGymAction, BrowserGymObservation, BrowserGymState
66
+
67
+ __all__ = [
68
+ "BrowserGymEnv",
69
+ "BrowserGymAction",
70
+ "BrowserGymObservation",
71
+ "BrowserGymState",
72
+ ]
client.py ADDED
@@ -0,0 +1,122 @@
1
+ """Client for the BrowserGym environment."""
2
+
3
+ from typing import Any, Dict
4
+
5
+ from openenv.core.client_types import StepResult
6
+ from openenv.core.env_client import EnvClient
7
+ from .models import (
8
+ BrowserGymAction,
9
+ BrowserGymObservation,
10
+ BrowserGymState,
11
+ )
12
+
13
+
14
+ class BrowserGymEnv(EnvClient[BrowserGymAction, BrowserGymObservation, BrowserGymState]):
15
+ """Client for interacting with the BrowserGym environment.
16
+
17
+ BrowserGym provides unified access to multiple web navigation benchmarks:
18
+ - MiniWoB++: 100+ training tasks (no external infrastructure needed!)
19
+ - WebArena: 812 evaluation tasks (requires backend setup)
20
+ - VisualWebArena: Visual navigation tasks
21
+ - WorkArena: Enterprise automation tasks
22
+
23
+ Example usage for TRAINING (MiniWoB - works out of the box):
24
+ ```python
25
+ from envs.browsergym_env import BrowserGymEnv, BrowserGymAction
26
+
27
+ # Create environment for MiniWoB training task
28
+ env = BrowserGymEnv.from_docker_image(
29
+ "browsergym-env:latest",
30
+ environment={
31
+ "BROWSERGYM_BENCHMARK": "miniwob",
32
+ "BROWSERGYM_TASK_NAME": "click-test",
33
+ }
34
+ )
35
+
36
+ # Reset and get initial observation
37
+ result = env.reset()
38
+ print(f"Task: {result.observation.goal}")
39
+ print(f"Page: {result.observation.text[:200]}")
40
+
41
+ # Take actions
42
+ action = BrowserGymAction(action_str="click('Submit button')")
43
+ result = env.step(action)
44
+ print(f"Reward: {result.reward}")
45
+ print(f"Done: {result.done}")
46
+
47
+ env.close()
48
+ ```
49
+
50
+ Example usage for EVALUATION (WebArena - requires backend):
51
+ ```python
52
+ from envs.browsergym_env import BrowserGymEnv, BrowserGymAction
53
+
54
+ # Create environment for WebArena evaluation
55
+ env = BrowserGymEnv.from_docker_image(
56
+ "browsergym-env:latest",
57
+ environment={
58
+ "BROWSERGYM_BENCHMARK": "webarena",
59
+ "BROWSERGYM_TASK_NAME": "0", # Task 0
60
+ # WebArena backend URLs
61
+ "SHOPPING": "http://your-server:7770",
62
+ "GITLAB": "http://your-server:8023",
63
+ # ... other URLs
64
+ }
65
+ )
66
+
67
+ result = env.reset()
68
+ # ... interact with environment
69
+ env.close()
70
+ ```
71
+
72
+ Available benchmarks:
73
+ - miniwob: MiniWoB++ tasks (training, no setup required)
74
+ - webarena: WebArena tasks (evaluation, requires backend)
75
+ - visualwebarena: Visual WebArena tasks (evaluation, requires backend)
76
+ - workarena: WorkArena tasks (evaluation, requires backend)
77
+ """
78
+
79
+ def _step_payload(self, action: BrowserGymAction) -> Dict[str, Any]:
80
+ """Convert a BrowserGymAction to the JSON payload for the server."""
81
+ return {
82
+ "action_str": action.action_str,
83
+ "metadata": action.metadata,
84
+ }
85
+
86
+ def _parse_result(self, payload: Dict[str, Any]) -> StepResult[BrowserGymObservation]:
87
+ """Parse the server response into a StepResult."""
88
+ obs_data = payload.get("observation", {})
89
+
90
+ observation = BrowserGymObservation(
91
+ text=obs_data.get("text", ""),
92
+ url=obs_data.get("url", ""),
93
+ screenshot=obs_data.get("screenshot"),
94
+ goal=obs_data.get("goal", ""),
95
+ axtree_txt=obs_data.get("axtree_txt", ""),
96
+ pruned_html=obs_data.get("pruned_html", ""),
97
+ error=obs_data.get("error", ""),
98
+ last_action_error=obs_data.get("last_action_error", False),
99
+ done=payload.get("done", False),
100
+ reward=payload.get("reward"),
101
+ metadata=obs_data.get("metadata", {}),
102
+ )
103
+
104
+ return StepResult(
105
+ observation=observation,
106
+ reward=payload.get("reward"),
107
+ done=payload.get("done", False),
108
+ )
109
+
110
+ def _parse_state(self, payload: Dict[str, Any]) -> BrowserGymState:
111
+ """Parse the server state response into a BrowserGymState object."""
112
+ return BrowserGymState(
113
+ episode_id=payload.get("episode_id"),
114
+ step_count=payload.get("step_count", 0),
115
+ benchmark=payload.get("benchmark", ""),
116
+ task_name=payload.get("task_name", ""),
117
+ task_id=payload.get("task_id"),
118
+ goal=payload.get("goal", ""),
119
+ current_url=payload.get("current_url", ""),
120
+ max_steps=payload.get("max_steps"),
121
+ cum_reward=payload.get("cum_reward", 0.0),
122
+ )
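The parsing helpers above read every field with `dict.get` and a default, so a partial server response still yields a usable object. A minimal sketch of that pattern follows. `SketchObservation` and `parse_result` are hypothetical stand-ins written for this illustration, not the real `BrowserGymObservation` or `_parse_result`, and the URL is an assumed example value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchObservation:
    """Hypothetical stand-in for BrowserGymObservation (illustration only)."""

    text: str = ""
    url: str = ""
    goal: str = ""
    last_action_error: bool = False
    done: bool = False
    reward: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_result(payload: Dict[str, Any]) -> SketchObservation:
    """Build an observation from a JSON payload, tolerating missing keys."""
    obs = payload.get("observation", {})
    return SketchObservation(
        text=obs.get("text", ""),
        url=obs.get("url", ""),
        goal=obs.get("goal", ""),
        last_action_error=obs.get("last_action_error", False),
        # done and reward live at the top level of the payload, not in "observation"
        done=payload.get("done", False),
        reward=payload.get("reward"),
        metadata=obs.get("metadata", {}),
    )


# A response that only carries a URL and a reward still parses cleanly.
partial = parse_result(
    {"observation": {"url": "http://localhost:8888/miniwob/click-test.html"}, "reward": 1.0}
)
empty = parse_result({})
```

Because `reward` has no default in the lookup, a payload without it produces `None` rather than `0.0`, which lets callers tell "no reward reported" apart from "zero reward".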
models.py ADDED
@@ -0,0 +1,77 @@
1
+ """Data models for the BrowserGym environment.
2
+
3
+ BrowserGym is a unified framework for web-based agent tasks, combining multiple
4
+ benchmarks including MiniWoB (training), WebArena (evaluation), VisualWebArena,
5
+ and more under a single Gymnasium-compatible API.
6
+ """
7
+
8
+ from typing import List, Optional
9
+
10
+ from pydantic import Field
11
+
12
+ from openenv.core.env_server.types import Action, Observation, State
13
+
14
+
15
+ class BrowserGymAction(Action):
16
+ """Action to be executed in the BrowserGym environment.
17
+
18
+ BrowserGym takes high-level actions as function-call strings that its action
19
+ set parses into browser operations.
20
+
21
+ Example actions:
22
+ - "click('Submit button')"
23
+ - "fill('username', 'john@example.com')"
24
+ - "goto('https://example.com')"
25
+ - "scroll(down)"
26
+ - "send_keys('Enter')"
27
+ """
28
+
29
+ action_str: str = Field(..., description="Action string in BrowserGym's function-call syntax (e.g., \"click('Submit')\")")
30
+
31
+
32
+ class BrowserGymObservation(Observation):
33
+ """Observation returned from the BrowserGym environment.
34
+
35
+ Contains multiple observation modalities including text (accessibility tree
36
+ or DOM), visual (screenshot), and page metadata.
37
+ """
38
+
39
+ text: str = Field(default="", description="Text representation of the page (accessibility tree or DOM)")
40
+
41
+ url: str = Field(default="", description="Current URL of the page")
42
+
43
+ screenshot: Optional[List[List[List[int]]]] = Field(
44
+ default=None,
45
+ description="Screenshot as a nested list of shape [height, width, channels] (if visual observation enabled)"
46
+ )
47
+
48
+ goal: str = Field(default="", description="Task goal/instruction for the current episode")
49
+
50
+ axtree_txt: str = Field(default="", description="Full accessibility tree as text")
51
+
52
+ pruned_html: str = Field(default="", description="Pruned HTML content (interactive elements only)")
53
+
54
+ error: str = Field(default="", description="Error message if action execution failed")
55
+
56
+ last_action_error: bool = Field(default=False, description="Whether the last action resulted in an error")
57
+
58
+
59
+ class BrowserGymState(State):
60
+ """State of the BrowserGym environment.
61
+
62
+ Tracks the current benchmark, task, and progress through an episode.
63
+ """
64
+
65
+ benchmark: str = Field(default="", description="Benchmark name (e.g., 'miniwob', 'webarena', 'visualwebarena')")
66
+
67
+ task_name: str = Field(default="", description="Specific task within the benchmark (e.g., 'click-test', 'click-button')")
68
+
69
+ task_id: Optional[str] = Field(default=None, description="Task ID for evaluation benchmarks (e.g., WebArena task number)")
70
+
71
+ goal: str = Field(default="", description="Task goal/instruction")
72
+
73
+ current_url: str = Field(default="", description="Current URL of the active page")
74
+
75
+ max_steps: Optional[int] = Field(default=None, description="Maximum steps allowed for this task")
76
+
77
+ cum_reward: float = Field(default=0.0, description="Cumulative reward for the current episode")
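The `screenshot` field is typed as a three-level nested list, so a client can recover the image dimensions without NumPy. The sketch below assumes the `[height][width][channels]` layout named in the field description. `screenshot_shape` is a hypothetical helper and is not part of the package.

```python
from typing import List, Optional, Tuple

Screenshot = List[List[List[int]]]


def screenshot_shape(screenshot: Optional[Screenshot]) -> Optional[Tuple[int, int, int]]:
    """Return (height, width, channels), or None when no screenshot was sent."""
    if not screenshot or not screenshot[0] or not screenshot[0][0]:
        return None
    return len(screenshot), len(screenshot[0]), len(screenshot[0][0])


# A 2x3 RGB image filled with black pixels.
tiny = [[[0, 0, 0] for _ in range(3)] for _ in range(2)]
shape = screenshot_shape(tiny)
```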
openenv.yaml ADDED
@@ -0,0 +1,5 @@
1
+ name: browsergym_env
2
+ version: "0.1.0"
3
+ description: "BrowserGym environment for web automation tasks using Playwright"
4
+ action: BrowserGymAction
5
+ observation: BrowserGymObservation
openenv_browsergym_env.egg-info/PKG-INFO ADDED
@@ -0,0 +1,21 @@
1
+ Metadata-Version: 2.4
2
+ Name: openenv-browsergym_env
3
+ Version: 0.1.0
4
+ Summary: BrowserGym Environment for OpenEnv - Web automation using Playwright
5
+ Requires-Python: >=3.10
6
+ Requires-Dist: openenv-core[core]>=0.2.0
7
+ Requires-Dist: fastapi>=0.104.0
8
+ Requires-Dist: uvicorn[standard]>=0.24.0
9
+ Requires-Dist: pydantic>=2.0.0
10
+ Requires-Dist: requests>=2.25.0
11
+ Requires-Dist: browsergym-core>=0.2.0
12
+ Requires-Dist: browsergym-miniwob>=0.2.0
13
+ Requires-Dist: browsergym-webarena>=0.2.0
14
+ Requires-Dist: gymnasium>=0.29.0
15
+ Requires-Dist: playwright>=1.40.0
16
+ Requires-Dist: greenlet>=3.1.0
17
+ Requires-Dist: Pillow>=10.0.0
18
+ Provides-Extra: dev
19
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
20
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
21
+ Requires-Dist: ipykernel>=6.29.5; extra == "dev"
openenv_browsergym_env.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,23 @@
1
+ README.md
2
+ __init__.py
3
+ client.py
4
+ models.py
5
+ openenv.yaml
6
+ pyproject.toml
7
+ ./README.md
8
+ ./__init__.py
9
+ ./client.py
10
+ ./models.py
11
+ ./openenv.yaml
12
+ ./server/__init__.py
13
+ ./server/app.py
14
+ ./server/browsergym_environment.py
15
+ openenv_browsergym_env.egg-info/PKG-INFO
16
+ openenv_browsergym_env.egg-info/SOURCES.txt
17
+ openenv_browsergym_env.egg-info/dependency_links.txt
18
+ openenv_browsergym_env.egg-info/entry_points.txt
19
+ openenv_browsergym_env.egg-info/requires.txt
20
+ openenv_browsergym_env.egg-info/top_level.txt
21
+ server/__init__.py
22
+ server/app.py
23
+ server/browsergym_environment.py
openenv_browsergym_env.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
1
+
openenv_browsergym_env.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ server = browsergym_env.server.app:main
openenv_browsergym_env.egg-info/requires.txt ADDED
@@ -0,0 +1,17 @@
1
+ openenv-core[core]>=0.2.0
2
+ fastapi>=0.104.0
3
+ uvicorn[standard]>=0.24.0
4
+ pydantic>=2.0.0
5
+ requests>=2.25.0
6
+ browsergym-core>=0.2.0
7
+ browsergym-miniwob>=0.2.0
8
+ browsergym-webarena>=0.2.0
9
+ gymnasium>=0.29.0
10
+ playwright>=1.40.0
11
+ greenlet>=3.1.0
12
+ Pillow>=10.0.0
13
+
14
+ [dev]
15
+ pytest>=8.0.0
16
+ pytest-cov>=4.0.0
17
+ ipykernel>=6.29.5
openenv_browsergym_env.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
1
+ browsergym_env
pyproject.toml ADDED
@@ -0,0 +1,41 @@
1
+ [build-system]
2
+ requires = ["setuptools>=45", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "openenv-browsergym_env"
7
+ version = "0.1.0"
8
+ description = "BrowserGym Environment for OpenEnv - Web automation using Playwright"
9
+ requires-python = ">=3.10"
10
+ dependencies = [
11
+ # Install from GitHub to get latest version with updated create_app signature
12
+ "openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git@main",
13
+ "fastapi>=0.104.0",
14
+ "uvicorn[standard]>=0.24.0",
15
+ "pydantic>=2.0.0",
16
+ "requests>=2.25.0",
17
+ "browsergym-core>=0.2.0",
18
+ "browsergym-miniwob>=0.2.0",
19
+ "browsergym-webarena>=0.2.0",
20
+ "gymnasium>=0.29.0",
21
+ "playwright>=1.40.0",
22
+ "greenlet>=3.1.0", # Required for Python 3.13 compatibility
23
+ "Pillow>=10.0.0",
24
+ ]
25
+
26
+ [project.optional-dependencies]
27
+ dev = [
28
+ "pytest>=8.0.0",
29
+ "pytest-cov>=4.0.0",
30
+ "ipykernel>=6.29.5",
31
+ ]
32
+
33
+ [project.scripts]
34
+ server = "browsergym_env.server.app:main"
35
+
36
+ [tool.setuptools]
37
+ packages = ["browsergym_env", "browsergym_env.server"]
38
+ package-dir = { "browsergym_env" = ".", "browsergym_env.server" = "server" }
39
+
40
+ [tool.setuptools.package-data]
41
+ browsergym_env = ["**/*.yaml", "**/*.yml", "**/*.md"]
server/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """BrowserGym environment server module."""
server/app.py ADDED
@@ -0,0 +1,50 @@
1
+ """FastAPI server for the BrowserGym environment."""
2
+
3
+ import os
4
+
5
+ from openenv.core.env_server.http_server import create_app
6
+ from browsergym_env.models import BrowserGymAction, BrowserGymObservation
7
+ from browsergym_env.server.browsergym_environment import BrowserGymEnvironment
8
+
9
+ # Get configuration from environment variables
10
+ benchmark = os.environ.get("BROWSERGYM_BENCHMARK", "miniwob")
11
+ task_name = os.environ.get("BROWSERGYM_TASK_NAME") # Optional, can be None
12
+ headless = os.environ.get("BROWSERGYM_HEADLESS", "true").lower() == "true"
13
+ viewport_width = int(os.environ.get("BROWSERGYM_VIEWPORT_WIDTH", "1280"))
14
+ viewport_height = int(os.environ.get("BROWSERGYM_VIEWPORT_HEIGHT", "720"))
15
+ timeout = float(os.environ.get("BROWSERGYM_TIMEOUT", "10000"))
16
+ port = int(os.environ.get("BROWSERGYM_PORT", "8000"))
17
+
18
+
19
+ # Factory function to create BrowserGymEnvironment instances
20
+ def create_browsergym_environment():
21
+ """Factory function that creates BrowserGymEnvironment with config."""
22
+ return BrowserGymEnvironment(
23
+ benchmark=benchmark,
24
+ task_name=task_name,
25
+ headless=headless,
26
+ viewport_width=viewport_width,
27
+ viewport_height=viewport_height,
28
+ timeout=timeout,
29
+ )
30
+
31
+
32
+ # Create the FastAPI app
33
+ # Pass the factory function instead of an instance for WebSocket session support
34
+ app = create_app(
35
+ create_browsergym_environment,
36
+ BrowserGymAction,
37
+ BrowserGymObservation,
38
+ env_name="browsergym_env",
39
+ )
40
+
41
+
42
+ def main():
43
+ """Main entry point for running the server."""
44
+ import uvicorn
45
+
46
+ uvicorn.run(app, host="0.0.0.0", port=port)
47
+
48
+
49
+ if __name__ == "__main__":
50
+ main()
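The server reads all of its settings from `BROWSERGYM_*` environment variables at import time. The sketch below reproduces that parsing against a plain mapping so the defaults and conversions can be checked without starting the server. `load_config` is a hypothetical helper written for this illustration; the variable names and defaults are the ones used in `server/app.py`.

```python
from typing import Any, Dict, Mapping


def load_config(env: Mapping[str, str]) -> Dict[str, Any]:
    """Parse BROWSERGYM_* settings with the same defaults as server/app.py."""
    return {
        "benchmark": env.get("BROWSERGYM_BENCHMARK", "miniwob"),
        "task_name": env.get("BROWSERGYM_TASK_NAME"),  # None when unset
        # Only the literal string "true" (any casing) enables headless mode.
        "headless": env.get("BROWSERGYM_HEADLESS", "true").lower() == "true",
        "viewport_width": int(env.get("BROWSERGYM_VIEWPORT_WIDTH", "1280")),
        "viewport_height": int(env.get("BROWSERGYM_VIEWPORT_HEIGHT", "720")),
        "timeout": float(env.get("BROWSERGYM_TIMEOUT", "10000")),
        "port": int(env.get("BROWSERGYM_PORT", "8000")),
    }


defaults = load_config({})
custom = load_config({"BROWSERGYM_HEADLESS": "False", "BROWSERGYM_VIEWPORT_WIDTH": "1920"})
numeric_flag = load_config({"BROWSERGYM_HEADLESS": "1"})
```

One consequence of the string comparison is that values such as `"1"` or `"yes"` turn headless mode off, so `numeric_flag["headless"]` is `False`.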
server/browsergym_environment.py ADDED
@@ -0,0 +1,375 @@
1
+ """BrowserGym Environment implementation for OpenEnv.
2
+
3
+ This module wraps the BrowserGym framework to provide a compatible interface
4
+ with OpenEnv's Environment ABC. BrowserGym includes multiple benchmarks:
5
+ - MiniWoB++: Training environment with 100+ simple web tasks
6
+ - WebArena: Realistic evaluation with 812 complex tasks
7
+ - VisualWebArena: Visual web navigation tasks
8
+ - WorkArena: Enterprise task automation
9
+ """
10
+
11
+ import importlib
12
+ import logging
13
+ from typing import Any, Dict, Optional
14
+ from uuid import uuid4
15
+
16
+ import gymnasium as gym
17
+
18
+ from openenv.core.env_server.interfaces import Environment
19
+ from browsergym_env.models import (
20
+ BrowserGymAction,
21
+ BrowserGymObservation,
22
+ BrowserGymState,
23
+ )
24
+
25
+ logger = logging.getLogger(__name__)
26
+
27
+
28
+ def _get_axtree_txt(obs: Dict[str, Any]) -> str:
29
+ """Extract accessibility tree text from BrowserGym observation.
30
+
31
+ BrowserGym returns raw `axtree_object` which needs to be converted to text
32
+ using the `flatten_axtree_to_str` utility function.
33
+ """
34
+ # If already processed as text, return directly
35
+ if "axtree_txt" in obs and obs["axtree_txt"]:
36
+ return obs["axtree_txt"]
37
+
38
+ # Try to convert from raw axtree_object
39
+ if "axtree_object" in obs and obs["axtree_object"]:
40
+ try:
41
+ from browsergym.utils.obs import flatten_axtree_to_str
42
+
43
+ return flatten_axtree_to_str(obs["axtree_object"])
44
+ except ImportError:
45
+ logger.warning("browsergym.utils.obs not available, cannot convert axtree_object to text")
46
+ except Exception as e:
47
+ logger.warning(f"Failed to convert axtree_object to text: {e}")
48
+
49
+ return ""
50
+
51
+
52
+ def _get_pruned_html(obs: Dict[str, Any]) -> str:
53
+ """Extract pruned HTML from BrowserGym observation.
54
+
55
+ BrowserGym returns raw `dom_object` which needs to be converted to text
56
+ and then pruned using the `flatten_dom_to_str` and `prune_html` utilities.
57
+ """
58
+ # If already processed as pruned_html, return directly
59
+ if "pruned_html" in obs and obs["pruned_html"]:
60
+ return obs["pruned_html"]
61
+
62
+ # Try to convert from raw dom_object
63
+ if "dom_object" in obs and obs["dom_object"]:
64
+ try:
65
+ from browsergym.utils.obs import flatten_dom_to_str, prune_html
66
+
67
+ dom_str = flatten_dom_to_str(obs["dom_object"])
68
+ return prune_html(dom_str)
69
+ except ImportError:
70
+ logger.warning("browsergym.utils.obs not available, cannot convert dom_object to pruned_html")
71
+ except Exception as e:
72
+ logger.warning(f"Failed to convert dom_object to pruned_html: {e}")
73
+
74
+ return ""
75
+
76
+
77
+ _MINIWOB_LOAD_HELP = (
78
+ "MiniWoB tasks require the MiniWoB HTML bundle to be served over HTTP. "
79
+ "The official BrowserGym Docker image handles this automatically by "
80
+ "serving the bundle on port 8888. For custom or non-Docker deployments, "
81
+ "clone the MiniWoB++ repository, start a static server inside "
82
+ "`miniwob-plusplus/miniwob/html` (e.g. `python -m http.server 8888`), and "
83
+ "set the MINIWOB_URL environment variable to the served base URL such as "
84
+ "`http://localhost:8888/miniwob/`."
85
+ )
86
+
87
+
88
+ class BrowserGymEnvironment(Environment):
89
+ """BrowserGym environment wrapper for OpenEnv.
90
+
91
+ This environment wraps BrowserGym's Gymnasium-compatible environments to
92
+ provide unified access to multiple web navigation benchmarks.
93
+ """
94
+
95
+ def __init__(
96
+ self,
97
+ benchmark: str = "miniwob",
98
+ task_name: Optional[str] = None,
99
+ headless: bool = True,
100
+ viewport_width: int = 1280,
101
+ viewport_height: int = 720,
102
+ timeout: float = 10000.0,
103
+ **gym_kwargs: Any,
104
+ ):
105
+ """Initialize the BrowserGym environment.
106
+
107
+ Args:
108
+ benchmark: Benchmark to use ('miniwob', 'webarena', 'visualwebarena', etc.)
109
+ task_name: Specific task within the benchmark (e.g., 'click-test', 'click-button')
110
+ If None, will use first available task
111
+ headless: Whether to run browser in headless mode
112
+ viewport_width: Browser viewport width
113
+ viewport_height: Browser viewport height
114
+ timeout: Action timeout in milliseconds
115
+ **gym_kwargs: Additional arguments passed to gym.make()
116
+ """
117
+ super().__init__()
118
+
119
+ self.benchmark = benchmark
120
+ self.task_name = task_name
121
+ self.headless = headless
122
+ self.viewport_width = viewport_width
123
+ self.viewport_height = viewport_height
124
+ self.timeout = timeout
125
+ self.gym_kwargs = dict(gym_kwargs)
126
+
127
+ # Build environment ID
128
+ if task_name:
129
+ self.env_id = f"browsergym/{benchmark}.{task_name}"
130
+ else:
131
+ self.env_id = f"browsergym/{benchmark}"
132
+
133
+ # force import the benchmark module
134
+ benchmark_modules = {
135
+ "miniwob": "browsergym.miniwob",
136
+ "webarena": "browsergym.webarena",
137
+ "visualwebarena": "browsergym.visualwebarena",
138
+ "workarena": "browsergym.workarena",
139
+ }
140
+ module_path = benchmark_modules.get(benchmark)
141
+ try:
142
+ if module_path:
143
+ importlib.import_module(module_path)
144
+ else:
145
+ importlib.import_module("browsergym")
146
+ except ModuleNotFoundError as import_error:
147
+ message = (
148
+ "Failed to import BrowserGym benchmark "
149
+ f"'{benchmark}': {import_error}\n"
150
+ "Install the matching browsergym package "
151
+ f"(e.g., browsergym-{benchmark})."
152
+ )
153
+ raise ValueError(message) from import_error
154
+
155
+ # Create the BrowserGym environment
156
+ try:
157
+ self.gym_env = gym.make(
158
+ self.env_id,
159
+ headless=headless,
160
+ viewport={"width": viewport_width, "height": viewport_height},
161
+ timeout=timeout,
162
+ **self.gym_kwargs,
163
+ )
164
+ except Exception as e: # noqa: BLE001 - gym.make
165
+ message = (
166
+ "Failed to create BrowserGym environment "
167
+ f"'{self.env_id}': {e}\n"
168
+ "Make sure the benchmark package is installed "
169
+ f"(e.g., pip install browsergym-{benchmark})."
170
+ )
171
+ raise ValueError(message) from e
172
+
173
+ # State tracking
174
+ self._state = BrowserGymState(
175
+ episode_id=str(uuid4()),
176
+ step_count=0,
177
+ benchmark=benchmark,
178
+ task_name=task_name or "",
179
+ )
180
+
181
+ self._last_obs: Optional[Dict[str, Any]] = None
182
+ self._last_info: Optional[Dict[str, Any]] = None
183
+
184
+ def reset(
185
+ self,
186
+ seed: Optional[int] = None,
187
+ task_name: Optional[str] = None,
188
+ ) -> BrowserGymObservation:
189
+ """Reset the environment with a specific task.
190
+
191
+ Args:
192
+ seed: Random seed for reproducibility
193
+ task_name: Task name recorded in the episode state; the underlying gym environment keeps the task it was created with
194
+
195
+ Returns:
196
+ Initial observation for the task
197
+ """
198
+ # Generate new episode ID
199
+ self._state = BrowserGymState(
200
+ episode_id=str(uuid4()),
201
+ step_count=0,
202
+ benchmark=self.benchmark,
203
+ task_name=task_name or self.task_name or "",
204
+ )
205
+
206
+ # Reset options
207
+ reset_options = {}
208
+ if seed is not None:
209
+ reset_options["seed"] = seed
210
+
211
+ # Reset the gym environment
212
+ try:
213
+ obs, info = self.gym_env.reset(**reset_options)
214
+ except AttributeError as err:
215
+ if "context" in str(err) and hasattr(self.gym_env, "close"):
216
+ # BrowserGym can leave partially initialized state after a
217
+ # failed reset. Close the hanging resources and try once more.
218
+ self.gym_env.close()
219
+ obs, info = self.gym_env.reset(**reset_options)
220
+ else:
221
+ raise
222
+ except Exception as err: # noqa: BLE001 - browsergym
223
+ message = str(err)
224
+ if self.benchmark == "miniwob" and "core is not defined" in message:
225
+ raise ValueError(_MINIWOB_LOAD_HELP) from err
226
+ raise
227
+
228
+ self._last_obs = obs
229
+ self._last_info = info
230
+
231
+ # Extract observation details
232
+ return self._create_observation(obs, info, done=False, reward=0.0)
233
+
234
+ def step(self, action: BrowserGymAction) -> BrowserGymObservation:
235
+ """Execute an action in the environment.
236
+
237
+ Args:
238
+ action: The action to execute
239
+
240
+ Returns:
241
+ Observation after executing the action
242
+ """
243
+ self._state.step_count += 1
244
+
245
+ # Execute action in gym environment
246
+ try:
247
+ obs, reward, terminated, truncated, info = self.gym_env.step(action.action_str)
248
+
249
+ self._last_obs = obs
250
+ self._last_info = info
251
+
252
+ # Update state
253
+ done = terminated or truncated
254
+ self._state.cum_reward += float(reward)
255
+
256
+ # Extract goal from info if available
257
+ if "goal" in info:
258
+ self._state.goal = str(info["goal"])
259
+
260
+ return self._create_observation(obs, info, done=done, reward=float(reward))
261
+
262
+ except Exception as e:
263
+ # Handle action execution errors
264
+ error_msg = str(e)
265
+ return BrowserGymObservation(
266
+ text=self._last_obs.get("text", "") if self._last_obs else "",
267
+ url=self._last_obs.get("url", "") if self._last_obs else "",
268
+ goal=self._state.goal,
269
+ error=error_msg,
270
+ last_action_error=True,
271
+ done=False,
272
+ reward=0.0,
273
+ )
274
+
275
+ def _create_observation(
276
+ self,
277
+ obs: Dict[str, Any],
278
+ info: Dict[str, Any],
279
+ done: bool,
280
+ reward: float,
281
+ ) -> BrowserGymObservation:
282
+ """Convert BrowserGym observation to OpenEnv format.
283
+
284
+ Args:
285
+ obs: BrowserGym observation dict
286
+ info: BrowserGym info dict
287
+ done: Whether episode is done
288
+ reward: Reward for the step
289
+
290
+ Returns:
291
+ BrowserGymObservation
292
+ """
293
+ # Generate text representations from raw BrowserGym objects
294
+ # BrowserGym returns axtree_object and dom_object which need conversion
295
+ axtree_txt = _get_axtree_txt(obs) if isinstance(obs, dict) else ""
296
+ pruned_html = _get_pruned_html(obs) if isinstance(obs, dict) else ""
297
+
298
+ # Extract text observation - prefer axtree_txt, fallback to pruned_html
299
+ text = axtree_txt or pruned_html
300
+ if not text and isinstance(obs, str):
301
+ text = obs
302
+
303
+ # Extract URL from obs (BrowserGym stores it there)
304
+ url = ""
305
+ if isinstance(obs, dict):
306
+ url = obs.get("url", "")
307
+
308
+ # Extract goal/instruction from goal_object or legacy goal field
309
+ goal = ""
310
+ if isinstance(obs, dict):
311
+ # New format: goal_object is a list of messages
312
+ goal_object = obs.get("goal_object", [])
313
+ if goal_object:
314
+ # Extract text content from goal messages
315
+ goal_texts = []
316
+ for msg in goal_object:
317
+ if isinstance(msg, dict):
318
+ content = msg.get("content", "")
319
+ if isinstance(content, str):
320
+ goal_texts.append(content)
321
+ elif isinstance(content, list):
322
+ for item in content:
323
+ if isinstance(item, dict) and item.get("type") == "text":
324
+ goal_texts.append(item.get("text", ""))
325
+ goal = " ".join(goal_texts)
326
+ # Fallback to legacy goal field
327
+ if not goal:
328
+ goal = obs.get("goal", "")
329
+
330
+ # Update state
331
+ self._state.current_url = url
332
+ self._state.goal = goal
333
+
334
+ # Extract additional observation modalities
335
+ screenshot = obs.get("screenshot") if isinstance(obs, dict) else None
336
+
337
+ # Extract last_action_error from obs (BrowserGym includes this)
338
+ last_action_error = False
339
+ if isinstance(obs, dict):
340
+ last_action_error = bool(obs.get("last_action_error"))
341
+
342
+ # Store full BrowserGym observation and info in metadata
343
+ # This preserves timestamps, additional fields, and any future extensions
344
+ # Note: We exclude large objects (dom_object, axtree_object) to reduce payload size
345
+ browsergym_metadata = {}
346
+ if isinstance(obs, dict):
347
+ # Include useful fields but exclude large raw objects
348
+ browsergym_metadata["browsergym_obs"] = {
349
+ k: v for k, v in obs.items() if k not in ("dom_object", "axtree_object", "screenshot")
350
+ }
351
+ browsergym_metadata["browsergym_info"] = info
352
+
353
+ return BrowserGymObservation(
354
+ text=text,
355
+ url=url,
356
+ screenshot=screenshot,
357
+ goal=goal,
358
+ axtree_txt=axtree_txt,
359
+ pruned_html=pruned_html,
360
+ error="",
361
+ last_action_error=last_action_error,
362
+ done=done,
363
+ reward=reward,
364
+ metadata=browsergym_metadata,
365
+ )
366
+
367
+ @property
368
+ def state(self) -> BrowserGymState:
369
+ """Get the current environment state."""
370
+ return self._state
371
+
372
+ def close(self) -> None:
373
+ """Clean up environment resources."""
374
+ if hasattr(self, "gym_env"):
375
+ self.gym_env.close()
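The goal extraction in `_create_observation` can be read in isolation as a small pure function. The sketch below follows the same branches: string content, list content with `type == "text"` items, and the legacy `goal` fallback. `extract_goal` is a hypothetical name, and the message shape is taken from what the code above checks for, not from the BrowserGym documentation.

```python
from typing import Any, Dict, List


def extract_goal(obs: Dict[str, Any]) -> str:
    """Join text from goal_object messages, falling back to the legacy goal field."""
    goal_texts: List[str] = []
    for msg in obs.get("goal_object") or []:
        if not isinstance(msg, dict):
            continue
        content = msg.get("content", "")
        if isinstance(content, str):
            goal_texts.append(content)
        elif isinstance(content, list):
            # Keep only text parts; image parts and other types are skipped.
            for item in content:
                if isinstance(item, dict) and item.get("type") == "text":
                    goal_texts.append(item.get("text", ""))
    goal = " ".join(goal_texts)
    return goal or obs.get("goal", "")


structured = extract_goal(
    {
        "goal_object": [
            {"content": "Click the button."},
            {
                "content": [
                    {"type": "text", "text": "Then submit."},
                    {"type": "image_url", "image_url": "ignored"},
                ]
            },
        ]
    }
)
legacy = extract_goal({"goal": "Click OK"})
```

Messages without a `content` key contribute an empty string, so an observation whose `goal_object` yields no text ends up using the legacy `goal` value.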
server/requirements.txt ADDED
@@ -0,0 +1,10 @@
1
+ browsergym>=0.10.0
2
+ browsergym-core>=0.10.0
3
+ browsergym-miniwob>=0.10.0
4
+ browsergym-webarena>=0.10.0
5
+ gymnasium>=0.29.0
6
+ playwright>=1.40.0
7
+ Pillow>=10.0.0
8
+ beautifulsoup4>=4.12.0
9
+ fastapi>=0.104.0
10
+ uvicorn[standard]>=0.24.0
server/start.sh ADDED
@@ -0,0 +1,29 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+
4
+ MINIWOB_HTML_DIR=${MINIWOB_HTML_DIR:-/app/miniwob-plusplus/miniwob/html}
5
+ MINIWOB_HTTP_PORT=${MINIWOB_HTTP_PORT:-8888}
6
+ BROWSERGYM_PORT=${BROWSERGYM_PORT:-8000}
7
+
8
+ if [ ! -d "${MINIWOB_HTML_DIR}" ]; then
9
+ echo "MiniWoB HTML directory not found at ${MINIWOB_HTML_DIR}" >&2
10
+ exit 1
11
+ fi
12
+
13
+ python -m http.server "${MINIWOB_HTTP_PORT}" --bind 0.0.0.0 --directory "${MINIWOB_HTML_DIR}" &
14
+ HTTP_SERVER_PID=$!
15
+
16
+ sleep 1
17
+ if ! kill -0 "${HTTP_SERVER_PID}" 2>/dev/null; then
18
+ echo "Failed to start MiniWoB static server on port ${MINIWOB_HTTP_PORT}" >&2
19
+ exit 1
20
+ fi
21
+
22
+ cleanup() {
23
+ kill "${HTTP_SERVER_PID}" 2>/dev/null || true
24
+ }
25
+
26
+ trap cleanup EXIT INT TERM
27
+
28
+ exec python -m uvicorn browsergym_env.server.app:app --host 0.0.0.0 --port "${BROWSERGYM_PORT}"
29
+
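The start script waits a fixed second and then checks that the static server process is still alive, which does not confirm that the port is accepting connections. One possible alternative, sketched here in Python under the assumption that a small readiness probe is acceptable, polls the port until a TCP connection succeeds. `wait_for_port` is a hypothetical helper and is not part of this commit.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the connection timed out); retry shortly.
            time.sleep(interval)
    return False
```

A caller could run `wait_for_port("127.0.0.1", 8888)` after launching the MiniWoB static server and exit with an error when it returns `False`.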