Dalaal Env

A reinforcement learning environment where LLM agents learn to navigate and interact with web pages through accessibility tree observations.

OpenEnv Framework | Playwright + CDP | 19 Tasks

Overview

19 Browser Tasks

Mock Websites

Action Types

Benchmark Sources

Architecture

LLM Agent (Qwen / GPT / etc.) → Dalaal Env (FastAPI + OpenEnv) → Browser (Playwright + Chromium)

The agent observes a numbered accessibility tree, extracted via the Chrome DevTools Protocol (CDP), and emits structured actions (click, type, select, scroll, etc.). The environment executes each action in a headless browser and evaluates task-specific JavaScript success criteria.
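The observe-act cycle described above can be sketched as a short loop. This is an illustration under stated assumptions, not the project's client: `FakeEnv`, the `reset`/`step` method names, the observation keys (`tree`, `done`, `success`), the text format of the tree, and integer element IDs are all hypothetical. Only the action fields (`action_type`, `element_id`, `text`) come from the Action Space section.

```python
from typing import Callable


class FakeEnv:
    """Hypothetical stand-in for the real environment: a to-do page with one input."""

    def __init__(self) -> None:
        self.items: list[str] = []
        self.typed = ""

    def _observe(self, done: bool = False) -> dict:
        # A numbered accessibility tree rendered as text (the format is assumed).
        tree = "[1] textbox 'New item'\n[2] button 'Add'"
        return {"tree": tree, "done": done, "success": bool(self.items)}

    def reset(self) -> dict:
        self.items, self.typed = [], ""
        return self._observe()

    def step(self, action: dict) -> dict:
        kind = action["action_type"]
        if kind == "type" and action["element_id"] == 1:
            self.typed = action["text"]
        elif kind == "click" and action["element_id"] == 2 and self.typed:
            self.items.append(self.typed)
        # Assumed: the episode ends when the agent emits `done`.
        return self._observe(done=(kind == "done"))


def run_episode(env: FakeEnv, policy: Callable[[dict, int], dict], max_steps: int = 10) -> dict:
    """Pass observations to the policy and actions to the env until the episode ends."""
    obs = env.reset()
    for step_index in range(max_steps):
        obs = env.step(policy(obs, step_index))
        if obs["done"]:
            break
    return obs


def scripted_policy(obs: dict, step_index: int) -> dict:
    # A real agent would prompt an LLM with obs["tree"]; this one replays a fixed script.
    script = [
        {"action_type": "type", "element_id": 1, "text": "buy milk"},
        {"action_type": "click", "element_id": 2},
        {"action_type": "done"},
    ]
    return script[step_index]


final = run_episode(FakeEnv(), scripted_policy)
```

Swapping `scripted_policy` for a function that calls a model keeps the loop unchanged, which is the main reason to separate the policy from the episode driver.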

API Endpoints

WS /ws — WebSocket for persistent sessions (primary)

POST /reset — Reset environment with a task

POST /step — Execute a browser action

GET /state — Get current observation

GET /tasks — List all available tasks (JSON)

GET /docs — Interactive API documentation (Swagger)
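As a rough sketch of how the HTTP endpoints might be called from Python, the snippet below builds JSON requests for `POST /reset` and `POST /step` using only the standard library. The base URL and both request body shapes (a `task_id` key for reset, a bare action object for step) are assumptions for illustration; the schemas published at `GET /docs` are authoritative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed address; adjust to your deployment


def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed body shapes; check GET /docs for the real schemas.
reset_request = post_json("/reset", {"task_id": "todo_add"})
step_request = post_json("/step", {"action_type": "click", "element_id": 3})
```

Calling `send(reset_request)` followed by `send(step_request)` would issue the requests against a running server. For multi-step episodes the page lists the WebSocket endpoint as the primary interface.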

Action Space

Each action is a JSON object with an `action_type` field plus the parameters that action requires:

Action	Parameters	Description
`click`	`element_id`	Click an element by its accessibility tree ID
`type`	`element_id`, `text`	Type text into an input field
`select`	`element_id`, `text`	Select a dropdown option by visible text
`press_key`	`key`	Press a keyboard key (Enter, Tab, etc.)
`scroll`	`direction`	Scroll the page (up/down)
`wait`	—	Wait for page to settle
`done`	—	Signal task completion
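The table above maps directly onto a small validation helper. The sketch below is one possible client-side check, not part of the environment itself: `make_action` and `REQUIRED_PARAMS` are hypothetical names, and rejecting unexpected parameters is a design choice made here, not documented behaviour.

```python
# Required parameters per action type, transcribed from the action table.
REQUIRED_PARAMS = {
    "click": ("element_id",),
    "type": ("element_id", "text"),
    "select": ("element_id", "text"),
    "press_key": ("key",),
    "scroll": ("direction",),
    "wait": (),
    "done": (),
}


def make_action(action_type: str, **params) -> dict:
    """Return an action dict, raising ValueError if it does not match the table."""
    if action_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    expected = REQUIRED_PARAMS[action_type]
    missing = [name for name in expected if name not in params]
    extra = [name for name in params if name not in expected]
    if missing or extra:
        raise ValueError(f"{action_type}: missing={missing}, unexpected={extra}")
    if action_type == "scroll" and params["direction"] not in ("up", "down"):
        raise ValueError("scroll direction must be 'up' or 'down'")
    return {"action_type": action_type, **params}


click = make_action("click", element_id=5)
typed = make_action("type", element_id=2, text="hello")
```

Validating before sending catches malformed model output early, so a bad action does not have to cost an environment step.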

Available Tasks

Task ID	Description	Mock Site	Max Steps

GET /tasks returns the full list of available tasks as JSON.

Reward Structure

+1.0 on task success | -0.01 penalty per step | Clamped to [0, 1]

Example: completing a task in 4 steps → reward = max(0, 1.0 - 0.04) = 0.96
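The rule above can be written as a small function. This is a sketch of one reading of the stated formula (success bonus minus per-step penalty, then clamped); `compute_reward` is a hypothetical name and the environment's actual implementation may differ in detail.

```python
def compute_reward(success: bool, steps: int, step_penalty: float = 0.01) -> float:
    """Success bonus minus a per-step penalty, clamped to the range [0, 1]."""
    raw = (1.0 if success else 0.0) - step_penalty * steps
    return min(1.0, max(0.0, raw))


four_step_success = compute_reward(success=True, steps=4)
four_step_failure = compute_reward(success=False, steps=4)
```

Under this reading, a failed episode always scores 0 because the step penalty alone is negative and is clamped, and a successful episode reaches 0 only at 100 steps or more.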

Quick Start

Run inference against this environment:

API_BASE_URL=https://router.huggingface.co/v1 \
MODEL_NAME=Qwen/Qwen3.5-27B \
HF_TOKEN=hf_... \
DALAAL_TASK=todo_add \
python inference.py
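A script launched this way would typically read the four variables from its environment. The sketch below shows one way to do that and fail early if any is missing. It is not taken from `inference.py`: `load_config` and `REQUIRED_VARS` are hypothetical names, and treating all four variables as required is an assumption.

```python
import os
from typing import Mapping

# Variable names taken from the command above; treating all as required is assumed.
REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN", "DALAAL_TASK")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the run settings from the environment, failing early if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name.lower(): env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real process environment.
example = load_config({
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "Qwen/Qwen3.5-27B",
    "HF_TOKEN": "hf_...",
    "DALAAL_TASK": "todo_add",
})
```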