v2.3 — Open Source

Your AI Agent
for the Browser

Type what you want in natural language. Crab-Agent navigates pages, clicks elements, fills forms, reads content, manages tabs — all autonomously via Chrome DevTools Protocol. Bring your own API key. Works with any LLM provider.

Click & Type
Navigate
Read Pages
Tabs
Schedule
Works with your favorite LLM provider
Anthropic
OpenAI
Google Gemini
OpenRouter
Ollama (Local)
OpenAI-compatible

Everything you need to automate the web

A complete agent toolkit built into a Chrome side panel. No coding required.

Vision + Native AX Tree

Uses Chromium's native Accessibility tree with ref IDs (covers Shadow DOM, cross-origin iframes, custom elements). Pixel-perfect coordinates from the renderer.

Full Browser Control via CDP

Click, type, scroll, drag, navigate, open tabs, fill forms, upload files, execute JavaScript — hardware-level events through Chrome DevTools Protocol, not synthetic JS.
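
As a rough illustration of what "hardware-level events" means, the sketch below builds the sequence of CDP `Input.dispatchMouseEvent` commands for a single left click. The helper name `buildClickCommands` and the `CdpCommand` type are hypothetical and not taken from the Crab-Agent source; only the CDP method name and its parameters (`type`, `x`, `y`, `button`, `clickCount`) come from the protocol itself.

```typescript
// Hypothetical shape for a queued CDP command (not Crab-Agent's actual type).
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

// Build the move / press / release sequence for one left click at (x, y).
// Coordinates are CSS pixels relative to the viewport.
function buildClickCommands(x: number, y: number): CdpCommand[] {
  const press = { x, y, button: "left", clickCount: 1 };
  return [
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y } },
    { method: "Input.dispatchMouseEvent", params: { ...press, type: "mousePressed" } },
    { method: "Input.dispatchMouseEvent", params: { ...press, type: "mouseReleased" } },
  ];
}
```

In an extension, each command would typically be sent to an attached tab with `chrome.debugger.sendCommand({ tabId }, method, params)`. Because the browser's input pipeline dispatches these events, the page sees them as trusted input rather than script-generated events.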

Native Tool Calling

Uses each provider's native tool-use API where one exists (Anthropic tool_use, OpenAI function calling, Gemini function_declarations), so responses arrive as structured tool calls rather than JSON parsed out of free text.
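
To make the difference between providers concrete, here is a minimal sketch that converts one tool definition into the three request shapes named above. The `ToolDef` type and the converter names are hypothetical; the field names follow the providers' public API formats as commonly documented and should be checked against current docs. Gemini in particular accepts only a subset of JSON Schema, so real schemas may need adjusting.

```typescript
// Hypothetical provider-neutral tool definition; `parameters` is a JSON Schema object.
interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Anthropic Messages API: the schema goes under `input_schema`.
function toAnthropic(t: ToolDef) {
  return { name: t.name, description: t.description, input_schema: t.parameters };
}

// OpenAI Chat Completions: each tool is wrapped in { type: "function", function: {...} }.
function toOpenAI(t: ToolDef) {
  return {
    type: "function" as const,
    function: { name: t.name, description: t.description, parameters: t.parameters },
  };
}

// Gemini: all declarations are grouped under a single tool entry.
function toGemini(tools: ToolDef[]) {
  return {
    function_declarations: tools.map((t) => ({
      name: t.name,
      description: t.description,
      parameters: t.parameters,
    })),
  };
}

// Example tool, loosely modeled on a "click by ref ID" action (illustrative only).
const clickTool: ToolDef = {
  name: "click",
  description: "Click the element with the given ref ID",
  parameters: {
    type: "object",
    properties: { ref: { type: "string" } },
    required: ["ref"],
  },
};
```

Keeping one neutral definition and converting at the gateway boundary means the tool list is written once, whichever provider is selected.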

GIF Replay Recording

Record every task as a replay in GIF, HTML, or JSON form. Review what the agent did, debug failed flows, or share runs as visual artifacts.

Task Scheduler

Schedule tasks for the future — one-time or recurring. Natural language time parsing via LLM. The agent runs them automatically via Chrome alarms.
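
A minimal sketch of how a parsed schedule could be turned into the options object for `chrome.alarms.create`. The `Schedule` type and `toAlarmInfo` helper are hypothetical, and the LLM-based time parsing is assumed to have already produced concrete dates. `when` and `periodInMinutes` are real fields of the Chrome alarms API.

```typescript
// Hypothetical result of the natural-language time parsing step.
type Schedule =
  | { kind: "once"; at: Date }
  | { kind: "recurring"; firstRun: Date; everyMinutes: number };

// Subset of the options accepted by chrome.alarms.create.
interface AlarmInfo {
  when: number; // epoch milliseconds of the first firing
  periodInMinutes?: number; // present only for recurring alarms
}

function toAlarmInfo(schedule: Schedule): AlarmInfo {
  if (schedule.kind === "once") {
    return { when: schedule.at.getTime() };
  }
  if (!Number.isFinite(schedule.everyMinutes) || schedule.everyMinutes <= 0) {
    throw new RangeError("everyMinutes must be a positive number");
  }
  return { when: schedule.firstRun.getTime(), periodInMinutes: schedule.everyMinutes };
}
```

In the service worker this would be used as `chrome.alarms.create(taskId, toAlarmInfo(schedule))`, with a `chrome.alarms.onAlarm` listener starting the stored task when the alarm fires.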

Domain Permission System

Domain-based permissions keep you in control. Smart message compaction with progressive token budgeting keeps long sessions stable.
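
One plausible shape for a domain-based permission check is sketched below. The rule format, the three-way `allow` / `deny` / `ask` decision, and the function names are assumptions for illustration; the actual Crab-Agent rules may differ.

```typescript
type Decision = "allow" | "deny" | "ask";

// Hypothetical rule store: bare domains such as "example.com".
interface PermissionRules {
  allow: string[];
  deny: string[];
}

// A rule matches the domain itself and any of its subdomains.
function matchesDomain(hostname: string, rule: string): boolean {
  const h = hostname.toLowerCase();
  const r = rule.toLowerCase();
  return h === r || h.endsWith("." + r);
}

function checkPermission(url: string, rules: PermissionRules): Decision {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return "deny"; // unparseable URLs are never acted on
  }
  // Deny rules win over allow rules so a blocked subdomain stays blocked.
  if (rules.deny.some((r) => matchesDomain(hostname, r))) return "deny";
  if (rules.allow.some((r) => matchesDomain(hostname, r))) return "allow";
  return "ask"; // unknown domains fall back to prompting the user
}
```

Matching on a leading dot rather than a plain suffix keeps `notexample.com` from being treated as part of `example.com`.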

From natural language to action

A tool-use agent loop that keeps going until the task is done.

1. You describe the task

Open the side panel and type what you want in plain language, attaching screenshots if needed. For example: "Book the cheapest flight to Tokyo for next Friday."

2. Agent observes the page

Crab-Agent takes a screenshot and pulls the native CDP Accessibility tree, mapping every interactive element to a ref ID (e.g., ref_42) with pixel-perfect coordinates.
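
The ref ID mapping can be pictured with the simplified sketch below. It assumes a nested node shape with bounds already attached, which is a convenience for illustration. The real CDP `Accessibility.getFullAXTree` response is a flat node list, and element geometry comes from separate DOM calls. The type names, the role list, and `assignRefs` are all hypothetical.

```typescript
// Simplified, hypothetical accessibility node (not the raw CDP AXNode shape).
interface AxNode {
  role: string;
  name: string;
  bounds: { x: number; y: number; width: number; height: number };
  children?: AxNode[];
}

interface RefEntry {
  ref: string; // e.g. "ref_42"
  role: string;
  name: string;
  center: { x: number; y: number }; // click target in viewport pixels
}

// Assumed set of roles treated as interactive.
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "checkbox", "combobox", "menuitem", "radio", "tab",
]);

// Depth-first walk that numbers interactive nodes in document order.
function assignRefs(root: AxNode): RefEntry[] {
  const entries: RefEntry[] = [];
  const walk = (node: AxNode): void => {
    if (INTERACTIVE_ROLES.has(node.role)) {
      entries.push({
        ref: `ref_${entries.length + 1}`,
        role: node.role,
        name: node.name,
        center: {
          x: node.bounds.x + node.bounds.width / 2,
          y: node.bounds.y + node.bounds.height / 2,
        },
      });
    }
    (node.children ?? []).forEach((child) => walk(child));
  };
  walk(root);
  return entries;
}
```

The LLM then only needs to name a ref ID in its tool call, and the extension resolves it to coordinates before dispatching the input event.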

3. LLM decides the next action

The conversation (including visual context) is sent to your chosen LLM via native tool-calling APIs. It selects a tool — click, type, navigate, read — and the extension executes it via CDP.

4. Repeat until done

The result is appended to the conversation and the loop continues. A state manager handles loop detection. The agent handles multi-step flows, tab switching, and error recovery automatically.
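
The four steps above can be sketched as a small loop. Everything here is illustrative: `AgentDeps`, `isLooping`, and `runAgentLoop` are hypothetical names, the step limit is arbitrary, and the loop-detection rule (the same tool call three times in a row) is an assumed heuristic rather than the project's actual state manager.

```typescript
type ToolCall = { name: string; input: Record<string, unknown> };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Injected dependencies so the loop can be exercised without a browser or an LLM.
interface AgentDeps {
  observe(): Promise<string>; // screenshot + AX tree summary
  decide(history: Message[]): Promise<ToolCall>; // LLM picks the next tool
  execute(call: ToolCall): Promise<string>; // run the tool via CDP
}

// Assumed heuristic: the last `windowSize` calls are identical.
function isLooping(calls: ToolCall[], windowSize = 3): boolean {
  if (calls.length < windowSize) return false;
  const recent = calls.slice(-windowSize).map((c) => JSON.stringify(c));
  return recent.every((s) => s === recent[0]);
}

type LoopResult = { status: "done" | "loop_detected" | "max_steps"; steps: number };

async function runAgentLoop(task: string, deps: AgentDeps, maxSteps = 20): Promise<LoopResult> {
  const history: Message[] = [{ role: "user", content: task }];
  const calls: ToolCall[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    history.push({ role: "user", content: await deps.observe() });
    const call = await deps.decide(history);
    if (call.name === "done") return { status: "done", steps: step };
    calls.push(call);
    if (isLooping(calls)) return { status: "loop_detected", steps: step };
    const result = await deps.execute(call);
    // Append both the decision and its result so the next turn sees them.
    history.push({ role: "assistant", content: JSON.stringify(call) });
    history.push({ role: "tool", content: result });
  }
  return { status: "max_steps", steps: maxSteps };
}
```

Passing `observe`, `decide`, and `execute` in as dependencies keeps the loop itself independent of any one provider or browser API, which is also what makes it easy to test with stubs.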

A tool for every browser interaction

22 external tools plus 2 internal ones. The agent picks the right tool for each step.

computer (13 actions)
navigate
read_page (AX tree)
find (semantic)
form_input
get_page_text
tabs_context
tabs_create
switch_tab
close_tab
javascript_tool
file_upload
upload_image
document_generator
gif_creator
canvas_toolkit
visualize (charts)
code_editor
set_of_mark
resize_window
shortcuts_list
shortcuts_execute
read_console_messages
read_network_requests
update_plan
ask_user / done

Built for reliability

React 18 + TypeScript + Vite. Chrome MV3 service worker. Multi-provider LLM gateway.

Side Panel (React 18 + Zustand)
  Chat | Workflows | Schedule | Settings
        |
        |  Port messages
        |
Background Service Worker
  Session management | Tab groups | Alarms | Scheduler
        |
Agent Loop (tool-use cycle)
  Screenshot -> AX Tree -> Call LLM -> Execute Tool -> Repeat
        |                                  |
LLM Gateway                          Tool Executors (CDP)
  Anthropic (tool_use)                 Hardware mouse/keyboard
  OpenAI (function calling)            Browser: tabs, navigate
  Gemini (function_declarations)       Page: read, find, JS
  OpenRouter                           Files: upload, download
  Ollama (text JSON)                   Docs, GIF, canvas
  OpenAI-compatible (auto-detect)      Permissions, scheduler

Ready to automate your browser?

Free to use. Bring your own API key and let Clawd handle the rest.

Standing on the shoulders of giants

Crab-Agent wouldn't exist without these amazing projects and their creators.

Clawd Tank

Assets & Mascot

The adorable "Clawd" crab pixel-art mascot and SVG animations used throughout Crab-Agent are derived from the Clawd Tank project by Marcio Granzotto. Thank you for the amazing crab character!

Claude for Chrome (Anthropic)

Agent Logic

Core agent loop architecture and browser automation logic inspired by Anthropic's Claude for Chrome. The tool-use cycle pattern — screenshot, observe, decide, act — draws heavily from their pioneering work on AI browser agents.