---
title: Computer Agent v2.0
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "5.0.0"
app_file: app.py
pinned: false
license: apache-2.0
short_description: "Enhanced universal computer agent with planner, MCP, memory & voice"
---

# 🤖 Open Computer Agent v2.0

An **enhanced** universal computer-use agent built on [smolagents](https://github.com/huggingface/smolagents), [E2B Desktop](https://e2b.dev), and [Playwright](https://playwright.dev).

## What's New in v2.0

| Feature | Description |
|---------|-------------|
| 🧠 **Hierarchical Planner** | Breaks goals into subtasks before execution using a cheap text model |
| **Playwright MCP** | Semantic browser control (click by text/role, extract tables/links, evaluate JS) |
| 🎯 **Multi-Model Router** | Auto-selects the cheapest capable model (fast vision → powerful vision → fast text → powerful text) |
| 🧩 **Set-of-Marks Vision** | Overlays numbered bounding boxes on UI elements for coordinate-free interaction |
| **Long-Term Memory** | ChromaDB vector store retrieves similar past tasks and proven strategies |
| **Verifier Agent** | Checks subtask completion and triggers recovery loops |
| **Human-in-the-Loop** | Pauses on sensitive actions (payments, emails, deletes) for user approval |
| **Voice I/O** | Speak tasks and hear responses via Whisper STT + Kokoro TTS |
| 💰 **Cost Dashboard** | Real-time $/task, token usage, and latency tracking |
| 📹 **Session Recording** | Saves every step as replayable macros with GIF/MP4 export potential |
| 🧪 **Enhanced Eval** | Built-in benchmark suite with LLM-as-a-Judge grading and A/B testing |
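
To make the "cheapest capable model" idea in the router row concrete, here is a minimal sketch. The tier names, the `strength` scale, and the per-call costs are all made up for illustration; the project's actual router may use different tiers and real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical tier label, not a real model ID
    vision: bool          # whether the tier can read screenshots
    strength: int         # illustrative scale: 1 = fast, 2 = powerful
    cost_per_call: float  # made-up relative cost, not real pricing


# Assumed tiers mirroring the four categories named in the table.
TIERS = [
    ModelTier("fast-vision", True, 1, 0.002),
    ModelTier("powerful-vision", True, 2, 0.020),
    ModelTier("fast-text", False, 1, 0.001),
    ModelTier("powerful-text", False, 2, 0.010),
]


def pick_tier(needs_vision: bool, min_strength: int) -> ModelTier:
    """Return the cheapest tier that satisfies the request."""
    capable = [
        t for t in TIERS
        if (t.vision or not needs_vision) and t.strength >= min_strength
    ]
    if not capable:
        raise ValueError("no tier satisfies the request")
    return min(capable, key=lambda t: t.cost_per_call)
```

In this sketch a text-only step never pays for a vision model, because the text tiers are cheaper at every strength level.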

## Architecture

```
User Input (Text / Voice / File)
        |
        v
[Intelligence Router] ----> Planner (JSON DAG)
        |
        v
[Memory Retrieval] (ChromaDB)
        |
        v
[Plan Executor]
        |
        +---> [Browser Sub-Agent] (Playwright MCP)
        +---> [Desktop Sub-Agent] (E2B + SoM Vision)
        +---> [Coder Sub-Agent] (Code Interpreter)
        +---> [HF Hub Sub-Agent] (Search / Upload)
        |
        v
[Verifier] -> Retry / Alternative / Continue
        |
        v
[Macro Saver] + Cost Report + Session Recording
```
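
The executor and verifier stages of the diagram can be sketched as a loop over a dependency graph. This is an illustration only: the plan schema (`agent`, `depends_on`), the `run_plan` helper, and the stub callbacks are hypothetical, and the real planner's JSON format may differ. It uses the standard-library `graphlib` module (Python 3.9+).

```python
import json
from graphlib import TopologicalSorter

# Hypothetical plan format; the real planner's schema is not documented here.
PLAN_JSON = """
{
  "open_site": {"agent": "browser", "depends_on": []},
  "extract":   {"agent": "browser", "depends_on": ["open_site"]},
  "summarize": {"agent": "coder",   "depends_on": ["extract"]}
}
"""


def run_plan(plan, run_subtask, verify, max_attempts=2):
    """Run subtasks in dependency order, retrying any that fail verification."""
    graph = {name: spec["depends_on"] for name, spec in plan.items()}
    log = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            result = run_subtask(plan[name]["agent"], name)
            if verify(name, result):
                log.append((name, attempt))
                break
        else:
            # Every attempt failed; a fuller version might try an alternative.
            raise RuntimeError(f"subtask {name!r} failed verification")
    return log


# Stub sub-agent and verifier so the sketch runs without a sandbox.
attempts = {}


def fake_run(agent, name):
    attempts[name] = attempts.get(name, 0) + 1
    return f"{agent}:{name}:{attempts[name]}"


def fake_verify(name, result):
    # Pretend the first "extract" attempt fails, to exercise the retry path.
    return not (name == "extract" and result.endswith(":1"))


log = run_plan(json.loads(PLAN_JSON), fake_run, fake_verify)
```

Here `log` records each subtask with the attempt number on which it passed verification.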

## Quick Start

1. Set your **HF_TOKEN** and **E2B_API_KEY** in the Space Secrets.
2. Type a task (or speak it) and click **Let's go!**.
3. Watch the agent plan, execute, verify, and report costs.
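
Space secrets are exposed to the app as environment variables, so a startup check along these lines can catch a missing key before the first task runs. The `missing_secrets` helper is hypothetical and not part of `app.py`.

```python
import os

REQUIRED_SECRETS = ("HF_TOKEN", "E2B_API_KEY")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing secrets:", ", ".join(missing))
    else:
        print("All required secrets are set.")
```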

## Sensitive Actions

By default, the agent pauses for user approval before:
- Payments, purchases, subscriptions
- Sending emails/messages/posts
- Deleting files or uninstalling software
- Filling in password/credit-card fields

Enable **Auto-approve all actions** in Advanced Options to disable these human-in-the-loop (HITL) pauses.
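
One simple way to implement such a gate is keyword matching on the action description, sketched below. The patterns and function names are assumptions for illustration. The agent's real detection logic is not shown in this README and may be model-based rather than regex-based.

```python
import re

# Hypothetical patterns, one per category in the list above.
SENSITIVE_PATTERNS = {
    "payment": r"\b(pay|purchase|buy|subscribe|checkout)\b",
    "messaging": r"\b(send|post|publish)\b.*\b(email|message|post|tweet)\b",
    "deletion": r"\b(delete|remove|uninstall)\b",
    "credentials": r"\b(password|credit[- ]?card|cvv)\b",
}


def classify_action(description):
    """Return the sensitive categories that an action description matches."""
    text = description.lower()
    return [
        label
        for label, pattern in SENSITIVE_PATTERNS.items()
        if re.search(pattern, text)
    ]


def needs_approval(description, auto_approve=False):
    """Pause for the user unless auto-approve is on or nothing matched."""
    return bool(classify_action(description)) and not auto_approve
```

Keyword matching is cheap but prone to false positives and misses, so treat it as a first filter at most.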

## Cost Budget

The default budget is **$2.00 USD per session**. The router automatically downgrades to cheaper models as the budget is consumed.
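
The downgrade behavior could be driven by a small budget tracker like the one sketched here. Only the $2.00 default comes from the text above. The 50% threshold, the tier labels, and the token prices in the usage lines are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    limit_usd: float = 2.00  # default session budget from the README
    spent_usd: float = 0.0

    def record(self, input_tokens, output_tokens, usd_per_1k_in, usd_per_1k_out):
        """Add the cost of one model call and return that cost in USD."""
        cost = (input_tokens / 1000) * usd_per_1k_in
        cost += (output_tokens / 1000) * usd_per_1k_out
        self.spent_usd += cost
        return cost

    @property
    def remaining_fraction(self):
        return max(0.0, 1.0 - self.spent_usd / self.limit_usd)

    def allowed_tier(self):
        # Hypothetical policy: powerful models only while over half remains.
        if self.remaining_fraction > 0.5:
            return "powerful"
        if self.remaining_fraction > 0.0:
            return "fast"
        return "stop"


budget = BudgetTracker()
tier_at_start = budget.allowed_tier()
budget.record(100_000, 20_000, usd_per_1k_in=0.01, usd_per_1k_out=0.02)
tier_after_spend = budget.allowed_tier()
```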

## Benchmarks

Run the built-in eval suite:
```python
from eval_harness import EvaluationHarness
# See eval_harness.py for usage
```

## Credits

- [smolagents](https://github.com/huggingface/smolagents) by Hugging Face
- [E2B](https://e2b.dev) for secure sandboxed desktops
- [Playwright](https://playwright.dev) for browser automation
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) for vision reasoning