---
title: Computer Agent v2.0
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "5.0.0"
app_file: app.py
pinned: false
license: apache-2.0
short_description: "Enhanced universal computer agent with planner, MCP, memory & voice"
---

# 🤖 Open Computer Agent v2.0

An **enhanced** universal computer-use agent built on [smolagents](https://github.com/huggingface/smolagents), [E2B Desktop](https://e2b.dev), and [Playwright](https://playwright.dev).

## What's New in v2.0

| Feature | Description |
|---------|-------------|
| 🧠 **Hierarchical Planner** | Breaks goals into subtasks before execution using a cheap text model |
| 🔌 **Playwright MCP** | Semantic browser control (click by text/role, extract tables/links, evaluate JS) |
| 🎯 **Multi-Model Router** | Auto-selects the cheapest capable model (fast vision ↔ powerful vision ↔ fast text ↔ powerful text) |
| 🧩 **Set-of-Marks Vision** | Overlays numbered bounding boxes on UI elements for coordinate-free interaction |
| 🗄️ **Long-Term Memory** | ChromaDB vector store retrieves similar past tasks and proven strategies |
| 🔍 **Verifier Agent** | Checks subtask completion and triggers recovery loops |
| 🛑 **Human-in-the-Loop** | Pauses on sensitive actions (payments, emails, deletes) for user approval |
| 🎙️ **Voice I/O** | Speak tasks and hear responses via Whisper STT + Kokoro TTS |
| 💰 **Cost Dashboard** | Real-time $/task, token usage, and latency tracking |
| 📹 **Session Recording** | Saves every step as replayable macros with GIF/MP4 export potential |
| 🧪 **Enhanced Eval** | Built-in benchmark suite with LLM-as-a-Judge grading and A/B testing |

## Architecture

```
User Input (Text / Voice / File)
        |
        v
[Intelligence Router] ----> Planner (JSON DAG)
        |
        v
[Memory Retrieval] (ChromaDB)
        |
        v
[Plan Executor]
        |
        +---> [Browser Sub-Agent] (Playwright MCP)
        +---> [Desktop Sub-Agent] (E2B + SoM Vision)
        +---> [Coder Sub-Agent]   (Code Interpreter)
        +---> [HF Hub Sub-Agent]  (Search / Upload)
        |
        v
[Verifier] -> Retry / Alternative / Continue
        |
        v
[Macro Saver] + Cost Report + Session Recording
```

## Quick Start

1. Set your **HF_TOKEN** and **E2B_API_KEY** in the Space Secrets.
2. Type a task (or speak it) and hit **🚀 Let's go!**.
3. Watch the agent plan, execute, verify, and report costs.

## Sensitive Actions

By default, the agent pauses before:

- Payments, purchases, subscriptions
- Sending emails/messages/posts
- Deleting files or uninstalling software
- Password/credit-card fields

Enable **Auto-approve all actions** in Advanced Options to disable HITL.

## Cost Budget

Default budget is **$2.00 USD per session**. The router automatically downgrades to cheaper models as the budget is consumed.

## Benchmarks

Run the built-in eval suite:

```python
from eval_harness import EvaluationHarness
# See eval_harness.py for usage
```

## Credits

- [smolagents](https://github.com/huggingface/smolagents) by Hugging Face
- [E2B](https://e2b.dev) for secure sandboxed desktops
- [Playwright](https://playwright.dev) for browser automation
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) for vision reasoning
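
## Example: Budget-Aware Model Selection (Illustrative Sketch)

The Cost Budget section says the router downgrades to cheaper models as the session budget is consumed. The snippet below is a minimal sketch of that idea, not the project's actual router: the `ModelTier` class, the `pick_tier` function, the tier names, and the budget-fraction thresholds are all hypothetical and chosen only for illustration. Only the $2.00 default comes from this README.

```python
from dataclasses import dataclass

# Default session budget stated in this README.
SESSION_BUDGET_USD = 2.00


@dataclass(frozen=True)
class ModelTier:
    """A hypothetical model tier; names and thresholds are illustrative only."""
    name: str
    # The tier may be used while at least this share of the budget remains.
    min_budget_fraction: float


# Ordered from most to least capable. These thresholds are assumptions,
# not values taken from the project's router.
TIERS = (
    ModelTier("powerful-vision", 0.50),
    ModelTier("fast-vision", 0.20),
    ModelTier("fast-text", 0.0),
)


def pick_tier(spent_usd: float, budget_usd: float = SESSION_BUDGET_USD) -> ModelTier:
    """Return the most capable tier still allowed by the remaining budget."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    # Fraction of the budget left, clamped at zero once the budget is overspent.
    remaining = max(budget_usd - spent_usd, 0.0) / budget_usd
    for tier in TIERS:
        if remaining >= tier.min_budget_fraction:
            return tier
    return TIERS[-1]  # Fallback: cheapest tier.


if __name__ == "__main__":
    for spent in (0.0, 1.5, 1.9):
        print(f"${spent:.2f} spent -> {pick_tier(spent).name}")
```

Keeping the tiers in a single ordered tuple means the downgrade policy can be changed by editing data rather than control flow. A real router would presumably also weigh whether the task needs vision at all.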