---
title: Computer Agent v2.0
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: "Computer agent with planner, multi-model router, MCP, memory"
---

# 🤖 Open Computer Agent v2.0

An **enhanced** universal computer-use agent built on [smolagents](https://github.com/huggingface/smolagents), [E2B Desktop](https://e2b.dev), and [Playwright](https://playwright.dev). It plans before it acts, remembers what worked, routes each task to the cheapest capable model, and verifies its own success.

## ✨ What's New in v2.0

| Feature | Description |
|---------|-------------|
| 🧠 **Hierarchical Planner** | Breaks goals into subtask DAGs with a cheap text model before execution |
| 🔌 **Playwright MCP** | Semantic browser control: click by text/role, extract tables/links, evaluate JS |
| 🎯 **Multi-Model Router** | Auto-selects the cheapest capable model (fast vision ↔ powerful vision ↔ fast text ↔ powerful text) |
| 🧩 **Set-of-Marks Vision** | Overlays numbered bounding boxes on UI elements for coordinate-free interaction |
| 🗄️ **Long-Term Memory** | ChromaDB vector store retrieves similar past tasks and proven strategies |
| 🔍 **Verifier Agent** | Checks subtask completion and triggers recovery loops automatically |
| 🛑 **Human-in-the-Loop** | Pauses on sensitive actions (payments, emails, deletes) for user approval |
| 🎙️ **Voice I/O** | Speak tasks and hear responses via Whisper STT + Kokoro TTS |
| 💰 **Cost Dashboard** | Real-time $/task, token usage, and latency tracking |
| 📹 **Session Recording** | Saves every step as replayable macros with full trace export |
| 🧪 **Enhanced Eval** | Built-in benchmark suite with LLM-as-a-Judge grading and A/B testing |

## 🏗️ Architecture

```
User Input (Text / Voice / File)
        |
        v
[IntelligenceRouter] ----> Planner (JSON DAG)
        |
        v
[Memory Retrieval] (ChromaDB)
        |
        v
[Plan Executor]
        |
        +---> [Browser Sub-Agent]  (Playwright MCP)
        +---> [Desktop Sub-Agent]  (E2B + SoM Vision)
        +---> [Coder Sub-Agent]    (Code Interpreter)
        +---> [HF Hub Sub-Agent]   (Search / Upload)
        |
        v
[Verifier] -> Retry / Alternative / Continue
        |
        v
[Macro Saver] + Cost Report + Session Recording
```

## 🚀 Quick Start

### 1. Secrets Setup

Go to **Space Settings → Secrets** and add:

| Secret Name | Value | Required? |
|-------------|-------|-----------|
| `E2B_API_KEY` | Your key from [e2b.dev](https://e2b.dev) | **Yes**, for desktop automation |
| `HF_TOKEN` | Your Hugging Face token | **Yes**, for model inference and Hub tools |

Then **Factory Rebuild** the Space.

### 2. Run a Task

1. Type a task (or click 🎙️ to speak it)
2. Hit **🚀 Let's go!**
3. Watch the agent:
   - 🧠 Generate a plan in the left panel
   - 🖥️ Control the sandbox desktop in real time
   - 💰 Update cost tracking live
   - ✅ Verify completion at the end

## 🛡️ Sensitive Actions

By default, the agent pauses for approval before:

- Payments, purchases, subscriptions
- Sending emails, messages, or posts
- Deleting files or uninstalling software
- Filling password or credit-card fields

Enable **Auto-approve all actions** in ⚙️ Advanced Options to turn off these human-in-the-loop (HITL) pauses.

## 💰 Cost Budget

The default budget is **$2.00 USD per session**. As the budget is consumed, the router automatically downgrades to cheaper models. Costs are estimated from token counts and model pricing, so actual HF Inference API charges may differ.
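The budget-driven downgrade described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the code in `core_agent.py`: the `CostBudget` class, the `route` function, the per-1K-token prices, and the 50% downgrade threshold are all placeholder assumptions chosen only to show the idea of estimating cost from tokens and picking the cheapest capable tier.

```python
from dataclasses import dataclass

# Placeholder USD prices per 1K tokens (assumption); real HF Inference API pricing differs.
PRICES = {
    ("vision", "powerful"): 0.0100,
    ("vision", "fast"): 0.0020,
    ("text", "powerful"): 0.0050,
    ("text", "fast"): 0.0005,
}


@dataclass
class CostBudget:
    limit_usd: float = 2.00  # default per-session budget from this README
    spent_usd: float = 0.0

    def charge(self, modality: str, tier: str, tokens: int) -> float:
        """Estimate one call's cost from its token count and add it to the session total."""
        cost = tokens / 1000 * PRICES[(modality, tier)]
        self.spent_usd += cost
        return cost


def route(budget: CostBudget, needs_vision: bool, hard_step: bool,
          downgrade_at: float = 0.5) -> tuple[str, str]:
    """Pick the cheapest capable (modality, tier) pair.

    Vision models are used only for steps that need a screenshot. The powerful
    tier is used only for hard steps, and only while less than `downgrade_at`
    of the budget has been spent (hypothetical threshold).
    """
    modality = "vision" if needs_vision else "text"
    spent_fraction = budget.spent_usd / budget.limit_usd
    tier = "powerful" if hard_step and spent_fraction < downgrade_at else "fast"
    return modality, tier


budget = CostBudget()
print(route(budget, needs_vision=True, hard_step=True))  # ('vision', 'powerful')
budget.charge("vision", "powerful", 150_000)             # about $1.50 of $2.00 spent
print(route(budget, needs_vision=True, hard_step=True))  # ('vision', 'fast')
```

Keying the downgrade on the fraction of the budget spent, rather than on an absolute dollar amount, keeps the same rule working if the per-session budget is changed.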
## 🧪 Running Benchmarks

```python
from eval_harness import EvaluationHarness, DEFAULT_BENCHMARKS
from app import build_session_components

# Build the judge's router once instead of rebuilding session components on every grading call
judge_router = build_session_components("eval_session", "./tmp/eval")["router"]

# Create a harness with a factory that builds a fresh agent for each run
harness = EvaluationHarness(
    agent_factory=lambda: build_session_components("eval_session", "./tmp/eval")["router"],
    judge_model_call=lambda msgs: judge_router(msgs).content,
)

# Run the full suite
summary = harness.run_suite(DEFAULT_BENCHMARKS, num_runs=1)
print(f"Pass rate: {summary.passed}/{summary.total_tasks}")
print(f"Avg score: {summary.avg_score}")
```

Or run a quick A/B test between two configurations:

```python
# make_agent_v1 and make_agent_v2 are your own zero-argument factories,
# with the same shape as agent_factory above
results = harness.compare_strategies(
    strategy_a_factory=make_agent_v1,
    strategy_b_factory=make_agent_v2,
    num_runs=3,
)
print(f"Winner: Strategy {results['winner']}")
```

## 🎙️ Voice Input

1. Click the **microphone** icon next to the task box
2. Speak your task clearly
3. The transcribed text appears in the task box automatically
4. Hit **Run**

Voice input requires the optional `faster-whisper` dependency. If it is not installed, the app falls back to text input.
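The optional-dependency pattern behind the voice fallback can be sketched as below, assuming the `faster-whisper` package's `WhisperModel` API. The helper names `load_transcriber` and `task_from_input`, the `"base"` model size, and the CPU/int8 settings are illustrative assumptions, not what `voice_interface.py` necessarily does.

```python
from typing import Callable, Optional

Transcriber = Callable[[str], str]


def load_transcriber(model_size: str = "base") -> Optional[Transcriber]:
    """Return an audio-path -> text function, or None if faster-whisper is not installed."""
    try:
        from faster_whisper import WhisperModel  # optional dependency
    except ImportError:
        return None

    # CPU + int8 keeps memory low on a basic Space; both settings are assumptions
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    def transcribe(audio_path: str) -> str:
        segments, _info = model.transcribe(audio_path)
        # `segments` is a lazy generator; iterating it performs the decoding
        return " ".join(segment.text.strip() for segment in segments)

    return transcribe


def task_from_input(audio_path: Optional[str], typed_text: str,
                    transcriber: Optional[Transcriber]) -> str:
    """Use the spoken task when STT is available, otherwise fall back to the typed text."""
    if audio_path and transcriber is not None:
        return transcriber(audio_path)
    return typed_text
```

Loading the model once in `load_transcriber` and returning a closure avoids reloading Whisper weights on every recording; when the import fails, callers receive `None` and `task_from_input` simply uses whatever was typed.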
## 🧩 MCP Tools Reference

| Tool | Description |
|------|-------------|
| `browser_goto(url)` | Navigate the browser to a URL |
| `browser_click(selector, by)` | Click by CSS/text/role |
| `browser_fill(selector, text)` | Fill form fields |
| `browser_find_and_click(text)` | Click by visible text |
| `browser_extract_links()` | Get all page links as JSON |
| `browser_extract_tables()` | Get all page tables as JSON |
| `browser_evaluate_js(script)` | Run JS in the browser context |
| `hf_search_models(query)` | Search the HF Hub for models |
| `hf_search_datasets(query)` | Search the HF Hub for datasets |
| `hf_upload_dataset_file(...)` | Upload a file to an HF dataset |
| `fs_read(path)` | Read a workspace file |
| `fs_write(path, content)` | Write a workspace file |

## 📁 Project Structure

```
├── app.py              # Gradio UI + event orchestration
├── core_agent.py       # Router, Planner, Verifier, Memory, SoM, Recorder
├── mcp_tools.py        # Playwright, CodeExec, FileSystem, HF Hub bridges
├── voice_interface.py  # STT + TTS with WebGPU detection
├── eval_harness.py     # Benchmarks + LLM-as-a-Judge + A/B testing
├── e2bqwen.py          # Original E2B vision agent (preserved)
├── requirements.txt
└── README.md
```

## 🤝 Credits

- [smolagents](https://github.com/huggingface/smolagents) by Hugging Face
- [E2B](https://e2b.dev) for secure sandboxed desktops
- [Playwright](https://playwright.dev) for browser automation
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) for vision reasoning
- [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for TTS

## 📄 License

Apache 2.0