---
title: Computer Agent v2.0
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "5.0.0"
app_file: app.py
pinned: false
license: apache-2.0
short_description: "Enhanced universal computer agent with planner, MCP, memory & voice"
---
# 🤖 Open Computer Agent v2.0
An **enhanced** universal computer-use agent built on [smolagents](https://github.com/huggingface/smolagents), [E2B Desktop](https://e2b.dev), and [Playwright](https://playwright.dev).
## What's New in v2.0
| Feature | Description |
|---------|-------------|
| 🧠 **Hierarchical Planner** | Breaks goals into subtasks before execution using a cheap text model |
| 🔌 **Playwright MCP** | Semantic browser control (click by text/role, extract tables/links, evaluate JS) |
| 🎯 **Multi-Model Router** | Auto-selects the cheapest capable model (fast vision ↔ powerful vision ↔ fast text ↔ powerful text) |
| 🧩 **Set-of-Marks Vision** | Overlays numbered bounding boxes on UI elements for coordinate-free interaction |
| 🗄️ **Long-Term Memory** | ChromaDB vector store retrieves similar past tasks and proven strategies |
| πŸ” **Verifier Agent** | Checks subtask completion and triggers recovery loops |
| 🛑 **Human-in-the-Loop** | Pauses on sensitive actions (payments, emails, deletes) for user approval |
| 🎙️ **Voice I/O** | Speak tasks and hear responses via Whisper STT + Kokoro TTS |
| 💰 **Cost Dashboard** | Real-time $/task, token usage, and latency tracking |
| 📹 **Session Recording** | Saves every step as a replayable macro that could later be exported as GIF/MP4 |
| πŸ§ͺ **Enhanced Eval** | Built-in benchmark suite with LLM-as-a-Judge grading and A/B testing |
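Set-of-Marks interaction replaces raw pixel coordinates with numbered labels. The following is a minimal sketch of the bookkeeping side only, assuming UI element bounding boxes have already been detected; the `Box` type and the `assign_marks`, `describe_marks`, and `mark_to_click` helpers are hypothetical, not this project's code, and drawing the numbered overlay onto the screenshot is omitted. Each box gets a number, and a number chosen by the vision model is resolved back to a click point at the box centre.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical detected UI element: top-left corner, size, and a text label."""
    x: int
    y: int
    width: int
    height: int
    label: str = ""


def assign_marks(boxes: list[Box]) -> dict[int, Box]:
    """Number boxes 1..N, sorted top-to-bottom and then left-to-right."""
    ordered = sorted(boxes, key=lambda b: (b.y, b.x))
    return {mark: box for mark, box in enumerate(ordered, start=1)}


def describe_marks(marks: dict[int, Box]) -> str:
    """Text legend that could accompany the annotated screenshot in the prompt."""
    return "\n".join(f"[{mark}] {box.label}" for mark, box in marks.items())


def mark_to_click(marks: dict[int, Box], mark: int) -> tuple[int, int]:
    """Resolve the mark number chosen by the model to the centre of its box."""
    if mark not in marks:
        raise KeyError(f"unknown mark {mark}; valid marks are {sorted(marks)}")
    box = marks[mark]
    return box.x + box.width // 2, box.y + box.height // 2


if __name__ == "__main__":
    detected = [
        Box(400, 20, 80, 30, "Sign in"),
        Box(10, 20, 200, 30, "Search box"),
        Box(10, 300, 120, 40, "Submit"),
    ]
    marks = assign_marks(detected)
    print(describe_marks(marks))
    print(mark_to_click(marks, 3))  # (70, 320)
```

Sorting by exact pixel row is a simplification: two elements on the same visual line but offset by a pixel would be ordered by that offset rather than left-to-right.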
## Architecture
```
User Input (Text / Voice / File)
|
v
[Intelligence Router] ----> Planner (JSON DAG)
|
v
[Memory Retrieval] (ChromaDB)
|
v
[Plan Executor]
|
+---> [Browser Sub-Agent] (Playwright MCP)
+---> [Desktop Sub-Agent] (E2B + SoM Vision)
+---> [Coder Sub-Agent] (Code Interpreter)
+---> [HF Hub Sub-Agent] (Search / Upload)
|
v
[Verifier] -> Retry / Alternative / Continue
|
v
[Macro Saver] + Cost Report + Session Recording
```
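The planner's JSON DAG and the verifier's retry loop in the diagram can be illustrated with the standard-library `graphlib` module (Python 3.9+). This is a sketch under assumptions: the plan schema (`id`, `agent`, `task`, `depends_on`), the `run_subtask` and `verify` callables, and the fixed retry count are hypothetical, not the project's actual planner format, and only the Retry and Continue branches are shown (the Alternative branch, which would re-plan, is left out).

```python
import json
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical plan format: each subtask names a sub-agent and its prerequisites.
PLAN_JSON = """
[
  {"id": "search", "agent": "browser", "task": "Find the dataset page", "depends_on": []},
  {"id": "download", "agent": "browser", "task": "Download the CSV", "depends_on": ["search"]},
  {"id": "analyse", "agent": "coder", "task": "Summarise the CSV", "depends_on": ["download"]},
  {"id": "upload", "agent": "hf_hub", "task": "Upload the summary", "depends_on": ["analyse"]}
]
"""


def execution_order(plan: list[dict]) -> list[str]:
    """Subtask ids in an order that respects every depends_on edge."""
    graph = {step["id"]: set(step["depends_on"]) for step in plan}
    # static_order() raises graphlib.CycleError if the plan is not a DAG.
    return list(TopologicalSorter(graph).static_order())


def execute_plan(
    plan: list[dict],
    run_subtask: Callable[[dict], str],
    verify: Callable[[dict, str], bool],
    max_retries: int = 2,
) -> dict[str, str]:
    """Run subtasks in dependency order, retrying each until the verifier accepts it."""
    by_id = {step["id"]: step for step in plan}
    results: dict[str, str] = {}
    for step_id in execution_order(plan):
        step = by_id[step_id]
        for _ in range(max_retries + 1):
            output = run_subtask(step)
            if verify(step, output):  # "Continue": accept and move on
                results[step_id] = output
                break
            # Otherwise loop again: the "Retry" branch.
        else:
            # Every attempt failed verification; stop instead of building on a bad result.
            raise RuntimeError(f"subtask {step_id!r} failed after {max_retries + 1} attempts")
    return results


if __name__ == "__main__":
    plan = json.loads(PLAN_JSON)
    print(execution_order(plan))  # ['search', 'download', 'analyse', 'upload']
```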
## Quick Start
1. Set your **HF_TOKEN** and **E2B_API_KEY** as secrets in the Space settings.
2. Type a task (or speak it) and hit **🚀 Let's go!**
3. Watch the agent plan, execute, verify, and report costs.
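Hugging Face Spaces expose secrets to the running app as environment variables. A minimal startup check could look like this, assuming the app reads the two keys under exactly the names above; the `missing_secrets` helper is hypothetical.

```python
import os
from collections.abc import Mapping

# Secret names taken from the Quick Start steps.
REQUIRED_SECRETS = ("HF_TOKEN", "E2B_API_KEY")


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required secret names that are unset or empty in `env`."""
    return [name for name in REQUIRED_SECRETS if not env.get(name)]
```

Calling `missing_secrets()` once at startup and exiting with the returned names makes a misconfigured Space fail immediately instead of partway through a task.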
## Sensitive Actions
By default, the agent pauses before:
- Payments, purchases, subscriptions
- Sending emails/messages/posts
- Deleting files or uninstalling software
- Entering passwords or credit-card details

Enable **Auto-approve all actions** in Advanced Options to turn off these human-in-the-loop (HITL) pauses.
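The categories above can be approximated with a simple pre-action check. This sketch is keyword-based and purely illustrative: the patterns, the `needs_approval` function, and its `auto_approve` flag (standing in for the **Auto-approve all actions** option) are assumptions, and the project's actual detection logic may differ.

```python
import re

# Hypothetical keyword patterns for the four categories listed above.
# Deliberately broad: a false positive only costs one extra approval click.
SENSITIVE_PATTERNS = {
    "payment": re.compile(r"\b(pay|payment|purchase|buy|checkout|subscri\w*)\b", re.IGNORECASE),
    "messaging": re.compile(r"\b(send|post|publish)\b", re.IGNORECASE),
    "deletion": re.compile(r"\b(delete|remove|uninstall|rm)\b", re.IGNORECASE),
    "credentials": re.compile(r"\b(password|credit[- ]?card|cvv)\b", re.IGNORECASE),
}


def needs_approval(action: str, auto_approve: bool = False) -> list[str]:
    """Return the sensitive categories an action matches; an empty list means proceed.

    `auto_approve` stands in for the "Auto-approve all actions" option.
    """
    if auto_approve:
        return []
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(action)]


if __name__ == "__main__":
    print(needs_approval("Delete old_report.pdf and send an email to Bob"))  # ['messaging', 'deletion']
```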
## Cost Budget
Default budget is **$2.00 USD per session**. The router automatically downgrades to cheaper models as the budget is consumed.
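Budget-driven downgrading can be sketched as: use the fast tier unless a step asks for a powerful model, and fall back to the fast tier when the powerful one no longer fits what is left of the budget. The four tier names follow the router description; the per-token prices, the 25% reserve, and the `ModelTier`, `CostTracker`, and `pick_model` names are hypothetical placeholders, not the project's real pricing or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float  # placeholder price, not real provider pricing

    def estimate(self, tokens: int) -> float:
        return tokens / 1000 * self.usd_per_1k_tokens


# Hypothetical tiers mirroring the router's four categories.
TIERS = {
    "fast-vision": ModelTier("fast-vision", 0.002),
    "powerful-vision": ModelTier("powerful-vision", 0.010),
    "fast-text": ModelTier("fast-text", 0.0005),
    "powerful-text": ModelTier("powerful-text", 0.005),
}


@dataclass
class CostTracker:
    """Session spend against the default $2.00 budget."""
    budget_usd: float = 2.00
    spent_usd: float = 0.0
    calls: list[tuple[str, int, float]] = field(default_factory=list)

    @property
    def remaining_usd(self) -> float:
        return self.budget_usd - self.spent_usd

    def record(self, tier: ModelTier, tokens: int) -> None:
        """Log one model call and add its cost to the session total."""
        cost = tier.estimate(tokens)
        self.spent_usd += cost
        self.calls.append((tier.name, tokens, cost))


def pick_model(needs_vision: bool, needs_strong_model: bool, est_tokens: int,
               tracker: CostTracker, reserve_fraction: float = 0.25) -> ModelTier:
    """Pick the cheapest capable tier, downgrading when the budget runs low.

    A single call may spend at most (1 - reserve_fraction) of what is left, so later
    steps keep some headroom. If the powerful tier does not fit, fall back to the
    fast tier of the same modality.
    """
    modality = "vision" if needs_vision else "text"
    speeds = ("powerful", "fast") if needs_strong_model else ("fast",)
    allowance = tracker.remaining_usd * (1 - reserve_fraction)
    for speed in speeds:
        tier = TIERS[f"{speed}-{modality}"]
        if tier.estimate(est_tokens) <= allowance:
            return tier
    raise RuntimeError("budget exhausted: no model tier fits the remaining allowance")


if __name__ == "__main__":
    tracker = CostTracker()
    print(pick_model(True, True, 10_000, tracker).name)  # powerful-vision
    tracker.spent_usd = 1.95
    print(pick_model(True, True, 10_000, tracker).name)  # fast-vision
```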
## Benchmarks
Run the built-in eval suite:
```python
from eval_harness import EvaluationHarness
# See eval_harness.py for usage
```
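The harness API itself is documented in `eval_harness.py` and is not reproduced here. Independently of that API, the A/B comparison step can be sketched as aggregating judge scores per variant; the `JudgedRun` record, the [0, 1] score scale, and the 0.5 pass threshold are assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class JudgedRun:
    task_id: str
    variant: str  # e.g. "A" = baseline configuration, "B" = candidate
    score: float  # LLM-judge score, assumed to lie in [0, 1]


def summarise_ab(runs: list[JudgedRun], pass_threshold: float = 0.5) -> dict[str, dict[str, float]]:
    """Number of runs, mean judge score, and pass rate for each variant."""
    summary: dict[str, dict[str, float]] = {}
    for variant in sorted({run.variant for run in runs}):
        scores = [run.score for run in runs if run.variant == variant]
        summary[variant] = {
            "n": len(scores),
            "mean_score": mean(scores),
            "pass_rate": sum(score >= pass_threshold for score in scores) / len(scores),
        }
    return summary


if __name__ == "__main__":
    runs = [
        JudgedRun("t1", "A", 1.0), JudgedRun("t2", "A", 0.0),
        JudgedRun("t1", "B", 1.0), JudgedRun("t2", "B", 0.75),
    ]
    print(summarise_ab(runs))
```

Grading both variants on the same task set keeps the comparison paired; with only a handful of tasks, differences in pass rate are noisy.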
## Credits
- [smolagents](https://github.com/huggingface/smolagents) by Hugging Face
- [E2B](https://e2b.dev) for secure sandboxed desktops
- [Playwright](https://playwright.dev) for browser automation
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) for vision reasoning