---
title: CUA2 - Computer Use Agent 2
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# CUA2 - Computer Use Agent 2
An AI-powered automation interface featuring real-time agent task processing, VNC streaming, and step-by-step execution visualization.
## 🚀 Overview
CUA2 is a full-stack application that provides a modern web interface for AI agents that perform automated computer tasks. A FastAPI backend and a React frontend communicate over WebSockets in real time, so users can monitor agent execution, view screenshots, track token usage, and stream VNC sessions.
## 🏗️ Architecture
![CUA2 Architecture](assets/architecture.png)
## 🛠️ Tech Stack
### Backend (`cua2-core`)
- **FastAPI**
- **Uvicorn**
- **smolagents** - AI agent framework with OpenAI/LiteLLM support
### Frontend (`cua2-front`)
- **React** (TypeScript)
- **Vite**
## 📋 Prerequisites
- **Python** 3.10 or higher
- **Node.js** 18 or higher
- **npm**
- **uv** - Python package manager
### Installing uv
**macOS/Linux:**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
For more installation options, visit: https://docs.astral.sh/uv/getting-started/installation/
## 🚀 Getting Started
### 1. Clone the Repository
```bash
git clone https://github.com/huggingface/CUA2.git
cd CUA2
```
### 2. Install Dependencies
Use the Makefile for quick setup:
```bash
make sync
```
This will:
- Install Python dependencies using `uv`
- Install Node.js dependencies for the frontend
Or install manually:
```bash
# Backend dependencies
cd cua2-core
uv sync --all-extras
# Frontend dependencies
cd ../cua2-front
npm install
```
### 3. Environment Configuration
Copy the example environment file and configure your settings:
```bash
cd cua2-core
cp env.example .env
```
Edit `.env` with your configuration:
- API keys for OpenAI/LiteLLM
- Database connections (if applicable)
- HuggingFace credentials for data archival (optional)
- Other service credentials
#### Data Archival Configuration (Optional)
CUA2 includes an automatic data archival feature that backs up old trace data to HuggingFace datasets:
```bash
# HuggingFace token for uploading archived data
HF_TOKEN=your_huggingface_token_here
# HuggingFace dataset repository ID (e.g., "username/dataset-name")
HF_DATASET_REPO=your_username/your_dataset_repo
# Check interval (default: 30 minutes)
ARCHIVE_INTERVAL_MINUTES=30
# Age threshold - folders older than this will be archived (default: 30 minutes)
FOLDER_AGE_THRESHOLD_MINUTES=30
```
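As a rough illustration of how these variables could be read on the backend, here is a minimal sketch. The `ArchiveSettings` class and its `from_env` helper are hypothetical names and not part of CUA2; only the variable names and the 30-minute defaults come from the block above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ArchiveSettings:
    """Hypothetical container for the archival variables documented above."""

    hf_token: Optional[str]
    hf_dataset_repo: Optional[str]
    archive_interval_minutes: int = 30
    folder_age_threshold_minutes: int = 30

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ArchiveSettings":
        # Missing interval/threshold values fall back to the documented defaults.
        return cls(
            hf_token=env.get("HF_TOKEN"),
            hf_dataset_repo=env.get("HF_DATASET_REPO"),
            archive_interval_minutes=int(env.get("ARCHIVE_INTERVAL_MINUTES", "30")),
            folder_age_threshold_minutes=int(env.get("FOLDER_AGE_THRESHOLD_MINUTES", "30")),
        )

    @property
    def enabled(self) -> bool:
        # Assumption: archival is only useful when both credentials are set.
        return bool(self.hf_token and self.hf_dataset_repo)
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check in isolation, for example `ArchiveSettings.from_env({"ARCHIVE_INTERVAL_MINUTES": "5"})`.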
**How it works:**
1. Every 30 minutes (configurable), the system checks the `data/` folder for trace folders
2. Folders older than 30 minutes (configurable) are compressed into `.tar.gz` archives
3. Archives are uploaded to your HuggingFace dataset repository
4. Once the upload is verified, the local folders are deleted to free up space

This keeps local disk usage low while preserving all agent traces in the cloud.
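The steps above can be sketched as a single archival pass. This is an illustrative sketch under stated assumptions, not the code CUA2 ships: the function name `archive_old_folders` is hypothetical, folder age is assumed to be judged by modification time, and the upload step is passed in as a callable (in practice it might wrap `HfApi.upload_file` from `huggingface_hub`).

```python
import shutil
import tarfile
import time
from pathlib import Path
from typing import Callable, List


def archive_old_folders(
    data_dir: Path,
    age_threshold_minutes: float,
    upload: Callable[[Path], bool],
) -> List[Path]:
    """Run one archival pass and return the archives that were uploaded.

    `upload` is a placeholder for the upload step; it should return True
    only once the upload has been verified.
    """
    cutoff = time.time() - age_threshold_minutes * 60
    uploaded: List[Path] = []
    for folder in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        # Assumption: a folder's age is its last modification time.
        if folder.stat().st_mtime > cutoff:
            continue
        archive = folder.parent / (folder.name + ".tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(folder, arcname=folder.name)
        if upload(archive):
            # Delete local data only after a confirmed upload.
            shutil.rmtree(folder)
            archive.unlink()
            uploaded.append(archive)
    return uploaded
```

A scheduler would call this function every `ARCHIVE_INTERVAL_MINUTES`. If `upload` returns `False`, the folder and its archive are left in place so the next pass can retry.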
### 4. Start Development Servers
#### Option 1: Using Makefile (Recommended)
Open two terminal windows:
**Terminal 1 - Backend:**
```bash
make dev-backend
```
**Terminal 2 - Frontend:**
```bash
make dev-frontend
```
#### Option 2: Manual Start
**Terminal 1 - Backend:**
```bash
cd cua2-core
uv run uvicorn cua2_core.main:app --reload --host 0.0.0.0 --port 8000
```
**Terminal 2 - Frontend:**
```bash
cd cua2-front
npm run dev
```
### 5. Access the Application
- **Frontend**: http://localhost:8080
- **Backend API**: http://localhost:8000
- **API Documentation**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
## 📁 Project Structure
```
CUA2/
├── cua2-core/                               # Backend application
│   ├── src/
│   │   └── cua2_core/
│   │       ├── app.py                       # FastAPI application setup
│   │       ├── main.py                      # Application entry point
│   │       ├── models/
│   │       │   └── models.py                # Pydantic models
│   │       ├── routes/
│   │       │   ├── routes.py                # REST API endpoints
│   │       │   └── websocket.py             # WebSocket endpoint
│   │       ├── services/
│   │       │   ├── agent_service.py         # Agent task processing
│   │       │   └── simulation_metadata/     # Demo data
│   │       └── websocket/
│   │           └── websocket_manager.py     # WebSocket management
│   ├── pyproject.toml                       # Python dependencies
│   └── env.example                          # Environment variables template
│
├── cua2-front/                              # Frontend application
│   ├── src/
│   │   ├── App.tsx                          # Main application component
│   │   ├── pages/
│   │   │   └── Index.tsx                    # Main page
│   │   ├── components/
│   │   │   └── mock/                        # UI components
│   │   ├── hooks/
│   │   │   └── useWebSocket.ts              # WebSocket hook
│   │   └── types/
│   │       └── agent.ts                     # TypeScript type definitions
│   ├── package.json                         # Node dependencies
│   └── vite.config.ts                       # Vite configuration
│
├── Makefile                                 # Development commands
└── README.md                                # This file
```
## 🔌 API Endpoints
### REST API
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check with WebSocket connection count |
| GET | `/tasks` | Get all active tasks |
| GET | `/tasks/{task_id}` | Get specific task status |
| GET | `/docs` | Interactive API documentation (Swagger) |
| GET | `/redoc` | Alternative API documentation (ReDoc) |
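
To try the endpoints in the table from a script, a small standard-library client might look like the sketch below. The helper names `task_url` and `get_json` are hypothetical. The response bodies are assumed to be JSON, which is FastAPI's default, and no particular response fields are assumed.

```python
import json
from typing import Any, Callable
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # backend address used in this README


def task_url(task_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /tasks/{task_id}, escaping the ID for use in a path."""
    return f"{base_url.rstrip('/')}/tasks/{quote(task_id, safe='')}"


def get_json(url: str, opener: Callable[..., Any] = urlopen) -> Any:
    """Fetch a URL and decode the body as JSON.

    Assumption: the endpoint returns a JSON body; its fields are not assumed.
    """
    with opener(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `get_json(f"{BASE_URL}/health")` queries the health check and `get_json(task_url("some-task-id"))` queries a single task, where `some-task-id` stands in for a real task ID.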
### WebSocket
#### Client → Server Events
- `user_task` - New user task request
#### Server → Client Events
- `agent_start` - Agent begins processing
- `agent_progress` - New step completed with image and metadata
- `agent_complete` - Task finished successfully
- `agent_error` - Error occurred during processing
- `vnc_url_set` - VNC stream URL available
- `vnc_url_unset` - VNC stream ended
- `heartbeat` - Connection keep-alive
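
A client consuming these events typically routes each message to a handler by event name. The sketch below shows one way to do that. The `EventDispatcher` class is hypothetical, and the message shape (a JSON object with the event name in a top-level `type` field) is an assumption that this README does not confirm; check `models.py` for the actual schema.

```python
import json
from typing import Any, Callable, Dict

# Server -> client event names listed above.
SERVER_EVENTS = {
    "agent_start", "agent_progress", "agent_complete", "agent_error",
    "vnc_url_set", "vnc_url_unset", "heartbeat",
}

Handler = Callable[[Dict[str, Any]], None]


class EventDispatcher:
    """Route decoded server messages to handlers keyed by event name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler, rejecting names that are not known server events."""
        if event not in SERVER_EVENTS:
            raise ValueError(f"unknown server event: {event}")
        self._handlers[event] = handler

    def dispatch(self, raw_message: str) -> bool:
        """Return True if a handler ran, False if the message was ignored."""
        message = json.loads(raw_message)
        if not isinstance(message, dict):
            return False
        # Assumption: the event name is carried in a top-level "type" field.
        handler = self._handlers.get(message.get("type"))
        if handler is None:
            return False
        handler(message)
        return True
```

Each text frame received from the WebSocket would be passed to `dispatch`. Events without a registered handler, such as `heartbeat` in a minimal client, are ignored.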
## 🧪 Development
### Available Make Commands
```bash
make sync # Sync all dependencies (Python + Node.js)
make dev-backend # Start backend development server
make dev-frontend # Start frontend development server
make pre-commit # Run pre-commit hooks
make clean # Clean build artifacts and caches
```
### Code Quality
```bash
# Backend
make pre-commit
```
**Happy Coding! 🚀**