A-Mahla
Amir/handle sandbox (#18)
8f4ea43 unverified
title: CUA2 - Computer Use Agent 2
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

CUA2 - Computer Use Agent 2

An AI-powered automation interface featuring real-time agent task processing, VNC streaming, and step-by-step execution visualization.

🚀 Overview

CUA2 is a full-stack application that provides a modern web interface for AI agents to perform automated computer tasks. The system uses real-time WebSocket communication between a FastAPI backend and a React frontend, allowing users to monitor agent execution, view screenshots, track token usage, and stream VNC sessions.

🏗️ Architecture

CUA2 Architecture

🛠️ Tech Stack

Backend (cua2-core)

  • FastAPI
  • Uvicorn
  • smolagents - AI agent framework with OpenAI/LiteLLM support

Frontend (cua2-front)

  • React (TypeScript)
  • Vite

📋 Prerequisites

  • Python 3.10 or higher
  • Node.js 18 or higher
  • npm
  • uv - Python package manager

Installing uv

macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

For more installation options, visit: https://docs.astral.sh/uv/getting-started/installation/

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/huggingface/CUA2.git
cd CUA2

2. Install Dependencies

Use the Makefile for quick setup:

make sync

This will:

  • Install Python dependencies using uv
  • Install Node.js dependencies for the frontend

Or install manually:

# Backend dependencies
cd cua2-core
uv sync --all-extras

# Frontend dependencies
cd ../cua2-front
npm install

3. Environment Configuration

Copy the example environment file and configure your settings:

cd cua2-core
cp env.example .env

Edit .env with your configuration:

  • API keys for OpenAI/LiteLLM
  • Database connections (if applicable)
  • HuggingFace credentials for data archival (optional)
  • Other service credentials

Data Archival Configuration (Optional)

CUA2 includes an automatic data archival feature that backs up old trace data to HuggingFace datasets:

# HuggingFace token for uploading archived data
HF_TOKEN=your_huggingface_token_here

# HuggingFace dataset repository ID (e.g., "username/dataset-name")
HF_DATASET_REPO=your_username/your_dataset_repo

# Check interval (default: 30 minutes)
ARCHIVE_INTERVAL_MINUTES=30

# Age threshold - folders older than this will be archived (default: 30 minutes)
FOLDER_AGE_THRESHOLD_MINUTES=30
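The variables above could be loaded into a small settings object like the one below. This is a minimal sketch, not CUA2's actual configuration code: the `ArchiveSettings` class, the `load_archive_settings` function, and the rule that archival is enabled only when both `HF_TOKEN` and `HF_DATASET_REPO` are set are assumptions for illustration. The 30-minute defaults match the values documented above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ArchiveSettings:
    """Hypothetical container for the archival environment variables."""

    hf_token: Optional[str]
    dataset_repo: Optional[str]
    interval_minutes: int = 30
    age_threshold_minutes: int = 30

    @property
    def enabled(self) -> bool:
        # Assumption: archival is optional and only runs when both Hub settings are present.
        return bool(self.hf_token and self.dataset_repo)


def load_archive_settings(env: Mapping[str, str] = os.environ) -> ArchiveSettings:
    """Read the archival variables, falling back to the documented 30-minute defaults."""
    return ArchiveSettings(
        hf_token=env.get("HF_TOKEN") or None,
        dataset_repo=env.get("HF_DATASET_REPO") or None,
        interval_minutes=int(env.get("ARCHIVE_INTERVAL_MINUTES", "30")),
        age_threshold_minutes=int(env.get("FOLDER_AGE_THRESHOLD_MINUTES", "30")),
    )
```

Passing the environment as a mapping keeps the loader easy to test, for example `load_archive_settings({"HF_TOKEN": "...", "HF_DATASET_REPO": "user/repo"})`.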

How it works:

  1. Every ARCHIVE_INTERVAL_MINUTES minutes (default: 30), the system scans the data/ folder for trace folders
  2. Folders older than FOLDER_AGE_THRESHOLD_MINUTES minutes (default: 30) are compressed into .tar.gz archives
  3. The archives are uploaded to the HuggingFace dataset repository set in HF_DATASET_REPO
  4. Once the upload is verified as successful, the local folders are deleted to free up disk space

This keeps local disk usage low while preserving all agent traces in your HuggingFace dataset.
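The archival cycle described above can be sketched in Python as follows. This is a hypothetical illustration, not CUA2's implementation: the function names, the use of a folder's modification time as its age, and the injectable `upload` callback are assumptions. In practice the callback could wrap `huggingface_hub.HfApi().upload_file(path_or_fileobj=..., path_in_repo=..., repo_id=..., repo_type="dataset")`, which raises an exception when the upload fails, so a folder is only deleted after its archive has been uploaded.

```python
import shutil
import tarfile
import time
from pathlib import Path
from typing import Callable, List

# Hypothetical hook: uploads one archive and raises if the upload fails.
Uploader = Callable[[Path], None]


def archive_old_folders(data_dir: Path, age_threshold_minutes: float, upload: Uploader) -> List[str]:
    """Compress, upload and delete trace folders older than the age threshold.

    Returns the names of the folders that were archived and removed.
    """
    cutoff = time.time() - age_threshold_minutes * 60
    archived: List[str] = []
    # Snapshot the listing first, because archives are written into data_dir.
    folders = sorted(p for p in data_dir.iterdir() if p.is_dir())
    for folder in folders:
        # Assumption: a folder's age is measured from its modification time.
        if folder.stat().st_mtime > cutoff:
            continue
        archive_path = data_dir / f"{folder.name}.tar.gz"
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(folder, arcname=folder.name)
        try:
            upload(archive_path)
        except Exception as exc:
            # Upload failed: keep the folder so the next run can retry.
            print(f"upload failed for {folder.name}: {exc}")
            continue
        finally:
            # The local archive is no longer needed in either case.
            archive_path.unlink(missing_ok=True)
        # Only reached when upload() returned without raising.
        shutil.rmtree(folder)
        archived.append(folder.name)
    return archived


def run_archiver(data_dir: Path, interval_minutes: float, age_threshold_minutes: float, upload: Uploader) -> None:
    """Run the archival check forever, once per interval."""
    while True:
        archive_old_folders(data_dir, age_threshold_minutes, upload)
        time.sleep(interval_minutes * 60)
```

Deleting the local folder only after `upload` returns is what makes the cycle safe to interrupt: a failed or crashed run leaves the trace data in place for the next check.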

4. Start Development Servers

Option 1: Using Makefile (Recommended)

Open two terminal windows:

Terminal 1 - Backend:

make dev-backend

Terminal 2 - Frontend:

make dev-frontend

Option 2: Manual Start

Terminal 1 - Backend:

cd cua2-core
uv run uvicorn cua2_core.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Frontend:

cd cua2-front
npm run dev

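5. Access the Application

  • Backend API: http://localhost:8000 when started with the uvicorn command above (interactive docs at http://localhost:8000/docs)
  • Frontend: open the local URL printed by npm run dev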

📁 Project Structure

CUA2/
├── cua2-core/                      # Backend application
│   ├── src/
│   │   └── cua2_core/
│   │       ├── app.py              # FastAPI application setup
│   │       ├── main.py             # Application entry point
│   │       ├── models/
│   │       │   └── models.py       # Pydantic models
│   │       ├── routes/
│   │       │   ├── routes.py       # REST API endpoints
│   │       │   └── websocket.py    # WebSocket endpoint
│   │       ├── services/
│   │       │   ├── agent_service.py # Agent task processing
│   │       │   └── simulation_metadata/ # Demo data
│   │       └── websocket/
│   │           └── websocket_manager.py # WebSocket management
│   ├── pyproject.toml              # Python dependencies
│   └── env.example                 # Environment variables template
│
├── cua2-front/                     # Frontend application
│   ├── src/
│   │   ├── App.tsx                 # Main application component
│   │   ├── pages/
│   │   │   └── Index.tsx           # Main page
│   │   ├── components/
│   │   │   └── mock/               # UI components
│   │   ├── hooks/
│   │   │   └── useWebSocket.ts     # WebSocket hook
│   │   └── types/
│   │       └── agent.ts            # TypeScript type definitions
│   ├── package.json                # Node dependencies
│   └── vite.config.ts              # Vite configuration
│
├── Makefile                        # Development commands
└── README.md                       # This file

🔌 API Endpoints

REST API

Method  Endpoint          Description
GET     /health           Health check with WebSocket connection count
GET     /tasks            Get all active tasks
GET     /tasks/{task_id}  Get specific task status
GET     /docs             Interactive API documentation (Swagger)
GET     /redoc            Alternative API documentation (ReDoc)
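For scripted access, the REST endpoints above can be called with only the Python standard library. This is a minimal sketch under stated assumptions: it expects the backend at http://localhost:8000 (the host and port from the manual uvicorn command), the helper names are hypothetical, and it simply returns the decoded JSON because the response fields are not documented in this README.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the backend runs locally on the port used by the manual start command.
BASE_URL = "http://localhost:8000"


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> Any:
    """Send a GET request to a backend endpoint and decode the JSON body."""
    with urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_health(base_url: str = BASE_URL) -> Any:
    """GET /health."""
    return get_json("/health", base_url)


def list_tasks(base_url: str = BASE_URL) -> Any:
    """GET /tasks."""
    return get_json("/tasks", base_url)


def get_task(task_id: str, base_url: str = BASE_URL) -> Any:
    """GET /tasks/{task_id}, percent-encoding the ID as a single path segment."""
    return get_json(f"/tasks/{quote(task_id, safe='')}", base_url)
```

With the backend running, `print(get_health())` prints whatever health payload the server returns; `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, such as an unknown task ID.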

WebSocket

Client → Server Events

  • user_task - New user task request

Server → Client Events

  • agent_start - Agent begins processing
  • agent_progress - New step completed with image and metadata
  • agent_complete - Task finished successfully
  • agent_error - Error occurred during processing
  • vnc_url_set - VNC stream URL available
  • vnc_url_unset - VNC stream ended
  • heartbeat - Connection keep-alive
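A client has to route each server event listed above to the right handler. The sketch below shows one way to do that in Python. It is hypothetical and assumes that every WebSocket message is a JSON object whose "type" field holds the event name; the real payload schema is defined by the backend's Pydantic models (cua2_core/models/models.py) and may differ.

```python
import json
from typing import Any, Callable, Dict

# Event names taken from the server-to-client list above.
SERVER_EVENTS = {
    "agent_start",
    "agent_progress",
    "agent_complete",
    "agent_error",
    "vnc_url_set",
    "vnc_url_unset",
    "heartbeat",
}

Handler = Callable[[Dict[str, Any]], None]


class EventDispatcher:
    """Route decoded server messages to per-event handlers.

    Assumption: each message is a JSON object with a "type" field.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler, rejecting names that are not documented events."""
        if event not in SERVER_EVENTS:
            raise ValueError(f"unknown server event: {event}")
        self._handlers[event] = handler

    def dispatch(self, raw: str) -> bool:
        """Decode one text frame and call its handler; return True if handled."""
        message = json.loads(raw)
        handler = self._handlers.get(message.get("type"))
        if handler is None:
            return False
        handler(message)
        return True
```

Feeding each received text frame to `dispatch()` keeps connection handling separate from event handling; events without a registered handler (for example `heartbeat`) are reported as not handled instead of raising.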

🧪 Development

Available Make Commands

make sync              # Sync all dependencies (Python + Node.js)
make dev-backend       # Start backend development server
make dev-frontend      # Start frontend development server
make pre-commit        # Run pre-commit hooks
make clean             # Clean build artifacts and caches

Code Quality

# Backend
make pre-commit

Happy Coding! 🚀