Northstar CUA Fast

A 4B-parameter computer-use agent model trained with GUI reinforcement learning. It recovers from mistakes, generalizes across environments, and outperforms open-source models twice its size on single-app tasks.

Built for agentic loops where every step is a model call.

Parameters 4B
Context 64K tokens
Training GUI reinforcement learning
Input Text + screenshot
Output GUI actions (click, type, scroll, key, drag, ...)
Coordinates 0-999 normalized (model) / pixel-scaled (Responses API)
Pricing < $1/M tokens
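
The two coordinate spaces in the spec table above can be bridged with a small helper. This is a sketch of the conversion we assume the Responses API performs internally; the exact divisor (999 vs. 1000) and rounding behavior are assumptions, not documented API behavior.

```python
def to_pixels(norm_x: int, norm_y: int, width: int, height: int) -> tuple[int, int]:
    """Map model-space coordinates (0-999) onto a width x height display.

    Assumed mapping: 0 -> first pixel, 999 -> last pixel, linear in between.
    """
    return (
        round(norm_x / 999 * (width - 1)),
        round(norm_y / 999 * (height - 1)),
    )
```

For a 1024x768 display, `to_pixels(999, 999, 1024, 768)` lands on the last pixel, `(1023, 767)`.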

Highlights

  • RL-trained, not just SFT. Trained with GRPO on synthetic GUI environments, producing behaviors that generalize rather than memorize.
  • Recovery over raw accuracy. Multi-turn RL training teaches the model to detect failures from history and adapt, which matters far more than single-step click precision for long-horizon tasks.
  • Competitive at 4B. Matches or exceeds open-source models with twice the parameter count on single-app desktop tasks.
  • Production-ready API. OpenAI-compatible chat completions and a Responses API with pixel-scaled coordinates.

How It Was Trained

The Problem with SFT for Computer Use

Supervised fine-tuning on GUI data saturates after 100-1000 examples per task and degrades other abilities. More critically, SFT improvements do not generalize: the model memorizes state-action pairs rather than learning why an action should be taken. Coordinate prediction under SFT also suffers because all incorrect coordinates are penalized uniformly, so clicking 1 pixel away from ground truth is treated the same as clicking on the opposite side of the screen.
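
The uniform-penalty problem can be seen in a toy form: under token-level cross-entropy, the loss on a coordinate token depends only on whether it matches the target, not on how far off it is. A minimal illustration (a stand-in for intuition, not the actual training loss):

```python
def token_penalty(pred_coord: int, target_coord: int) -> float:
    """Toy stand-in for token-level loss on a single coordinate token:
    any mismatch costs the same, regardless of spatial distance."""
    return 0.0 if pred_coord == target_coord else 1.0

# Clicking 1 unit away from the target is penalized exactly like
# clicking 499 units away -- the loss carries no spatial gradient.
near_miss = token_penalty(500, 501)
far_miss = token_penalty(500, 999)
```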

Reinforcement Learning on Synthetic Environments

Using a GRPO loss adapted for multi-modal inputs (built on prime-rl), Northstar CUA Fast was trained on synthetic GUI environments with bounding-box-based reward signals. Key findings:

  • Generalization from abstract environments. Training exclusively on simplified, fabricated test environments improved performance on real UI benchmarks (0.39 to 0.53 on an aggregated UI benchmark), surpassing SFT on actual UI datasets.
  • Multi-turn RL is critical. Training on ~100 environments requiring 3-15 click interactions produced a 20% absolute improvement on the OSWorld Chrome category, despite zero resemblance between training and evaluation environments.
  • Emergent self-correction. The model learns to detect failed interactions from its history and either retry with adjustments or try entirely different approaches. This cannot be systematically derived from SFT because it depends on the model's own action distribution.
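
The bounding-box-based reward signal mentioned above can be sketched as a binary check. The actual reward shaping used in training is not specified here, so treat this as the simplest plausible variant:

```python
def bbox_reward(click_x: float, click_y: float, bbox: tuple) -> float:
    """Return 1.0 if the predicted click lands inside the target element's
    bounding box (x0, y0, x1, y1), else 0.0."""
    x0, y0, x1, y1 = bbox
    return 1.0 if (x0 <= click_x <= x1 and y0 <= click_y <= y1) else 0.0
```

Under GRPO, rewards like this are compared within a group of sampled rollouts, so even a sparse binary signal yields a usable advantage estimate.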

Positional Encoding Insights

Analysis of the vision encoder revealed that absolute positional information decays exponentially through attention layers due to normalization. Since 2D-RoPE only encodes relative position, the additive patch embedding (added once at input) is the sole source of absolute coordinate information, and it degrades with depth. Scaling the positional embedding by 3x improved click accuracy from 40% to 80% on a simple red-ball benchmark without any retraining.
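
The 3x intervention amounts to amplifying the additive patch position embedding before the first attention layer. A NumPy sketch with hypothetical names (`patches` and `pos_embed` stand in for the encoder's real internals):

```python
import numpy as np

def embed_patches(patches: np.ndarray, pos_embed: np.ndarray,
                  scale: float = 3.0) -> np.ndarray:
    """Add an amplified absolute position embedding to the patch embeddings.

    Because 2D-RoPE encodes only relative position and normalization erodes
    this additive signal layer by layer, boosting it once at the input lets
    absolute coordinate information survive deeper into the encoder.
    """
    return patches + scale * pos_embed
```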

OSWorld Benchmark (pass@1, 50 steps)

Evaluated on OSWorld across 369 real-world desktop tasks.

Domain               UI-TARS 2   Qwen3 Flash   Northstar CUA Fast (4B)
Chrome               62.96%      56.43%        55.30%
Thunderbird          73.33%      66.67%        62.40%
LibreOffice Writer   60.87%      56.52%        56.94%
OS                   41.67%      54.17%        46.26%
VLC                  49.94%      34.41%        43.87%
Overall              53.1%       41.6%         37.01%

At 4B parameters, Northstar CUA Fast is competitive with open-source models twice its size on single-app tasks. Under the EVOCUA agent harness, EVOCUA-8B averages 32.5% versus 37.0% for Northstar CUA Fast (RL).

Why Recovery Matters More Than Accuracy

For multi-step agentic tasks, the per-step accuracy required to reach a target trajectory success rate scales harshly with trajectory length:

Trajectory Length   50% success   80% success   95% success
1                   0.50          0.80          0.95
4                   0.84          0.95          0.99
16                  0.96          0.99          1.00
32                  0.98          0.99          1.00

Even with retry tolerance, the required per-step accuracy for long trajectories becomes impractical. The model's ability to recover from failures and handle out-of-distribution variation matters far more than raw single-step precision.
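
The table values follow from requiring p^n = s for per-step accuracy p, trajectory length n, and target success rate s, i.e. p = s^(1/n). A one-liner to reproduce them:

```python
def required_step_accuracy(success_rate: float, n_steps: int) -> float:
    """Per-step accuracy p such that p ** n_steps == success_rate."""
    return success_rate ** (1.0 / n_steps)

# A 16-step trajectory at 50% overall success already needs
# ~0.96 accuracy on every single step.
p = required_step_accuracy(0.5, 16)
```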

Supported Actions

click · double_click · triple_click · right_click · drag · type · key · scroll · hscroll · navigate (browser only) · wait · terminate

Quickstart

Install

pip install tzafon

Responses API (recommended)

import os
from tzafon import Lightcone

client = Lightcone(api_key=os.environ["TZAFON_API_KEY"])

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    instructions="Click on the Firefox icon.",
    tools=[{
        "type": "computer_use",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",
    }],
)
print(response.output)

OpenAI-compatible Chat Completions

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.tzafon.ai/v1",
)

response = client.chat.completions.create(
    model="tzafon.northstar-cua-fast",
    messages=[
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
            {"type": "text", "text": "Click on the Firefox icon."},
        ]},
    ],
    temperature=0,
    max_tokens=512,
)
print(response.choices[0].message.content)

cURL

curl -X POST https://api.tzafon.ai/v1/responses \
  -H "Authorization: Bearer $TZAFON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tzafon.northstar-cua-fast",
    "instructions": "Click on the Firefox icon.",
    "tools": [{"type": "computer_use", "display_width": 1024, "display_height": 768}]
  }'

Lightcone Agent Harness

Lightcone wraps Northstar CUA Fast into a full desktop automation loop: screenshot, think, act, repeat.

screenshot → Northstar CUA Fast → parse action → execute on computer → repeat

Features: pure-async FastAPI server with SSE streaming, sliding-window context management, Rust-accelerated image processing, and an auto-discovering tool registry.
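
The screenshot → think → act loop above can be sketched with caller-supplied hooks. The hook names here are hypothetical placeholders, not the actual Lightcone API:

```python
def run_agent(take_screenshot, call_model, execute, max_steps: int = 50) -> int:
    """Minimal screenshot -> model -> act loop.

    take_screenshot(): returns the current screen image.
    call_model(image): returns an action dict like {"type": "click", ...}.
    execute(action):   performs the action on the computer.
    Returns the number of actions executed before termination.
    """
    for step in range(max_steps):
        action = call_model(take_screenshot())
        if action.get("type") == "terminate":
            return step  # the model signaled task completion
        execute(action)
    return max_steps  # step budget exhausted
```

Lightcone's real loop adds the features listed above (streaming, context windowing, tool registry) on top of this core cycle.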

git clone https://github.com/tzafon/lightcone.git
cd lightcone
uv venv && uv sync --extra dev
uv run maturin develop -m native/Cargo.toml

export TZAFON_API_KEY="your-api-key"
lightcone run --task "Open Firefox and search for 'hello world'"

What's Open Source vs Hosted

Component                 License      Status
Lightcone agent harness   Apache 2.0   GitHub
Python SDK (tzafon)       MIT          PyPI
Model weights             Apache 2.0   Tzafon API

Citation

@misc{tzafon2026northstarcuafast,
    title={Northstar CUA Fast: Lightweight Computer-Use Agent Model},
    author={Tzafon Team},
    year={2026},
    url={https://github.com/tzafon/lightcone},
}
