Northstar CUA Fast
A 4B-parameter computer-use agent model trained with GUI reinforcement learning. It recovers from mistakes, generalizes across environments, and outperforms open-source models twice its size on single-app tasks.
Built for agentic loops where every step is a model call.
| Spec | Value |
|---|---|
| Parameters | 4B |
| Context | 64K tokens |
| Training | GUI reinforcement learning |
| Input | Text + screenshot |
| Output | GUI actions (click, type, scroll, key, drag, ...) |
| Coordinates | 0-999 normalized (model) / pixel-scaled (Responses API) |
| Pricing | < $1/M tokens |
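When calling the model directly (rather than through the Responses API, which already returns pixel-scaled coordinates), the 0-999 normalized outputs must be mapped onto the actual screen. A minimal sketch, assuming a simple linear mapping; the exact rounding convention is an assumption:

```python
# Map model-space coordinates (0-999) onto a concrete display size.
# Only needed when consuming raw model output; the Responses API
# does this scaling for you.
def to_pixels(x_norm: int, y_norm: int, width: int, height: int) -> tuple[int, int]:
    """Convert a 0-999 normalized coordinate to a pixel coordinate."""
    return (
        round(x_norm / 999 * (width - 1)),
        round(y_norm / 999 * (height - 1)),
    )
```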
Highlights
- RL-trained, not just SFT. Trained with GRPO on synthetic GUI environments, producing behaviors that generalize rather than memorize.
- Recovery over raw accuracy. Multi-turn RL training teaches the model to detect failures from history and adapt, which matters far more than single-step click precision for long-horizon tasks.
- Competitive at 4B. Matches or exceeds open-source models with twice the parameter count on single-app desktop tasks.
- Production-ready API. OpenAI-compatible chat completions and a Responses API with pixel-scaled coordinates.
How It Was Trained
The Problem with SFT for Computer Use
Supervised fine-tuning on GUI data saturates after 100-1000 examples per task and degrades other abilities. More critically, SFT improvements do not generalize: the model memorizes state-action pairs rather than learning why an action should be taken. Coordinate prediction under SFT also suffers because all incorrect coordinates are penalized uniformly, so clicking 1 pixel away from ground truth is treated the same as clicking on the opposite side of the screen.
Reinforcement Learning on Synthetic Environments
Using a GRPO loss adapted for multi-modal inputs (built on prime-rl), Northstar CUA Fast was trained on synthetic GUI environments with bounding-box-based reward signals. Key findings:
- Generalization from abstract environments. Training exclusively on simplified, fabricated test environments improved performance on real UI benchmarks (0.39 to 0.53 on an aggregated UI benchmark), surpassing SFT on actual UI datasets.
- Multi-turn RL is critical. Training on ~100 environments requiring 3-15 click interactions produced a 20% absolute improvement on the OSWorld Chrome category, despite zero resemblance between training and evaluation environments.
- Emergent self-correction. The model learns to detect failed interactions from its history and either retry with adjustments or try entirely different approaches. This cannot be systematically derived from SFT because it depends on the model's own action distribution.
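The bounding-box reward mentioned above can be sketched in a few lines. The actual reward shaping used in training is not public, so treat this as illustrative: a click anywhere inside the target's bounding box earns full reward, so near-misses are not punished as harshly as under a uniform SFT coordinate loss.

```python
# Illustrative bounding-box click reward for a synthetic GUI environment.
# bbox = (x_min, y_min, x_max, y_max) of the target element.
def click_reward(x: float, y: float, bbox: tuple[float, float, float, float]) -> float:
    """Return 1.0 if the click lands inside the target's bounding box, else 0.0."""
    x_min, y_min, x_max, y_max = bbox
    return 1.0 if x_min <= x <= x_max and y_min <= y <= y_max else 0.0
```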
Positional Encoding Insights
Analysis of the vision encoder revealed that absolute positional information decays exponentially through attention layers due to normalization. Since 2D-RoPE only encodes relative position, the additive patch embedding (added once at input) is the sole source of absolute coordinate information, and it degrades with depth. Scaling the positional embedding by 3x improved click accuracy from 40% to 80% on a simple red-ball benchmark without any retraining.
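The rescaling itself is a one-line, retraining-free intervention applied at load time. A minimal sketch, where the array name and shape are illustrative and do not reflect the model's real checkpoint layout:

```python
import numpy as np

# The additive patch embedding is the encoder's only source of absolute
# position, and its contribution decays through normalized attention
# layers. Scaling it up at load time strengthens the absolute-coordinate
# signal without any retraining.
POS_SCALE = 3.0

def scale_positional_embedding(pos_embed: np.ndarray, scale: float = POS_SCALE) -> np.ndarray:
    """Amplify the additive absolute positional embedding."""
    return scale * pos_embed

pos_embed = np.ones((256, 768), dtype=np.float32)  # (num_patches, hidden_dim)
scaled = scale_positional_embedding(pos_embed)
```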
OSWorld Benchmark (pass@1, 50 steps)
Evaluated on OSWorld across 369 real-world desktop tasks.
| Domain | UI-TARS 2 | Qwen3 Flash | Northstar CUA Fast (4B) |
|---|---|---|---|
| Chrome | 62.96% | 56.43% | 55.30% |
| Thunderbird | 73.33% | 66.67% | 62.40% |
| LibreOffice Writer | 60.87% | 56.52% | 56.94% |
| OS | 41.67% | 54.17% | 46.26% |
| VLC | 49.94% | 34.41% | 43.87% |
| Overall | 53.1% | 41.6% | 37.01% |
At 4B parameters, Northstar CUA Fast is competitive with open-source models twice its size on single-app tasks. Under the EVOCUA agent harness, EVOCUA-8B averages 32.5% versus 37.0% for Northstar CUA Fast (RL).
Why Recovery Matters More Than Accuracy
For multi-step agentic tasks, the per-step accuracy required to reach a target trajectory success rate scales harshly:
| Trajectory length | For 50% success | For 80% success | For 95% success |
|---|---|---|---|
| 1 | 0.50 | 0.80 | 0.95 |
| 4 | 0.84 | 0.95 | 0.99 |
| 16 | 0.96 | 0.99 | 1.00 |
| 32 | 0.98 | 0.99 | 1.00 |
Even with retry tolerance, the required per-step accuracy for long trajectories becomes impractical. The model's ability to recover from failures and handle out-of-distribution variation matters far more than raw single-step precision.
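The table follows from requiring p_step^n ≥ p_traj for an n-step trajectory with independent steps, i.e. p_step = p_traj^(1/n):

```python
# Required per-step accuracy to reach a target trajectory success rate,
# assuming steps succeed or fail independently.
def required_step_accuracy(traj_success: float, n_steps: int) -> float:
    """Solve p_step ** n_steps == traj_success for p_step."""
    return traj_success ** (1 / n_steps)
```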
Supported Actions
click · double_click · triple_click · right_click · drag · type · key · scroll · hscroll · navigate (browser only) · wait · terminate
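As an illustration only, a client might represent these actions as JSON-style payloads. The field names below are assumptions for the sake of the example, not the documented wire format:

```python
# Hypothetical action payloads covering a few of the supported actions.
click_action = {"type": "click", "x": 512, "y": 384}
type_action = {"type": "type", "text": "hello world"}
scroll_action = {"type": "scroll", "dx": 0, "dy": -3}
```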
Quickstart
Install
```bash
pip install tzafon
```
Responses API (recommended)
```python
import os
from tzafon import Lightcone

client = Lightcone(api_key=os.environ["TZAFON_API_KEY"])

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    instructions="Click on the Firefox icon.",
    tools=[{
        "type": "computer_use",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",
    }],
)
print(response.output)
```
OpenAI-compatible Chat Completions
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.tzafon.ai/v1",
)

response = client.chat.completions.create(
    model="tzafon.northstar-cua-fast",
    messages=[
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
            {"type": "text", "text": "Click on the Firefox icon."},
        ]},
    ],
    temperature=0,
    max_tokens=512,
)
print(response.choices[0].message.content)
```
cURL
```bash
curl -X POST https://api.tzafon.ai/v1/responses \
  -H "Authorization: Bearer $TZAFON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tzafon.northstar-cua-fast",
    "instructions": "Click on the Firefox icon.",
    "tools": [{"type": "computer_use", "display_width": 1024, "display_height": 768}]
  }'
```
Lightcone Agent Harness
Lightcone wraps Northstar CUA Fast into a full desktop automation loop: screenshot, think, act, repeat.
screenshot → Northstar CUA Fast → parse action → execute on computer → repeat
Features: pure-async FastAPI server with SSE streaming, sliding-window context management, Rust-accelerated image processing, and an auto-discovering tool registry.
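The loop above can be sketched as follows. The three helpers are stand-ins for real screen-capture, model-call, and input-injection code, not Lightcone's actual API:

```python
# Minimal sketch of the screenshot → think → act loop that Lightcone
# wraps around Northstar CUA Fast. All three helpers are placeholders.
def take_screenshot() -> bytes:
    return b"<png bytes>"           # placeholder: capture the current screen

def query_model(task: str, screenshot: bytes) -> dict:
    return {"type": "terminate"}    # placeholder: call Northstar CUA Fast

def execute(action: dict) -> None:
    pass                            # placeholder: inject click/type/scroll

def agent_loop(task: str, max_steps: int = 50) -> int:
    """Run the loop until the model terminates or max_steps is reached."""
    for step in range(max_steps):
        action = query_model(task, take_screenshot())
        if action["type"] == "terminate":
            return step             # model decided the task is done
        execute(action)
    return max_steps
```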
```bash
git clone https://github.com/tzafon/lightcone.git
cd lightcone
uv venv && uv sync --extra dev
uv run maturin develop -m native/Cargo.toml
export TZAFON_API_KEY="your-api-key"
lightcone run --task "Open Firefox and search for 'hello world'"
```
What's Open Source vs Hosted
| Component | License | Status |
|---|---|---|
| Lightcone agent harness | Apache 2.0 | GitHub |
| Python SDK (tzafon) | MIT | PyPI |
| Model weights | Apache 2.0 | Tzafon API |
Citation
```bibtex
@misc{tzafon2026northstarcuafast,
  title={Northstar CUA Fast: Lightweight Computer-Use Agent Model},
  author={Tzafon Team},
  year={2026},
  url={https://github.com/tzafon/lightcone},
}
```