metadata

library_name: transformers
tags:
  - multimodal
  - gui
license: apache-2.0
datasets:
  - chakra-labs/pango
  - chakra-labs/pango-sample
language:
  - en
base_model:
  - ByteDance-Seed/UI-TARS-7B-SFT
pipeline_tag: image-text-to-text

GLADOS-1 — UI-TARS-7B-SFT

Model Description

GLADOS-1 is the first computer-use (CUA) model post-trained using collective, crowd-sourced trajectories. Leveraging the enormous PANGO dataset (primarily Chrome-based interactions), its purpose is to provide a lens into what is possible with very large trajectory datasets in computer use.

It also represents the first open-sourced post-training pipeline for UI-TARS, inspired by the existing Qwen2VL finetuning series.

This model is designed to:

- Be compliant. It has been taught to rigorously follow directions and output action formats compatible with downstream parsers like PyAutoGUI.
- Understand web productivity applications. The Pango dataset primarily contains productivity application usage in the browser. Consequently, in OSWorld results, we observe significantly improved performance on the Chrome task bench.
- Have strong intuition on visual grounding. Our experiments are detailed in our research blog.
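
To illustrate what "compatible with downstream parsers" can mean in practice, here is a minimal sketch of turning a model-emitted action string into a PyAutoGUI command. The action syntax used here (`click(x=..., y=...)` with coordinates normalized to the range 0 to 1, and `type(content='...')`) is a hypothetical format chosen for illustration, not the format GLADOS-1 or UI-TARS actually emits; the helper names `parse_action` and `to_pyautogui` are likewise invented. The sketch returns the command as a string rather than executing it, so it runs without a display.

```python
import re

# Hypothetical action syntax (an assumption for this sketch):
#   click(x=0.25, y=0.5)      -> coordinates normalized to [0, 1]
#   type(content='hello')     -> literal text to type
ACTION_RE = re.compile(r"^\s*(?P<name>\w+)\((?P<args>.*)\)\s*$", re.DOTALL)
ARG_RE = re.compile(r"(\w+)\s*=\s*('[^']*'|[-+]?\d*\.?\d+)")


def parse_action(text):
    """Split an action string into (name, {argument: value})."""
    match = ACTION_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognizable action: {text!r}")
    args = {}
    for key, raw in ARG_RE.findall(match.group("args")):
        # Quoted values stay strings; everything else is parsed as a number.
        args[key] = raw[1:-1] if raw.startswith("'") else float(raw)
    return match.group("name"), args


def to_pyautogui(text, screen_w, screen_h):
    """Render an action string as a PyAutoGUI call (as source text)."""
    name, args = parse_action(text)
    if name == "click":
        # Scale normalized coordinates to absolute screen pixels.
        x = round(args["x"] * screen_w)
        y = round(args["y"] * screen_h)
        return f"pyautogui.click({x}, {y})"
    if name == "type":
        return f"pyautogui.write({args['content']!r})"
    raise ValueError(f"unsupported action: {name}")


print(to_pyautogui("click(x=0.25, y=0.5)", 1920, 1080))
print(to_pyautogui("type(content='hello')", 1920, 1080))
```

On a 1920x1080 screen, the two calls print `pyautogui.click(480, 540)` and `pyautogui.write('hello')`. A strict, regular output format is what makes this kind of small parser sufficient; a model that drifts from the format would need a far more defensive one.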

📕 Release Blog | 🤗 Code | 🔧 Deployment (via UI-TARS) | 🖥️ Running on your own computer (via UI-TARS Desktop)

Citation

@misc{chakralabs2025glados-1,
  author = {Chakra Labs},
  title = {GLADOS-1},
  url = {https://github.com/Chakra-Network/GLADOS-1},
  year = {2025}
}