MobiZen-GUI-4B

🌐 Project | 💻 Demo | 📄 Chinese Trajectory Data

English | 简体中文

Introduction

MobiZen-GUI-4B is a native GUI agent model built on Qwen3-VL. It is trained on a large, hand-curated corpus of Chinese mobile GUI interactions, the model has learned from hundreds of thousands of real Chinese app sessions spanning e-commerce, transport, social, and finance. Each record includes screenshots, touch traces, and Chinese instructions, giving the agent deep insight into Chinese UI conventions and workflows.

The goal of MobiZen-GUI-4B is to make it easier—and faster—to build and ship Chinese Mobile GUI agents. It delivers:

  • A 4-billion-parameter agent that runs completely on your own desktop or laptop.
  • Fast execution speed, relying only on a single image and historical actions. It relies solely on a single current image and historical actions, requiring no additional information, resulting in fast execution speed.
  • A turnkey inference kit that auto-handles ADB links and pulls in every required library.

What it can do

  • Runs on everyday machines: engineered for snappy response while keeping data on-device.
  • Sees and acts: spots buttons, text fields, lists, etc., then taps, types, swipes, or waits as needed.
  • Masters long procedures: carries out multi-stage jobs in food, ride-hailing, shopping, social, and other apps.
  • Works out of the box: copes with brand-new apps and shifting layouts without any extra fine-tuning or domain-specific tweaks.

Usage

Please refer to here to use MobiZen-GUI-4B.

Deploy

We recommand deploy MobiZen-GUI-4B through vllm==0.11.0 / transformers==4.57.0.

Downloads last month
99
Safetensors
Model size
5B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for alibabagroup/MobiZen-GUI-4B

Finetuned
(210)
this model