MobiZen-GUI-4B
🌐 Project | 💻 Demo | 📄 Chinese Trajectory Data
Introduction
MobiZen-GUI-4B is a native GUI agent model built on Qwen3-VL. It is trained on a large, hand-curated corpus of Chinese mobile GUI interactions, the model has learned from hundreds of thousands of real Chinese app sessions spanning e-commerce, transport, social, and finance. Each record includes screenshots, touch traces, and Chinese instructions, giving the agent deep insight into Chinese UI conventions and workflows.
The goal of MobiZen-GUI-4B is to make it easier—and faster—to build and ship Chinese Mobile GUI agents. It delivers:
- A 4-billion-parameter agent that runs completely on your own desktop or laptop.
- Fast execution speed, relying only on a single image and historical actions. It relies solely on a single current image and historical actions, requiring no additional information, resulting in fast execution speed.
- A turnkey inference kit that auto-handles ADB links and pulls in every required library.
What it can do
- Runs on everyday machines: engineered for snappy response while keeping data on-device.
- Sees and acts: spots buttons, text fields, lists, etc., then taps, types, swipes, or waits as needed.
- Masters long procedures: carries out multi-stage jobs in food, ride-hailing, shopping, social, and other apps.
- Works out of the box: copes with brand-new apps and shifting layouts without any extra fine-tuning or domain-specific tweaks.
Usage
Please refer to here to use MobiZen-GUI-4B.
Deploy
We recommand deploy MobiZen-GUI-4B through vllm==0.11.0 / transformers==4.57.0.
- Downloads last month
- 99
Model tree for alibabagroup/MobiZen-GUI-4B
Base model
Qwen/Qwen3-VL-4B-Instruct