gWorld-8B
gWorld-8B is the first open-weight, single self-contained Vision-Language Model (VLM) specialized for visual mobile GUI world modeling. Unlike traditional visual world models that predict pixels directly, gWorld-8B predicts the next GUI state as executable web code. This approach ensures pixel-perfect text rendering and structurally accurate layouts, overcoming the hallucination and legibility issues common in pixel-generation models.
Model Summary
- Architecture: Based on Qwen3-VL-8B
- Task: Action-conditioned next-state prediction for mobile GUIs
- Input: Current screenshot + Action
- Output: Reasoning + Renderable HTML
- Repository: https://github.com/trillion-labs/gWorld
Key Features
1. New Pareto Frontier
gWorld-8B establishes a new Pareto frontier in the trade-off between model size and GUI world modeling accuracy.
- Efficiency: Outperforms frontier models up to 50.25x larger (e.g., Llama 4 402B-A17B) on GUI-specific benchmarks.
- Accuracy: Achieves a +45.7% gain in Instruction Accuracy (IAcc.) over the base Qwen3-VL model.
- Zero-Shot Generalization: Demonstrates strong performance on out-of-distribution benchmarks such as AndroidWorld and KApps (Korean).
2. Action Input & Operations
The model treats the mobile interface as a coordinate space and predicts how that space changes based on user input.
- Coordinate Space: Operates on a normalized [0, 1000] scale.
- Logic: It generates a "Next State Reasoning" block before the code to ensure the visual transition logically follows the intent of the action.
- Example Actions:
  - {"action_type": "TAP", "coordinates": [512, 890]}
  - {"action_type": "TYPE", "text": "gWorld is a generative code mobile world model"}
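The action format above can be sketched in Python as follows. This is a minimal illustration, not the official preprocessing: it assumes each axis is normalized linearly by the screenshot's width and height, and the helper names (`to_normalized`, `tap_action`, `type_action`) are hypothetical. Consult the repository for the exact prompt and action conventions.

```python
import json

SCALE = 1000  # upper bound of the normalized [0, 1000] coordinate range


def to_normalized(x_px, y_px, width_px, height_px):
    """Map a pixel position on a screenshot to the [0, 1000] scale.

    Assumption: each axis is scaled linearly by the screenshot size.
    """
    if width_px <= 0 or height_px <= 0:
        raise ValueError("screenshot dimensions must be positive")
    x = round(x_px / width_px * SCALE)
    y = round(y_px / height_px * SCALE)
    # Clamp so positions on the very edge stay inside the valid range.
    return [min(max(x, 0), SCALE), min(max(y, 0), SCALE)]


def tap_action(x_px, y_px, width_px, height_px):
    """Build a TAP action dict from a pixel position (hypothetical helper)."""
    return {
        "action_type": "TAP",
        "coordinates": to_normalized(x_px, y_px, width_px, height_px),
    }


def type_action(text):
    """Build a TYPE action dict (hypothetical helper)."""
    return {"action_type": "TYPE", "text": text}


# A tap at pixel (540, 2160) on a 1080x2400 screenshot.
action_json = json.dumps(tap_action(540, 2160, 1080, 2400))
```

Normalizing before building the action keeps the same action valid across screenshots of different resolutions.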
3. Visual Code Rendering
By outputting HTML/CSS, gWorld ensures that text remains perfectly sharp and layouts are responsive.
- High Renderability: <1% render failure rate.
- Speed: Rendering via Playwright takes ~0.3s, significantly faster than multi-step diffusion pipelines.
- Setup: For rendering utilities, see the official GitHub repository: https://github.com/trillion-labs/gWorld
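A rough sketch of the render step, using Playwright's Python sync API, might look like the following. It is not the repository's rendering utility. The function names, the 412x915 viewport, and the assumption that the HTML begins at the first `<!DOCTYPE html>` or `<html` tag (with the reasoning text before it) are all illustrative; the actual output layout may differ.

```python
import re


def split_output(model_output):
    """Split raw model text into (reasoning, html).

    Assumption: the HTML document starts at the first `<!DOCTYPE html>`
    or `<html` tag and everything before it is the reasoning text.
    """
    match = re.search(r"<!DOCTYPE html|<html", model_output, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no HTML document found in model output")
    reasoning = model_output[:match.start()].strip()
    html = model_output[match.start():].strip()
    return reasoning, html


def render_to_png(html, out_path, width=412, height=915):
    """Render an HTML string to a PNG with headless Chromium.

    The default viewport is an arbitrary phone-like size, not a value
    taken from the model card.
    """
    # Imported lazily so split_output works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.set_content(html)
        page.screenshot(path=out_path)
        browser.close()
```

With Playwright and its Chromium build installed (`pip install playwright`, then `playwright install chromium`), calling `render_to_png(html, "next_state.png")` on the HTML returned by `split_output` writes the predicted screen to disk.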
License and Contact
This model is licensed under the Apache License 2.0. For inquiries, please contact: info@trillionlabs.co
Citation
@misc{koh2026generativevisualcodemobile,
title={Generative Visual Code Mobile World Models},
author={Woosung Koh and Sungjun Han and Segyu Lee and Se-Young Yun and Jamin Shin},
year={2026},
eprint={2602.01576},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.01576},
}