gWorld-32B

📄 Paper 🎬 Demo 👨🏻‍💻 Code

gWorld-32B is the first open-weight, single self-contained Vision-Language Model (VLM) specialized for visual mobile GUI world modeling. Unlike traditional visual world models that predict pixels directly, gWorld-32B predicts the next GUI state as executable web code. Because the next state is rendered from code rather than generated as pixels, text renders pixel-perfectly and layouts stay structurally accurate, avoiding the hallucination and legibility issues common in pixel-generation models.

Model Summary

  • Architecture: Based on Qwen3-VL-32B
  • Task: Action-conditioned next-state prediction for mobile GUIs
  • Input: Current screenshot + Action
  • Output: Reasoning + Renderable HTML
  • Repository: https://github.com/trillion-labs/gWorld

Key Features

1. New Pareto Frontier

gWorld-32B establishes a new Pareto frontier in the trade-off between model size and GUI world modeling accuracy.

  • Efficiency: Outperforms frontier models up to 12.6x larger (e.g., Llama 4 402B-A17B) on GUI-specific benchmarks.
  • Accuracy: Achieves a +27.1% gain in Instruction Accuracy (IAcc.) over the base Qwen3-VL model.
  • Zero-Shot Generalization: Demonstrates strong performance on out-of-distribution benchmarks such as AndroidWorld and KApps (Korean).
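
As a quick sanity check, the 12.6x size ratio quoted above follows from nominal parameter counts. The snippet below assumes 402B total parameters for the Llama 4 model and a nominal 32B for gWorld-32B; both numbers are read off the model names, not measured from checkpoints, and `size_ratio` is a hypothetical helper.

```python
def size_ratio(larger_params_b: float, smaller_params_b: float) -> float:
    """Return how many times larger one model is than another, by parameter count."""
    return larger_params_b / smaller_params_b


# Nominal sizes in billions of parameters (assumed from the model names).
LLAMA4_TOTAL_B = 402.0
GWORLD_B = 32.0

ratio = size_ratio(LLAMA4_TOTAL_B, GWORLD_B)
print(f"{ratio:.1f}x")  # 12.6x
```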

2. Action Input & Operations

The model treats the mobile interface as a coordinate space and predicts how that space changes based on user input.

  • Coordinate Space: Operates on a normalized [0, 1000] scale.
  • Logic: It generates a "Next State Reasoning" block before the code to ensure the visual transition logically follows the intent of the action.
  • Example Actions: {"action_type": "TAP", "coordinates": [512, 890]} or {"action_type": "TYPE", "text": "gWorld is a generative code mobile world model"}
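
The coordinate convention above can be sketched as follows. This is a minimal illustration, assuming coordinates are given in [x, y] order, with 0 mapping to the left/top edge and 1000 to the right/bottom edge of the screenshot. The helper names (`to_pixels`, `to_normalized`, `tap_action`, `type_action`) are hypothetical and are not part of the official repository; the action dictionaries mirror the example actions listed above.

```python
import json

NORM_MAX = 1000  # upper bound of the normalized coordinate scale


def to_pixels(coords, width, height):
    """Map normalized [x, y] in [0, 1000] to pixel coordinates (assumed x-then-y order)."""
    x, y = coords
    if not (0 <= x <= NORM_MAX and 0 <= y <= NORM_MAX):
        raise ValueError(f"coordinates out of range: {coords}")
    return round(x / NORM_MAX * width), round(y / NORM_MAX * height)


def to_normalized(px, py, width, height):
    """Map a pixel position on a width x height screenshot to the [0, 1000] scale."""
    return [round(px / width * NORM_MAX), round(py / height * NORM_MAX)]


def tap_action(px, py, width, height):
    """Build a TAP action from a pixel position."""
    return {"action_type": "TAP", "coordinates": to_normalized(px, py, width, height)}


def type_action(text):
    """Build a TYPE action carrying the text to enter."""
    return {"action_type": "TYPE", "text": text}


# Example: a 1080 x 2400 screenshot (illustrative resolution).
print(to_pixels([512, 890], 1080, 2400))  # (553, 2136)
print(json.dumps(tap_action(553, 2136, 1080, 2400)))
print(json.dumps(type_action("hello")))
```

Because the scale is resolution-independent, the same action can be replayed against screenshots of different sizes; only the final pixel mapping changes.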

3. Visual Code Rendering

By outputting HTML/CSS, gWorld ensures that text remains perfectly sharp and layouts are responsive.

  • High Renderability: <1% render failure rate.
  • Speed: Rendering via Playwright takes ~0.3s, significantly faster than multi-step diffusion pipelines.
  • Setup: For rendering utilities, see the official GitHub repository: https://github.com/trillion-labs/gWorld
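
The rendering step can be sketched roughly as below. This is not the official rendering utility: it assumes the model output is free-form reasoning followed by an HTML document that starts at `<!DOCTYPE html>` or `<html` (optionally wrapped in a Markdown code fence), and the actual output format may differ. `split_output` and `render_html` are hypothetical helpers, the viewport size is illustrative, and the rendering uses the Playwright Python sync API.

```python
import re

# Assumed marker for where the HTML document begins in the model output.
_HTML_START = re.compile(r"<!DOCTYPE html|<html", re.IGNORECASE)


def split_output(model_output: str):
    """Split raw model text into (reasoning, html) under the assumed format."""
    match = _HTML_START.search(model_output)
    if match is None:
        raise ValueError("no HTML document found in model output")
    before = model_output[:match.start()]
    after = model_output[match.start():]
    # Drop an optional Markdown fence around the HTML, if one is present.
    reasoning = re.sub(r"```(?:html)?\s*$", "", before).strip()
    html = re.sub(r"```\s*$", "", after).strip()
    return reasoning, html


def render_html(html: str, out_path: str, width: int = 412, height: int = 915) -> None:
    """Render HTML to a PNG screenshot with headless Chromium via Playwright."""
    # Imported lazily so split_output works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.set_content(html, wait_until="load")
        page.screenshot(path=out_path)
        browser.close()
```

With Playwright installed (`pip install playwright`, then `playwright install chromium`), a predicted state could be rendered with `reasoning, html = split_output(text)` followed by `render_html(html, "next_state.png")`.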

License and Contact

This model is licensed under the Apache License 2.0. For inquiries, please contact: info@trillionlabs.co

Citation

@misc{koh2026generativevisualcodemobile,
      title={Generative Visual Code Mobile World Models},
      author={Woosung Koh and Sungjun Han and Segyu Lee and Se-Young Yun and Jamin Shin},
      year={2026},
      eprint={2602.01576},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.01576},
}