How to use trillionlabs/gWorld-32B with libraries and local apps.

Libraries

How to use trillionlabs/gWorld-32B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="trillionlabs/gWorld-32B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

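# Load model directly
# AutoModelForImageTextToText picks the concrete model class from the checkpoint's config
# (assumed here to resolve to the Qwen3-VL architecture this model is based on).
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("trillionlabs/gWorld-32B")
model = AutoModelForImageTextToText.from_pretrained("trillionlabs/gWorld-32B")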
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use trillionlabs/gWorld-32B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "trillionlabs/gWorld-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trillionlabs/gWorld-32B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
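
The curl call above sends a remote image URL with a generic prompt. For a local screenshot, the same request can be built in Python. The sketch below is a minimal illustration, not part of the official repository: `build_chat_payload` and `post_chat` are hypothetical helper names, the PNG MIME type is an assumption, and it assumes the server accepts base64 `data:` URLs in the `image_url` field.

```python
import base64
import json
import urllib.request


def build_chat_payload(image_bytes: bytes, prompt: str,
                       model: str = "trillionlabs/gWorld-32B") -> dict:
    """Build an OpenAI-style chat request with an inline screenshot."""
    # Assumption: the screenshot is a PNG and the server accepts data URLs.
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def post_chat(payload: dict,
              url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to a running server and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a server running, `post_chat(build_chat_payload(open("screenshot.png", "rb").read(), prompt))` would send the request; the reply should carry the generated text under `choices[0]["message"]["content"]`, as in other OpenAI-compatible APIs.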

SGLang

How to use trillionlabs/gWorld-32B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "trillionlabs/gWorld-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trillionlabs/gWorld-32B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "trillionlabs/gWorld-32B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trillionlabs/gWorld-32B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use trillionlabs/gWorld-32B with Docker Model Runner:
```
docker model run hf.co/trillionlabs/gWorld-32B
```

gWorld-32B 🌍📱

gWorld-32B🌍📱 is the first open-weight, single self-contained Vision-Language Model (VLM) specialized for visual mobile GUI world modeling. Unlike traditional visual world models that predict pixels directly, gWorld-32B predicts the next GUI state as executable web code. This approach ensures pixel-perfect text rendering and structurally accurate layouts, overcoming the hallucination and legibility issues common in pixel-generation models.

Model Summary

Architecture: Based on Qwen3-VL-32B
Task: Action-conditioned next-state prediction for mobile GUIs
Input: Current screenshot + Action
Output: Reasoning + Renderable HTML

Key Features

1. New Pareto Frontier

gWorld-32B establishes a new Pareto frontier in the trade-off between model size and GUI world modeling accuracy.

Efficiency: Outperforms frontier models up to 12.6x larger (e.g., Llama 4 402B-A17B) on GUI-specific benchmarks.
Accuracy: Achieves a +27.1% gain in Instruction Accuracy (IAcc.) over the base Qwen3-VL model.
Zero-Shot Generalization: Demonstrates strong performance on out-of-distribution benchmarks such as AndroidWorld and KApps (Korean).

2. Action Input & Operations

The model treats the mobile interface as a coordinate space and predicts how that space changes based on user input.

Coordinate Space: Operates on a normalized [0, 1000] scale.
Logic: It generates a "Next State Reasoning" block before the code to ensure the visual transition logically follows the intent of the action.
Example Actions: {"action_type": "TAP", "coordinates": [512, 890]} or {"action_type": "TYPE", "text": "gWorld is a generative code mobile world model"}
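
To make the coordinate convention concrete, the following sketch converts a tap given in device pixels into an action string on the [0, 1000] scale. It is an illustration under stated assumptions: the helper names are hypothetical, and it assumes each axis is normalized independently by the screenshot's width and height and rounded to an integer, which the model card does not spell out.

```python
import json


def to_normalized(x_px: float, y_px: float, width: int, height: int) -> list[int]:
    """Map pixel coordinates to the [0, 1000] scale (assumed per-axis)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    nx = round(x_px / width * 1000)
    ny = round(y_px / height * 1000)
    # Clamp so taps on the very edge of the screen stay inside the range.
    return [min(max(nx, 0), 1000), min(max(ny, 0), 1000)]


def tap_action(x_px: float, y_px: float, width: int, height: int) -> str:
    """Serialize a TAP action in the JSON shape shown in the examples."""
    return json.dumps(
        {"action_type": "TAP", "coordinates": to_normalized(x_px, y_px, width, height)}
    )


def type_action(text: str) -> str:
    """Serialize a TYPE action in the JSON shape shown in the examples."""
    return json.dumps({"action_type": "TYPE", "text": text})


# A tap at the center of a 1080x2400 screenshot maps to [500, 500].
print(tap_action(540, 1200, 1080, 2400))
```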

3. Visual Code Rendering

By outputting HTML/CSS, gWorld ensures that text remains perfectly sharp and layouts are responsive.

High Renderability: <1% render failure rate.
Speed: Rendering via Playwright takes ~0.3s, significantly faster than multi-step diffusion pipelines.
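
A rendering step of this kind might look like the sketch below, which loads generated HTML into headless Chromium through Playwright's synchronous API and saves a screenshot. This is not the authors' rendering pipeline: the function name and the 412x915 viewport are assumptions chosen for illustration, and it requires `pip install playwright` followed by `playwright install chromium`.

```python
def render_html_to_png(html: str, out_path: str,
                       width: int = 412, height: int = 915) -> None:
    """Render an HTML string in headless Chromium and save a PNG screenshot."""
    # Imported lazily so this module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            # Assumption: a fixed mobile-sized viewport; in practice it would
            # match the dimensions of the input screenshot.
            page = browser.new_page(viewport={"width": width, "height": height})
            page.set_content(html, wait_until="load")
            page.screenshot(path=out_path)
        finally:
            browser.close()
```

Calling `render_html_to_png(html, "next_state.png")` on the model's HTML output would produce an image that can be compared against the real next screenshot.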

Sample Usage

Inference with vLLM

The following snippet from the official repository runs inference with vLLM:

from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from PIL import Image

# Model configuration
MODEL_PATH = "trillionlabs/gWorld-32B"
# Base model that supplies the tokenizer and chat template
BASE_MODEL = "Qwen/Qwen3-VL-32B"

# Image processing settings
MM_PROCESSOR_KWARGS = {
    "max_pixels": 4233600,
    "min_pixels": 3136,
}

# Load model
llm = LLM(
    model=MODEL_PATH,
    tokenizer=BASE_MODEL,
    tensor_parallel_size=8,  # number of GPUs to shard across; adjust to your hardware
    gpu_memory_utilization=0.9,
    max_model_len=19384,  # must cover the prompt (image + text tokens) plus generated tokens
    trust_remote_code=True,
    mm_processor_kwargs=MM_PROCESSOR_KWARGS,
    enable_chunked_prefill=True,
    max_num_batched_tokens=16384,
)

# Load processor for chat template
processor = AutoProcessor.from_pretrained(BASE_MODEL, trust_remote_code=True)

# Prepare input
image = Image.open("screenshot.png") # Replace with your screenshot
if image.mode != 'RGB':
    image = image.convert('RGB')

action = '{"action_type": "TAP", "coordinates": [512, 890]}'

# World model prompt template
user_content = f"""You are an expert mobile UI World Model that can accurately predict the next state given an action.
Given a screenshot of a mobile interface and an action, you must generate clean, responsive HTML code that represents the state of the interface AFTER the action is performed.
First generate reasoning about what the next state should look like based on the action.
Afterwards, generate the HTML code representing the next state that logically follows the action.
You will render this HTML in a mobile viewport to see how similar it looks and acts like the mobile screenshot.

Requirements:
1. Provide reasoning about what the next state should look like based on the action
2. Generate complete, valid HTML5 code
3. Choose between using inline CSS and utility classes from Bootstrap, Tailwind CSS, or MUI for styling, depending on which option generates the closest code to the screenshot.
4. Use mobile-first design principles matching screenshot dimensions.
5. For images, use inline SVG placeholders with explicit width and height attributes that match the approximate dimensions from the screenshot. Matching the approximate color is also good.
6. Use modern web standards and best practices
7. Return ONLY the HTML code, no explanations or markdown formatting
8. The generated HTML should render properly in a mobile viewport.
9. Generated HTML should look like the screen that logically follows the current screen and the action.

Action:
{action}

Output format:
# Next State Reasoning: <your reasoning about what the next state should look like>
# HTML: <valid_html_code>

Generate the next state reasoning and the next state in html:"""

# Build messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": user_content},
        ],
    }
]

# Apply chat template
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Generation parameters
sampling_params = SamplingParams(
    max_tokens=15000,
    temperature=0,
    seed=42,
    top_p=1.0,
)

# Generate
outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": {"image": image}}],
    sampling_params=sampling_params
)

print(outputs[0].outputs[0].text)
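
The prompt's output format places the reasoning after `# Next State Reasoning:` and the code after `# HTML:`, so the generated text needs to be split before rendering. The helper below is a minimal sketch of that step; `split_world_model_output` is a hypothetical name, and it assumes the model emits both markers exactly as written in the prompt.

```python
def split_world_model_output(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, html) using the prompt's markers."""
    # Everything before the first "# HTML:" marker is treated as reasoning.
    head, sep, tail = text.partition("# HTML:")
    if not sep:
        raise ValueError("no '# HTML:' marker found in model output")
    reasoning = head.replace("# Next State Reasoning:", "", 1).strip()
    html = tail.strip()
    return reasoning, html


sample = (
    "# Next State Reasoning: The settings page opens.\n"
    "# HTML: <!DOCTYPE html><html><body>Settings</body></html>"
)
reasoning, html = split_world_model_output(sample)
print(reasoning)  # The settings page opens.
```

In the snippet above, the input would be `outputs[0].outputs[0].text`, and the returned `html` string is what gets rendered in a mobile viewport.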

License and Contact

This model is licensed under the Apache License 2.0. For inquiries, please contact: info@trillionlabs.co

Citation

@misc{koh2026generativevisualcodemobile,
      title={Generative Visual Code Mobile World Models},
      author={Woosung Koh and Sungjun Han and Segyu Lee and Se-Young Yun and Jamin Shin},
      year={2026},
      eprint={2602.01576},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.01576},
}

Model size: 33B params (Safetensors)
Tensor type: BF16

Paper: Generative Visual Code Mobile World Models (arXiv 2602.01576, published Feb 2)