
Libraries

How to use trillionlabs/gWorld-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="trillionlabs/gWorld-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("trillionlabs/gWorld-8B")
model = AutoModelForMultimodalLM.from_pretrained("trillionlabs/gWorld-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use trillionlabs/gWorld-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "trillionlabs/gWorld-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trillionlabs/gWorld-8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
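
The same request can also be issued from Python. The sketch below uses only the standard library and inlines a local screenshot as a base64 data URL, on the assumption that the server accepts data URLs in `image_url` as OpenAI-compatible endpoints generally do. The helper names `build_chat_payload` and `post_chat` are illustrative and not part of vLLM.

```python
import base64
import json
import urllib.request


def build_chat_payload(image_bytes: bytes, prompt: str,
                       model: str = "trillionlabs/gWorld-8B",
                       mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat payload with the image inlined as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:{mime};base64,{b64}"}},
                ],
            }
        ],
    }


def post_chat(payload: dict,
              url: str = "http://localhost:8000/v1/chat/completions") -> str:
    """POST the payload to the server and return the first choice's text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server and a local screenshot.png):
# with open("screenshot.png", "rb") as f:
#     payload = build_chat_payload(f.read(), "Describe this image in one sentence.")
# print(post_chat(payload))
```

For an SGLang server started as shown further down, the same helpers should work with the `url` argument pointed at port 30000.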

Use Docker

docker model run hf.co/trillionlabs/gWorld-8B

SGLang

How to use trillionlabs/gWorld-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "trillionlabs/gWorld-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trillionlabs/gWorld-8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "trillionlabs/gWorld-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trillionlabs/gWorld-8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use trillionlabs/gWorld-8B with Docker Model Runner:
```
docker model run hf.co/trillionlabs/gWorld-8B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

gWorld-8B 🌍📱

📄 Paper 🎬 Demo 👨🏻‍💻 Code

🎉 News (May 2026): gWorld has been accepted to ICML 2026! See you in Seoul. 🇰🇷

gWorld-8B (GuiWorld) 🌍📱 is the first open-weight, single self-contained Vision-Language Model (VLM) specialized for visual mobile GUI world modeling. Unlike traditional visual world models that predict pixels directly, gWorld-8B predicts the next GUI state as executable web code. This approach ensures pixel-perfect text rendering and structurally accurate layouts, overcoming the hallucination and legibility issues common in pixel-generation models.

This model was presented in the paper Generative Visual Code Mobile World Models.

Sample Usage

You can run inference using the vLLM library as follows:

from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from PIL import Image

# Model configuration (choose one)
# For gWorld-8B:
MODEL_PATH = "trillionlabs/gWorld-8B"
BASE_MODEL = "Qwen/Qwen3-VL-8B-Instruct"

# For gWorld-32B:
# MODEL_PATH = "trillionlabs/gWorld-32B"
# BASE_MODEL = "Qwen/Qwen3-VL-32B"

# Image processing settings
MM_PROCESSOR_KWARGS = {
    "max_pixels": 4233600,
    "min_pixels": 3136,
}

# Load model
llm = LLM(
    model=MODEL_PATH,
    tokenizer=BASE_MODEL,
    tensor_parallel_size=8,  # number of GPUs to shard across; set this to match your hardware
    gpu_memory_utilization=0.9,
    max_model_len=19384,
    trust_remote_code=True,
    mm_processor_kwargs=MM_PROCESSOR_KWARGS,
    enable_chunked_prefill=True,
    max_num_batched_tokens=16384,
)

# Load processor for chat template
processor = AutoProcessor.from_pretrained(BASE_MODEL, trust_remote_code=True)

# Prepare input
image = Image.open("screenshot.png")
if image.mode != 'RGB':
    image = image.convert('RGB')

action = '{"action_type": "TAP", "coordinates": [512, 890]}'

# World model prompt template
user_content = f"""You are an expert mobile UI World Model that can accurately predict the next state given an action.
Given a screenshot of a mobile interface and an action, you must generate clean, responsive HTML code that represents the state of the interface AFTER the action is performed.
First generate reasoning about what the next state should look like based on the action.
Afterwards, generate the HTML code representing the next state that logically follows the action.
You will render this HTML in a mobile viewport to see how similar it looks and acts like the mobile screenshot.

Requirements:
1. Provide reasoning about what the next state should look like based on the action
2. Generate complete, valid HTML5 code
3. Choose between using inline CSS and utility classes from Bootstrap, Tailwind CSS, or MUI for styling, depending on which option generates the closest code to the screenshot.
4. Use mobile-first design principles matching screenshot dimensions.
5. For images, use inline SVG placeholders with explicit width and height attributes that match the approximate dimensions from the screenshot. Matching the approximate color is also good.
6. Use modern web standards and best practices
7. Return ONLY the HTML code, no explanations or markdown formatting
8. The generated HTML should render properly in a mobile viewport.
9. Generated HTML should look like the screen that logically follows the current screen and the action.

Action:
{action}

Output format:
# Next State Reasoning: <your reasoning about what the next state should look like>
# HTML: <valid_html_code>

Generate the next state reasoning and the next state in html:"""

# Build messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": user_content},
        ],
    }
]

# Apply chat template
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Generation parameters
sampling_params = SamplingParams(
    max_tokens=15000,
    temperature=0,
    seed=42,
    top_p=1.0,
)

# Generate
outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": {"image": image}}],
    sampling_params=sampling_params
)

print(outputs[0].outputs[0].text)
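
The prompt asks for a `# Next State Reasoning:` section followed by a `# HTML:` section, so the generated text usually needs to be split before rendering. The following is a minimal sketch of such a parser. The function name `split_world_model_output` is hypothetical, and stripping a surrounding Markdown fence is a defensive assumption rather than documented model behavior.

```python
import re

REASONING_MARKER = "# Next State Reasoning:"
HTML_MARKER = "# HTML:"


def split_world_model_output(text: str) -> tuple[str, str]:
    """Return (reasoning, html) from text in the prompt's output format."""
    marker_pos = text.find(HTML_MARKER)
    if marker_pos == -1:
        raise ValueError("no '# HTML:' marker found in model output")

    reasoning = text[:marker_pos]
    html = text[marker_pos + len(HTML_MARKER):].strip()

    # Drop the reasoning marker itself if the model emitted it.
    if REASONING_MARKER in reasoning:
        reasoning = reasoning.split(REASONING_MARKER, 1)[1]

    # Assumption: if the HTML is wrapped in a Markdown fence, unwrap it.
    fenced = re.fullmatch(r"```(?:html)?\s*\n(.*?)\n?```", html, flags=re.DOTALL)
    if fenced:
        html = fenced.group(1)

    return reasoning.strip(), html.strip()
```

With the sample above, this could be called as `reasoning, html = split_world_model_output(outputs[0].outputs[0].text)`.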

Model Summary

Architecture: Based on Qwen3-VL-8B
Task: Action-conditioned next-state prediction for mobile GUIs
Input: Current screenshot + Action
Output: Reasoning + Renderable HTML

Key Features

1. New Pareto Frontier

gWorld-8B establishes a new Pareto frontier in the trade-off between model size and GUI world modeling accuracy.

Efficiency: Outperforms frontier models up to 50.25x larger (e.g., Llama 4 402B-A17B) on GUI-specific benchmarks.
Accuracy: Achieves a +45.7% gain in Instruction Accuracy (IAcc.) over the base Qwen3-VL model.
Zero-Shot Generalization: Demonstrates strong performance on out-of-distribution benchmarks such as AndroidWorld and KApps (Korean).

2. Action Input & Operations

The model treats the mobile interface as a coordinate space and predicts how that space changes based on user input.

Coordinate Space: Operates on a normalized [0, 1000] scale.
Logic: It generates a "Next State Reasoning" block before the code to ensure the visual transition logically follows the intent of the action.
Example Actions: {"action_type": "TAP", "coordinates": [512, 890]} or {"action_type": "TYPE", "text": "gWorld is a generative code mobile world model"}
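
As an illustration of the coordinate convention, the sketch below converts a pixel position on a screenshot to the [0, 1000] scale and builds action strings in the format of the examples above. It assumes that coordinates are ordered [x, y] and normalized by the screenshot's width and height, which the card does not state explicitly. The helper names are hypothetical.

```python
import json


def to_normalized(x_px: float, y_px: float, width: int, height: int) -> list[int]:
    """Map pixel coordinates to the [0, 1000] scale (assumed [x, y] order)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    nx = round(x_px / width * 1000)
    ny = round(y_px / height * 1000)
    # Clamp so that points on or past the edge stay inside the valid range.
    return [min(max(nx, 0), 1000), min(max(ny, 0), 1000)]


def tap_action(x_px: float, y_px: float, width: int, height: int) -> str:
    """Build a TAP action string from a pixel position on the screenshot."""
    coords = to_normalized(x_px, y_px, width, height)
    return json.dumps({"action_type": "TAP", "coordinates": coords})


def type_action(text: str) -> str:
    """Build a TYPE action string."""
    return json.dumps({"action_type": "TYPE", "text": text})


# A tap at pixel (553, 2136) on a 1080x2400 screenshot.
print(tap_action(553, 2136, 1080, 2400))
```

The resulting string can be substituted for the `action` variable in the sample usage prompt.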

3. Visual Code Rendering

By outputting HTML/CSS, gWorld ensures that text remains perfectly sharp and layouts are responsive.

High Renderability: <1% render failure rate.
Speed: Rendering via Playwright takes ~0.3s, significantly faster than multi-step diffusion pipelines.
Setup: For rendering utilities, visit the official GitHub.
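
For orientation, a rendering step along these lines can be sketched with Playwright's synchronous Python API. This is not the official rendering utility from the repository. The function names, the default scale factor, and the choice to wait for network idle (so that CDN-hosted CSS frameworks can load) are assumptions. It requires `pip install playwright` followed by `playwright install chromium`.

```python
def viewport_from_screenshot(width_px: int, height_px: int,
                             device_scale_factor: float = 1.0) -> dict:
    """CSS-pixel viewport whose rendered size matches the screenshot's pixel size."""
    return {
        "width": round(width_px / device_scale_factor),
        "height": round(height_px / device_scale_factor),
    }


def render_html_to_png(html: str, out_path: str, width_px: int, height_px: int,
                       device_scale_factor: float = 1.0) -> None:
    """Render an HTML string in headless Chromium and save a PNG screenshot."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    viewport = viewport_from_screenshot(width_px, height_px, device_scale_factor)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page(viewport=viewport,
                                    device_scale_factor=device_scale_factor)
            # Wait for network idle so externally hosted stylesheets can load.
            page.set_content(html, wait_until="networkidle")
            page.screenshot(path=out_path)
        finally:
            browser.close()


# Example: render predicted HTML at the size of a 1080x2400 screenshot.
# render_html_to_png(html, "next_state.png", 1080, 2400, device_scale_factor=2.0)
```

Passing the input screenshot's dimensions keeps the rendered prediction directly comparable to the ground-truth next screenshot.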

License and Contact

This model is licensed under the Apache License 2.0. For inquiries, please contact: info@trillionlabs.co

Citation

@misc{koh2026generativevisualcodemobile,
      title={Generative Visual Code Mobile World Models},
      author={Woosung Koh and Sungjun Han and Segyu Lee and Se-Young Yun and Jamin Shin},
      year={2026},
      eprint={2602.01576},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.01576},
}

Safetensors model size: 9B params
Tensor type: BF16

Paper: Generative Visual Code Mobile World Models (arXiv 2602.01576, published Feb 2)