base_model:
- Qwen/Qwen3-4B
language:
- en
license: apache-2.0
tags:
- agent
- tool-use
- reinforcement-learning
- mcp
pipeline_tag: text-generation
library_name: transformers
Arctic-AWM-4B
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
Zhaoyang Wang1,
Canwen Xu2,
Boyi Liu2,
Yite Wang2,
Siwei Han1,
Zhewei Yao2,
Huaxiu Yao1,
Yuxiong He2
1UNC-Chapel Hill 2Snowflake AI Research
Overview
Arctic-AWM-4B is a multi-turn tool-use agent model trained with agentic reinforcement learning on Qwen3-4B, using the fully synthetic environments from AgentWorldModel-1K. It was introduced in the paper Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning.
The model is trained to interact with tool-use environments exposed via a unified MCP (Model Context Protocol) interface, enabling strong multi-turn agentic capabilities.
Sample Usage
To use the model for agentic tasks, you can serve it using vLLM and interact with it using the awm CLI tool.
Serve the model
vllm serve Snowflake/Arctic-AWM-4B --host 127.0.0.1 --port 8000
Run the Agent Demo
After starting an MCP environment (see the GitHub repository for environment setup), you can run the agent:
awm agent \
--task "show me the top 10 most expensive products" \
--mcp_url http://localhost:8001/mcp \
--vllm_url http://localhost:8000/v1 \
--model Snowflake/Arctic-AWM-4B
Resources
Related resources are also available, please check:
| Resource | Link |
|---|---|
| π Paper | π arxiv.org/abs/2602.10090 |
| π» Code | π» Snowflake-Labs/agent-world-model |
| π¦ AgentWorldModel-1K | π€ Snowflake/AgentWorldModel-1K |
| π€ Arctic-AWM-4B | π€ Snowflake/Arctic-AWM-4B |
| π€ Arctic-AWM-8B | π€ Snowflake/Arctic-AWM-8B |
| π€ Arctic-AWM-14B | π€ Snowflake/Arctic-AWM-14B |
Citation
If you find this resource useful, please kindly cite:
@article{wang2026agentworldmodelinfinity,
title={Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning},
author={Zhaoyang Wang and Canwen Xu and Boyi Liu and Yite Wang and Siwei Han and Zhewei Yao and Huaxiu Yao and Yuxiong He},
year={2026},
eprint={2602.10090},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.10090},
}