| | --- |
| | base_model: |
| | - Qwen/Qwen3-4B |
| | language: |
| | - en |
| | license: apache-2.0 |
| | tags: |
| | - agent |
| | - tool-use |
| | - reinforcement-learning |
| | - mcp |
| | pipeline_tag: text-generation |
| | library_name: transformers |
| | --- |
| | |
| | <h1 align="center">Arctic-AWM-4B</h1> |
| |
|
| | <h3 align="center">Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning</h3> |
| |
|
| | <p align="center"> |
| | <a href="https://github.com/Raibows">Zhaoyang Wang<sup>1</sup></a>, |
| | <a href="https://www.canwenxu.net/">Canwen Xu<sup>2</sup></a>, |
| | <a href="https://www.snowflake.com/en/blog/authors/boyi-liu/">Boyi Liu<sup>2</sup></a>, |
| | <a href="https://yitewang.github.io/">Yite Wang<sup>2</sup></a>, |
| | <a href="https://lillianwei-h.github.io/">Siwei Han<sup>1</sup></a>,<br/> |
| | <a href="https://yaozhewei.github.io/">Zhewei Yao<sup>2</sup></a>, |
| | <a href="https://www.huaxiuyao.io/">Huaxiu Yao<sup>1</sup></a>, |
| | <a href="https://www.snowflake.com/en/blog/authors/yuxiong-he/">Yuxiong He<sup>2</sup></a> |
| | </p> |
| | <p align="center"> |
| | <sup>1</sup>UNC-Chapel Hill <sup>2</sup>Snowflake AI Research |
| | </p> |
| |
|
| | # Overview |
| |
|
| | **Arctic-AWM-4B** is a multi-turn tool-use agent model trained with agentic reinforcement learning on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), using the fully synthetic environments from [AgentWorldModel-1K](https://huggingface.co/datasets/Snowflake/AgentWorldModel-1K). It was introduced in the paper [Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning](https://huggingface.co/papers/2602.10090). |
| |
|
| | The model is trained to interact with tool-use environments exposed via a unified MCP (Model Context Protocol) interface, enabling strong multi-turn agentic capabilities. |
| |
|
| | # Sample Usage |
| |
|
| | To use the model for agentic tasks, you can serve it using [vLLM](https://github.com/vllm-project/vllm) and interact with it using the `awm` CLI tool. |
| |
|
| | ### Serve the model |
| | ```bash |
| | vllm serve Snowflake/Arctic-AWM-4B --host 127.0.0.1 --port 8000 |
| | ``` |
| |
|
| | ### Run the Agent Demo |
| | After starting an MCP environment (see the [GitHub repository](https://github.com/Snowflake-Labs/agent-world-model) for environment setup), you can run the agent: |
| |
|
| | ```bash |
| | awm agent \ |
| | --task "show me the top 10 most expensive products" \ |
| | --mcp_url http://localhost:8001/mcp \ |
| | --vllm_url http://localhost:8000/v1 \ |
| | --model Snowflake/Arctic-AWM-4B |
| | ``` |
| |
|
| | # Resources |
| |
|
| | Related resources are also available, please check: |
| |
|
| | | Resource | Link | |
| | |----------|------| |
| | | π Paper | [π arxiv.org/abs/2602.10090](https://arxiv.org/abs/2602.10090) | |
| | | π» Code | [π» Snowflake-Labs/agent-world-model](https://github.com/Snowflake-Labs/agent-world-model) | |
| | | π¦ AgentWorldModel-1K | [π€ Snowflake/AgentWorldModel-1K](https://huggingface.co/datasets/Snowflake/AgentWorldModel-1K) | |
| | | π€ Arctic-AWM-4B | [π€ Snowflake/Arctic-AWM-4B](https://huggingface.co/Snowflake/Arctic-AWM-4B) | |
| | | π€ Arctic-AWM-8B | [π€ Snowflake/Arctic-AWM-8B](https://huggingface.co/Snowflake/Arctic-AWM-8B) | |
| | | π€ Arctic-AWM-14B | [π€ Snowflake/Arctic-AWM-14B](https://huggingface.co/Snowflake/Arctic-AWM-14B) | |
| |
|
| | # Citation |
| |
|
| | If you find this resource useful, please kindly cite: |
| |
|
| | ```bibtex |
| | @article{wang2026agentworldmodelinfinity, |
| | title={Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning}, |
| | author={Zhaoyang Wang and Canwen Xu and Boyi Liu and Yite Wang and Siwei Han and Zhewei Yao and Huaxiu Yao and Yuxiong He}, |
| | year={2026}, |
| | eprint={2602.10090}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.AI}, |
| | url={https://arxiv.org/abs/2602.10090}, |
| | } |
| | ``` |