StarWM: An Action-Conditioned World Model for StarCraft II
StarWM is the first action-conditioned world model for StarCraft II. Given a structured observation and a sequence of actions, StarWM predicts future observations under partial observability.
Paper:
World Models for Policy Refinement in StarCraft II (arXiv:2602.14857)
GitHub:
https://github.com/yxzzhang/StarWM
Training Dataset:
SC2-Dynamics-50K
Model Description
StarWM is obtained by supervised fine-tuning of Qwen3-8B on the SC2-Dynamics-50K dataset. The model learns to predict the textual observation 5 seconds into the future, conditioned on:
- Current textual observation
- A sequence of actions
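The two conditioning inputs can be pictured as a single text prompt that joins the current observation with the action sequence. The sketch below is illustrative only: the function name `build_prompt`, the bracketed section labels, and the example action strings are assumptions, not the prompt format used by the StarWM repository.

```python
from typing import Sequence

HORIZON_SECONDS = 5  # prediction horizon stated in the model description


def build_prompt(observation: str, actions: Sequence[str]) -> str:
    """Join a textual observation and an action sequence into one prompt.

    The layout (bracketed labels, numbered actions) is hypothetical.
    """
    # Number the actions so their order is explicit in the prompt.
    action_lines = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    return (
        "[Current observation]\n"
        f"{observation.strip()}\n\n"
        "[Actions]\n"
        f"{action_lines}\n\n"
        f"[Predicted observation after {HORIZON_SECONDS} seconds]\n"
    )


if __name__ == "__main__":
    obs = "Info: Minerals 50, Gas 0, Supply 12/15"  # made-up example values
    acts = ["Train Probe", "Build Pylon"]            # made-up action strings
    print(build_prompt(obs, acts))
```

The prompt ends with an open section so that a causal language model would continue it with the predicted observation; consult the GitHub repository for the actual input format.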
The textual representation factorizes a StarCraft II observation into five semantic modules:
- Info: Describes economy and status (Minerals, Gas, Collection Rate, Supply, Alerts, Upgrades)
- Queue: Records ongoing tasks (construction, production, upgrades) and their progress
- My Units: Lists the player's own units with their IDs, positions, health percentage (HP), energy, and status
- My Structures: Describes the player's own static assets (structures)
- Visible Hostiles: Includes visible enemy units, structures, and snapshot enemy structures under the fog of war
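As a rough illustration of this five-module layout, the sketch below models an observation as a small Python data structure and renders it to text. The class name `Observation`, its field names, and the rendered format are assumptions made for this example; the exact textual format used by StarWM may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """Hypothetical container for the five semantic modules.

    Each module holds free-form text entries; real observations carry
    richer structure (IDs, positions, HP, and so on).
    """
    info: List[str] = field(default_factory=list)
    queue: List[str] = field(default_factory=list)
    my_units: List[str] = field(default_factory=list)
    my_structures: List[str] = field(default_factory=list)
    visible_hostiles: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the modules in a fixed order, marking empty ones explicitly."""
        modules = [
            ("Info", self.info),
            ("Queue", self.queue),
            ("My Units", self.my_units),
            ("My Structures", self.my_structures),
            ("Visible Hostiles", self.visible_hostiles),
        ]
        blocks = []
        for name, entries in modules:
            body = "\n".join(f"- {e}" for e in entries) if entries else "- (none)"
            blocks.append(f"{name}:\n{body}")
        return "\n".join(blocks)


if __name__ == "__main__":
    obs = Observation(
        info=["Minerals: 50", "Supply: 12/15"],  # made-up example values
        my_units=["Probe id=1 pos=(30, 40) HP=100%"],
    )
    print(obs.to_text())
```

Rendering empty modules as `(none)` keeps the five-part layout stable, which is one simple way to distinguish "nothing visible" from a missing section under fog of war.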
This design enables dynamics modeling in a hybrid and partially observable RTS environment.
Purpose
StarWM is designed for:
- Action-conditioned observation prediction in StarCraft II
- Short-horizon predictive simulation
- World-model-augmented policy refinement
- Inspiring future research on LLM-based dynamics modeling in complex RTS environments and model-based reasoning under partial observability
Citation
If you use this model, please cite:
@misc{zhang2026worldmodels,
title={World Models for Policy Refinement in StarCraft II},
author={Yixin Zhang and Ziyi Wang and Yiming Rong and Haoxi Wang and Jinling Jiang and Shuang Xu and Haoran Wu and Shiyu Zhou and Bo Xu},
year={2026},
eprint={2602.14857},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.14857},
}