--- title: README emoji: 🐠 colorFrom: yellow colorTo: gray sdk: gradio pinned: false --- # GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare [![ArXiv](https://img.shields.io/badge/arXiv-2510.08872-b31b1b?style=flat&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2510.08872) [![Hugging Face](https://img.shields.io/badge/HuggingFace-GTAlign-orange?logo=huggingface&logoColor=white)](https://huggingface.co/GTAlign) **GTAlign** applies game-theoretic principles to fine-tune reasoning LLMs, encouraging them to make decisions that are not only accurate but also rational, cooperative, and transparent in dialogue settings. ## Models We have released five model checkpoints, and we are preparing more thoroughly trained models. | Model Name | Size | Dataset | Hugging Face Link | |------------|------|--------|--------------------| | `GTAlign/Qwen2.5-3B-Math-140step` | 3B | Math | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Math-140step) | | `GTAlign/Qwen2.5-3B-Medium-110step` | 3B | Medium | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Medium-110step) | | `GTAlign/Qwen2.5-3B-AbgQA-140step` | 3B | Ambig-QA | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-AbgQA-140step) | | `GTAlign/Qwen2.5-3B-WildGuard-140step` | 3B | WildGuard | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-WildGuard-140step) | | `GTAlign/Qwen2.5-3B-Full-160step` | 3B | Full | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Full-160step) |