|
|
--- |
|
|
title: README |
|
|
emoji: 🐠 |
|
|
colorFrom: yellow |
|
|
colorTo: gray |
|
|
sdk: gradio |
|
|
pinned: false |
|
|
--- |
|
|
|
|
|
# GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare |
|
|
|
|
|
[](https://arxiv.org/abs/2510.08872) |
|
|
[](https://huggingface.co/GTAlign) |
|
|
|
|
|
**GTAlign** applies game-theoretic principles to fine-tune reasoning LLMs, encouraging them to make decisions that are not only accurate but also rational, cooperative, and transparent in dialogue settings. |
|
|
|
|
|
|
|
|
## Models |
|
|
|
|
|
We have released five model checkpoints, and we are preparing more thoroughly trained models. |
|
|
|
|
|
| Model Name | Size | Dataset | Hugging Face Link | |
|
|
|------------|------|--------|--------------------| |
|
|
| `GTAlign/Qwen2.5-3B-Math-140step` | 3B | Math | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Math-140step) | |
|
|
| `GTAlign/Qwen2.5-3B-Medium-110step` | 3B | Medium | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Medium-110step) | |
|
|
| `GTAlign/Qwen2.5-3B-AbgQA-140step` | 3B | Ambig-QA | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-AbgQA-140step) | |
|
|
| `GTAlign/Qwen2.5-3B-WildGuard-140step` | 3B | WildGuard | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-WildGuard-140step) | |
|
|
| `GTAlign/Qwen2.5-3B-Full-160step` | 3B | Full | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Full-160step) | |
|
|
|