Update README.md

README.md CHANGED

@@ -7,4 +7,29 @@ sdk: gradio
pinned: false
---

# GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare

[arXiv](https://arxiv.org/abs/2505.19590) · [Model collection](https://huggingface.co/collections/sunblaze-ucb/intuitor-684f895c78ed2d3ef3a678b3)

**GTAlign** applies game-theoretic principles to fine-tune reasoning LLMs, encouraging them to make decisions that are not only accurate but also rational, cooperative, and transparent in dialogue settings.

# Models

We have released five model checkpoints trained on four datasets.

[View Model Collections](https://huggingface.co/collections/sunblaze-ucb/intuitor-684f895c78ed2d3ef3a678b3)

| Model Name | Size | Dataset | Hugging Face Link |
|------------|------|---------|-------------------|
| `` | 3B | Math | [View Model](https://huggingface.co/sunblaze-ucb/Qwen2.5-1.5B-Intuitor-MATH-1EPOCH) |
| `` | 3B | Medium | [View Model](https://huggingface.co/sunblaze-ucb/Qwen2.5-3B-Intuitor-MATH-1EPOCH) |
| `` | 3B | Ambig-QA | [View Model](https://huggingface.co/sunblaze-ucb/OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH) |
| `` | 3B | WildGuard | [View Model](https://huggingface.co/sunblaze-ucb/Qwen3-14B-Intuitor-MATH-1EPOCH) |
| `` | 3B | Mixed | [View Model](https://huggingface.co/sunblaze-ucb/Qwen2.5-1.5B-GRPO-MATH-1EPOCH) |
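The checkpoints in the table can be loaded like any other model on the Hugging Face Hub. A minimal sketch, assuming the `transformers` library is installed; the model ID is copied from the first table row's link, and the `generate_answer` helper is a hypothetical wrapper for illustration, not part of this repository:

```python
# Hypothetical usage sketch: load one of the released checkpoints with
# Hugging Face `transformers` and run a single generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID taken from the first table row; swap in any other link target.
MODEL_ID = "sunblaze-ucb/Qwen2.5-1.5B-Intuitor-MATH-1EPOCH"

def generate_answer(prompt: str, max_new_tokens: int = 64) -> str:
    """Download the checkpoint on first use and return a text completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_answer("What is 2 + 2?"))
```

The first call downloads the weights, so expect a delay; subsequent calls hit the local cache.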