Update README.md
README.md CHANGED

@@ -9,27 +9,20 @@ pinned: false
 
 # GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
 
-[](
-[](https://huggingface.co/
+[](none)
+[](https://huggingface.co/GTAlign)
 
 **GTAlign** applies game-theoretic principles to fine-tune reasoning LLMs, encouraging them to make decisions that are not only accurate but also rational, cooperative, and transparent in dialogue settings.
 
 
-# Models
+## Models
 
-
-We have released five model checkpoints trained on four datasets.
-
-[View Model Collections](https://huggingface.co/collections/sunblaze-ucb/intuitor-684f895c78ed2d3ef3a678b3)
+We have released five model checkpoints, and we are preparing more thoroughly trained models.
 
 | Model Name | Size | Dataset | Hugging Face Link |
 |------------|------|--------|--------------------|
-| `` | 3B | Math | [
-| `` | 3B
-| `` | 3B
-| `` | 3B
-| `` | 3B |
-
-
-
-
+| `GTAlign/Qwen2.5-3B-Math-140step` | 3B | Math | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Math-140step) |
+| `GTAlign/Qwen2.5-3B-Medium-110step` | 3B | Medium | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Medium-110step) |
+| `GTAlign/Qwen2.5-3B-AbgQA-140step` | 3B | Ambig-QA | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-AbgQA-140step) |
+| `GTAlign/Qwen2.5-3B-WildGuard-140step` | 3B | WildGuard | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-WildGuard-140step) |
+| `GTAlign/Qwen2.5-3B-Full-160step` | 3B | Full | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Full-160step) |