Commit 76a5aea by zsqzz · verified · 1 Parent(s): 7537c1a

Update README.md

Files changed (1)
  1. README.md +9 -16
README.md CHANGED
@@ -9,27 +9,20 @@ pinned: false
 
  # GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
 
- [![ArXiv](https://img.shields.io/badge/arXiv-25210.66666-b31b1b?style=flat&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2505.19590)
- [![Hugging Face](https://img.shields.io/badge/HuggingFace-GTAlign-orange?logo=huggingface&logoColor=white)](https://huggingface.co/collections/sunblaze-ucb/intuitor-684f895c78ed2d3ef3a678b3)
+ [![ArXiv](https://img.shields.io/badge/arXiv-25210.66666-b31b1b?style=flat&logo=arxiv&logoColor=white)](none)
+ [![Hugging Face](https://img.shields.io/badge/HuggingFace-GTAlign-orange?logo=huggingface&logoColor=white)](https://huggingface.co/GTAlign)
 
  **GTAlign** applies game-theoretic principles to fine-tune reasoning LLMs, encouraging them to make decisions that are not only accurate but also rational, cooperative, and transparent in dialogue settings.
 
 
- # Models
+ ## Models
 
-
- We have released five model checkpoints trained on four datasets.
-
- [View Model Collections](https://huggingface.co/collections/sunblaze-ucb/intuitor-684f895c78ed2d3ef3a678b3)
+ We have released five model checkpoints, and we are preparing more thoroughly trained models.
 
  | Model Name | Size | Dataset | Hugging Face Link |
  |------------|------|--------|--------------------|
- | `` | 3B | Math | [View Model](https://huggingface.co/sunblaze-ucb/Qwen2.5-1.5B-Intuitor-MATH-1EPOCH) |
- | `` | 3B | Medium | [View Model](https://huggingface.co/sunblaze-ucb/Qwen2.5-3B-Intuitor-MATH-1EPOCH) |
- | `` | 3B | Ambig-QA | [View Model](https://huggingface.co/sunblaze-ucb/OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH) |
- | `` | 3B | WildGuard | [View Model](https://huggingface.co/sunblaze-ucb/Qwen3-14B-Intuitor-MATH-1EPOCH) |
- | `` | 3B | Mixed | [View Model](https://huggingface.co/sunblaze-ucb/Qwen2.5-1.5B-GRPO-MATH-1EPOCH) |
-
-
-
-
+ | `GTAlign/Qwen2.5-3B-Math-140step` | 3B | Math | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Math-140step) |
+ | `GTAlign/Qwen2.5-3B-Medium-110step` | 3B | Medium | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Medium-110step) |
+ | `GTAlign/Qwen2.5-3B-AbgQA-140step` | 3B | Ambig-QA | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-AbgQA-140step) |
+ | `GTAlign/Qwen2.5-3B-WildGuard-140step` | 3B | WildGuard | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-WildGuard-140step) |
+ | `GTAlign/Qwen2.5-3B-Full-160step` | 3B | Full | [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Full-160step) |
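
The README in this commit lists the checkpoints but does not show how to load them. The following is a minimal sketch, assuming the repositories are standard Qwen2.5 causal-LM checkpoints that the `transformers` Auto classes can load (the README does not confirm this). The repo IDs are copied from the table; the `CHECKPOINTS` dictionary and the helper names `model_url` and `load_checkpoint` are hypothetical and exist only for this illustration.

```python
# Repo IDs copied from the model table in the new README.
# The dictionary keys reuse the "Dataset" column; the mapping itself is an
# illustrative convenience, not something the GTAlign authors provide.
CHECKPOINTS = {
    "Math": "GTAlign/Qwen2.5-3B-Math-140step",
    "Medium": "GTAlign/Qwen2.5-3B-Medium-110step",
    "Ambig-QA": "GTAlign/Qwen2.5-3B-AbgQA-140step",
    "WildGuard": "GTAlign/Qwen2.5-3B-WildGuard-140step",
    "Full": "GTAlign/Qwen2.5-3B-Full-160step",
}


def model_url(dataset: str) -> str:
    """Return the Hugging Face page for the checkpoint trained on `dataset`."""
    return f"https://huggingface.co/{CHECKPOINTS[dataset]}"


def load_checkpoint(dataset: str):
    """Download and load the tokenizer and model for one checkpoint.

    Assumption: each repo loads with AutoTokenizer / AutoModelForCausalLM.
    The import is deferred so the lookup helpers above work without
    `transformers` installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = CHECKPOINTS[dataset]
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


# Example (requires network access and several GB of disk space):
# tokenizer, model = load_checkpoint("Math")
```

Calling `load_checkpoint("Math")` would fetch the weights on first use and cache them locally, which is the default behavior of `from_pretrained`.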