Spaces:

GTAlign
/

README

No application file

README / README.md

Update README.md

0e33f3c verified 4 months ago

1.47 kB

	---
	title: README
	emoji: 🐠
	colorFrom: yellow
	colorTo: gray
	sdk: gradio
	pinned: false
	---

	# GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare

	[![ArXiv](https://img.shields.io/badge/arXiv-2510.08872-b31b1b?style=flat&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2510.08872)
	[![Hugging Face](https://img.shields.io/badge/HuggingFace-GTAlign-orange?logo=huggingface&logoColor=white)](https://huggingface.co/GTAlign)

	GTAlign applies game-theoretic principles to fine-tune reasoning LLMs, encouraging them to make decisions that are not only accurate but also rational, cooperative, and transparent in dialogue settings.


	## Models

	We have released five model checkpoints, and we are preparing more thoroughly trained models.

	\| Model Name \| Size \| Dataset \| Hugging Face Link \|
	\|------------\|------\|--------\|--------------------\|
	\| `GTAlign/Qwen2.5-3B-Math-140step` \| 3B \| Math \| [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Math-140step) \|
	\| `GTAlign/Qwen2.5-3B-Medium-110step` \| 3B \| Medium \| [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Medium-110step) \|
	\| `GTAlign/Qwen2.5-3B-AbgQA-140step` \| 3B \| Ambig-QA \| [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-AbgQA-140step) \|
	\| `GTAlign/Qwen2.5-3B-WildGuard-140step` \| 3B \| WildGuard \| [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-WildGuard-140step) \|
	\| `GTAlign/Qwen2.5-3B-Full-160step` \| 3B \| Full \| [Model](https://huggingface.co/GTAlign/Qwen2.5-3B-Full-160step) \|