---
title: README
emoji: 🐠
colorFrom: yellow
colorTo: gray
sdk: gradio
pinned: false
---

GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare


GTAlign applies game-theoretic principles to fine-tune reasoning LLMs, encouraging them to make decisions that are not only accurate but also rational, cooperative, and transparent in dialogue settings.

Models

We have released five model checkpoints, listed below. More thoroughly trained models are in preparation.

| Model Name | Size | Dataset | Hugging Face Link |
| --- | --- | --- | --- |
| GTAlign/Qwen2.5-3B-Math-140step | 3B | Math | Model |
| GTAlign/Qwen2.5-3B-Medium-110step | 3B | Medium | Model |
| GTAlign/Qwen2.5-3B-AbgQA-140step | 3B | Ambig-QA | Model |
| GTAlign/Qwen2.5-3B-WildGuard-140step | 3B | WildGuard | Model |
| GTAlign/Qwen2.5-3B-Full-160step | 3B | Full | Model |
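
The checkpoints listed above can probably be loaded like any other causal language model on the Hub. The following is a minimal sketch, not an official usage guide. It assumes that the model names in the table are also the Hub repository IDs and that the checkpoints work with the generic `AutoTokenizer` and `AutoModelForCausalLM` classes from `transformers`. The names `CHECKPOINTS`, `load_checkpoint`, and `generate_reply` are hypothetical helpers introduced here for illustration.

```python
# Model names copied from the table above, keyed by training dataset.
# Assumption: each name is also the Hub repository ID.
CHECKPOINTS = {
    "Math": "GTAlign/Qwen2.5-3B-Math-140step",
    "Medium": "GTAlign/Qwen2.5-3B-Medium-110step",
    "Ambig-QA": "GTAlign/Qwen2.5-3B-AbgQA-140step",
    "WildGuard": "GTAlign/Qwen2.5-3B-WildGuard-140step",
    "Full": "GTAlign/Qwen2.5-3B-Full-160step",
}


def load_checkpoint(dataset: str):
    """Return (tokenizer, model) for the checkpoint trained on `dataset`.

    Assumption: the checkpoints load through the generic Auto* classes.
    """
    # Imported lazily so the lookup table is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = CHECKPOINTS[dataset]
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


def generate_reply(dataset: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a plain-text continuation of `prompt` with the chosen checkpoint."""
    tokenizer, model = load_checkpoint(dataset)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_reply("Math", "What is 12 * 7?")` would download the Math checkpoint on first use and return the decoded output. The prompt here is passed as raw text. If the checkpoints expect a chat template or a specific reasoning format, the prompt construction would need to change accordingly.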