---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
language:
- en
tags:
- game-theory
- qwen2.5
- qlora
- fine-tuning
- nash-equilibrium
- economics
- math
- reasoning
- lora
- sft
- transformers
- trl
- 4-bit
- bitsandbytes
datasets:
- 2reb/GameTheory-Bench
model-index:
- name: GameTheory-Solver
  results:
  - task:
      type: text-generation
      name: Game Theory Problem Solving
    dataset:
      name: GameTheory-Bench
      type: 2reb/GameTheory-Bench
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: 94.0
      verified: false
    - name: Hard Problem Accuracy
      type: accuracy
      value: 94.4
      verified: false
---

# GameTheory-Solver

**A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.**

[Dataset](https://huggingface.co/datasets/2reb/GameTheory-Bench) · [Live Demo](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) · [License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

---

## Model Description

GameTheory-Solver is a **LoRA adapter** trained on the [GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) dataset, the first comprehensive, computationally verified game theory dataset for LLM training. The adapter turns Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.

**Key result:** The fine-tuned model reaches **94% overall accuracy** (up from 82% for the base model) and **94.4% on hard problems** (up from 66.7%), a **+12 pp overall** and **+27.7 pp hard-problem improvement**.

### Capabilities

| Capability | Details |
|---|---|
| **Nash Equilibrium Computation** | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| **Dominant Strategy Analysis** | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| **Zero-Sum Game Solving** | Minimax theorem, saddle point detection, mixed strategies |
| **Sequential Game Analysis** | Backward induction, subgame perfect equilibrium (up to 3 stages) |
| **Bayesian Game Equilibria** | Incomplete information, BNE, signaling games |
| **Cooperative Game Theory** | Shapley value computation, core analysis |
| **Auction Theory** | First-price, second-price (Vickrey), all-pay, revenue equivalence |
| **Mechanism Design** | VCG mechanisms, incentive compatibility analysis |
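One of these capabilities, IESDS, is simple enough to verify mechanically. The sketch below is illustrative only (it is not the model's internal procedure) and handles pure-strategy strict domination on a bimatrix game:

```python
def iesds(A, B):
    """Iterated Elimination of Strictly Dominated Strategies (pure-strategy
    domination only). A[i][j] / B[i][j] are the row / column player's payoffs.
    Returns the indices of the surviving rows and columns."""
    rows, cols = list(range(len(A))), list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:  # drop rows strictly dominated by a surviving row
            if any(all(A[r2][c] > A[r][c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:  # drop cols strictly dominated by a surviving col
            if any(all(B[r][c2] > B[r][c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Prisoner's Dilemma (0 = Cooperate, 1 = Defect): only (Defect, Defect) survives.
A = [[-1, -3], [0, -2]]
B = [[-1, 0], [-3, -2]]
print(iesds(A, B))  # -> ([1], [1])
```

For games with no strictly dominated strategies (e.g. Matching Pennies), all strategies survive and the function simply returns the full index sets.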

---

## Benchmark Results

Evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels.

### Overall Performance: Base vs. Solver

| Metric | Base (Qwen2.5-7B) | **Solver (Fine-tuned)** | **Δ Improvement** |
|---|:---:|:---:|:---:|
| **Overall Accuracy** | 82% | **94%** | **+12 pp** |
| **Hard Problems** | 66.7% | **94.4%** | **+27.7 pp** |

### Per-Category Accuracy

| Category | Base | Solver | Δ | Trend |
|---|:---:|:---:|:---:|:---:|
| Normal Form 2×2 | 100% | 80% | −20 pp | 📉 |
| Normal Form 3×3 | 80% | 60% | −20 pp | 📉 |
| Normal Form 3×4 | 100% | 100% | 0 | ➡️ |
| Normal Form 4×4 | 100% | 100% | 0 | ➡️ |
| Zero-Sum | 100% | 100% | 0 | ➡️ |
| Sequential Game | 100% | 100% | 0 | ➡️ |
| Auction Theory | 80% | **100%** | +20 pp | 📈 |
| Bayesian Game | 0% | **100%** | **+100 pp** | 📈 |
| Cooperative Game | 100% | 100% | 0 | ➡️ |
| Mechanism Design | 60% | **100%** | +40 pp | 📈 |

> **Highlight:** The model achieves its most dramatic gains on previously weak categories, **Bayesian Games** (0% → 100%) and **Mechanism Design** (60% → 100%), while maintaining perfect scores on zero-sum, sequential, and cooperative games.

---

## Usage

### Installation

```bash
pip install transformers peft bitsandbytes accelerate torch
```

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantization config (matches training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
```

### Solving a Game Theory Problem

```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a game theory expert. Solve the given problem "
            "step-by-step, showing all mathematical reasoning. "
            "Provide the final answer clearly."
        ),
    },
    {
        "role": "user",
        "content": (
            "Consider the following game:\n\n"
            "Player 1 \\ Player 2 | Left | Right\n"
            "--- | --- | ---\n"
            "Up | (3,1) | (0,0)\n"
            "Down | (1,1) | (2,3)\n\n"
            "Find all Nash Equilibria."
        ),
    },
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```
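For the 2×2 example above, the model's answer can be sanity-checked with a brute-force best-response enumeration. This is a hedged sketch, independent of the model and the released code:

```python
def pure_nash(A, B):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game:
    (i, j) is an equilibrium iff i is a best response to column j
    and j is a best response to row i."""
    return [
        (i, j)
        for i in range(len(A))
        for j in range(len(A[0]))
        if all(A[i][j] >= A[k][j] for k in range(len(A)))
        and all(B[i][j] >= B[i][l] for l in range(len(A[0])))
    ]

# Payoffs from the prompt: rows = (Up, Down), columns = (Left, Right)
A = [[3, 0], [1, 2]]  # Player 1
B = [[1, 0], [1, 3]]  # Player 2
print(pure_nash(A, B))  # -> [(0, 0), (1, 1)], i.e. (Up, Left) and (Down, Right)
```

A correct model response should report both pure equilibria (plus the mixed equilibrium, which this pure-strategy check does not cover).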

---

## Training Details

### Base Model

| Parameter | Value |
|---|---|
| **Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| **Total Parameters** | 7.6B |
| **Trainable Parameters** | 161M (2.1% of total) |

### Dataset

| Parameter | Value |
|---|---|
| **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| **Train Split** | 2,767 examples |
| **Eval Split** | 146 examples (5% held out) |

### QLoRA Configuration

| Parameter | Value |
|---|---|
| LoRA rank (`r`) | 64 |
| LoRA alpha (`α`) | 128 |
| LoRA dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
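For reproduction with `peft`, the table maps to roughly the following `LoraConfig`. This is a sketch: the field names come from peft's public API, but the exact training script is not published, and `bias`/`task_type` are assumptions not stated in the card.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # scaling alpha
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumption: not stated in the card
    task_type="CAUSAL_LM",  # assumption: standard for SFT of a causal LM
)
```

Pair this with the `BitsAndBytesConfig` shown in the usage section above to reproduce the 4-bit NF4 setup.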

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2,048 |
| Packing | Enabled |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |
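For intuition, the schedule implied by the table (linear warmup over 5% of the 135 steps, then cosine decay to zero) can be sketched as below. The warmup-step rounding is an assumption, and `transformers`' built-in cosine scheduler may differ in minor details:

```python
import math

def lr_at(step, total_steps=135, warmup_ratio=0.05, base_lr=2e-4):
    """Learning rate at a given optimizer step: linear warmup,
    then cosine decay to zero."""
    warmup = max(1, int(warmup_ratio * total_steps))  # 6 steps here
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(6), lr_at(135))  # 0 at start, peak 2e-4, ~0 at the end
```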

### Training Results

| Metric | Value |
|---|---|
| **Train loss** | 0.1613 |
| **Eval loss** | 0.0873 |
| **Token accuracy** | 96.1% |
| Total steps | 135 |
| Training runtime | ~2 hours |
| **Hardware** | 2× NVIDIA RTX 3090 (24 GB each) |

---

## Limitations

- **Small-matrix regression:** Accuracy on 2×2 and 3×3 normal-form games decreased after fine-tuning (100% → 80% and 80% → 60%, respectively). The base model already handled these well; the adapter slightly regresses on simpler subcategories while dramatically improving harder ones.
- **Mixed-strategy precision:** Complex mixed-strategy Nash equilibria involving irrational numbers may suffer from floating-point precision issues.
- **Context length:** The 2,048-token maximum sequence length may truncate very large game matrices or extremely detailed solutions.
- **Synthetic training data:** The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.
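One mitigation for the precision caveat is to verify reported mixed strategies with exact rational arithmetic. A minimal sketch for a fully mixed 2×2 equilibrium via the standard indifference conditions, using the payoffs from the usage example (this assumes an interior equilibrium exists, so the denominators are nonzero):

```python
from fractions import Fraction

def mixed_2x2(A, B):
    """Fully mixed NE of a 2x2 bimatrix game from the indifference
    conditions, computed exactly with rationals. Returns (q, p):
    q = P(row player plays row 0), p = P(column player plays column 0)."""
    # Row player mixes so the column player is indifferent between columns.
    q = Fraction(B[1][1] - B[1][0], B[0][0] - B[1][0] - B[0][1] + B[1][1])
    # Column player mixes so the row player is indifferent between rows.
    p = Fraction(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return q, p

A = [[3, 0], [1, 2]]  # Player 1 payoffs from the usage example
B = [[1, 0], [1, 3]]  # Player 2 payoffs
q, p = mixed_2x2(A, B)
print(q, p)  # 2/3 1/2 -> P1 plays Up w.p. 2/3, P2 plays Left w.p. 1/2
```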

---

## Links

| Resource | Link |
|---|---|
| **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| **Live Demo** | [GameTheory-Solver-Demo](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) |
| **Base Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |

---

## License

This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Citation

```bibtex
@misc{gametheory-solver-2025,
  title     = {GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
  author    = {2reb},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/2reb/GameTheory-Solver}
}
```