---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
language:
- en
tags:
- game-theory
- qwen2.5
- qlora
- fine-tuning
- nash-equilibrium
- economics
- math
- reasoning
- lora
- sft
- transformers
- trl
- 4-bit
- bitsandbytes
datasets:
- 2reb/GameTheory-Bench
model-index:
- name: GameTheory-Solver
  results:
  - task:
      type: text-generation
      name: Game Theory Problem Solving
    dataset:
      name: GameTheory-Bench
      type: 2reb/GameTheory-Bench
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: 94.0
      verified: false
    - name: Hard Problem Accuracy
      type: accuracy
      value: 94.4
      verified: false
---

# 🎯 GameTheory-Solver

**A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.**

[![Dataset](https://img.shields.io/badge/🤗_Dataset-GameTheory--Bench-yellow)](https://huggingface.co/datasets/2reb/GameTheory-Bench)
[![Demo](https://img.shields.io/badge/🎮_Demo-Try_it_Live-blue)](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

---

## 📋 Model Description

GameTheory-Solver is a **LoRA adapter** trained on the [GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) dataset — the first comprehensive, computationally verified game theory dataset for LLM training. The adapter transforms Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.

**Key result:** The fine-tuned model achieves **94% overall accuracy** (up from 82% base) and **94.4% on hard problems** (up from 66.7% base), a **+12 pp overall** and **+27.7 pp hard-problem** improvement.

### 🧠 Capabilities

| Capability | Details |
|---|---|
| **Nash Equilibrium Computation** | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| **Dominant Strategy Analysis** | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| **Zero-Sum Game Solving** | Minimax theorem, saddle point detection, mixed strategies |
| **Sequential Game Analysis** | Backward induction, subgame perfect equilibrium (up to 3 stages) |
| **Bayesian Game Equilibria** | Incomplete information, BNE, signaling games |
| **Cooperative Game Theory** | Shapley value computation, core analysis |
| **Auction Theory** | First-price, second-price (Vickrey), all-pay, revenue equivalence |
| **Mechanism Design** | VCG mechanisms, incentive compatibility analysis |
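
As a concrete illustration of the dominant-strategy capability, IESDS fits in a few lines of plain Python. The sketch below is an independent reference implementation a reader can use to cross-check the model's eliminations; it is not part of the model or its training code:

```python
def iesds(payoffs_1, payoffs_2):
    """Iterated Elimination of Strictly Dominated Strategies.

    payoffs_1[i][j] / payoffs_2[i][j] are the row/column player's payoffs
    when the row player picks strategy i and the column player picks j.
    Returns the surviving (row, column) strategy index lists.
    """
    rows = list(range(len(payoffs_1)))
    cols = list(range(len(payoffs_1[0])))
    changed = True
    while changed:
        changed = False
        # Drop any row strictly dominated by another surviving row.
        for r in rows[:]:
            if any(all(payoffs_1[o][c] > payoffs_1[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
        # Drop any column strictly dominated by another surviving column.
        for c in cols[:]:
            if any(all(payoffs_2[r][o] > payoffs_2[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Prisoner's Dilemma: Defect (index 1) strictly dominates Cooperate.
p1 = [[3, 0], [5, 1]]        # row player's payoffs
p2 = [[3, 5], [0, 1]]        # column player's payoffs
print(iesds(p1, p2))         # → ([1], [1]): only (Defect, Defect) survives
```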

---

## 📊 Benchmark Results

Evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels.

### Overall Performance: Base vs. Solver

| Metric | Base (Qwen2.5-7B) | **Solver (Fine-tuned)** | **Δ Improvement** |
|---|:---:|:---:|:---:|
| **Overall Accuracy** | 82% | **94%** | **+12 pp** ✅ |
| **Hard Problems** | 66.7% | **94.4%** | **+27.7 pp** 🚀 |

### Per-Category Accuracy

| Category | Base | Solver | Δ | Trend |
|---|:---:|:---:|:---:|:---:|
| Normal Form 2×2 | 100% | 80% | −20 pp | 📉 |
| Normal Form 3×3 | 80% | 60% | −20 pp | 📉 |
| Normal Form 3×4 | 100% | 100% | — | ➡️ |
| Normal Form 4×4 | 100% | 100% | — | ➡️ |
| Zero-Sum | 100% | 100% | — | ➡️ |
| Sequential Game | 100% | 100% | — | ➡️ |
| Auction Theory | 80% | **100%** | +20 pp | 📈 |
| Bayesian Game | 0% | **100%** | **+100 pp** | 🚀 |
| Cooperative Game | 100% | 100% | — | ➡️ |
| Mechanism Design | 60% | **100%** | +40 pp | 📈 |

> **Highlight:** The gains are largest in the base model's weakest categories — **Bayesian Games** (0% → 100%, previously unsolved) and **Mechanism Design** (60% → 100%) — while perfect scores are maintained on zero-sum, sequential, and cooperative games.

---

## 🚀 Usage

### Installation

```bash
pip install transformers peft bitsandbytes accelerate torch
```

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantization config (matches training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
```

### Solving a Game Theory Problem

```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a game theory expert. Solve the given problem "
            "step-by-step, showing all mathematical reasoning. "
            "Provide the final answer clearly."
        ),
    },
    {
        "role": "user",
        "content": (
            "Consider the following game:\n\n"
            "Player 1 \\ Player 2 | Left | Right\n"
            "--- | --- | ---\n"
            "Up | (3,1) | (0,0)\n"
            "Down | (1,1) | (2,3)\n\n"
            "Find all Nash Equilibria."
        ),
    },
]

# return_dict=True also returns the attention mask, which generate() expects
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```
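
The expected pure-strategy answer for the example prompt can be cross-checked with a brute-force best-response search. This is a plain-Python sketch, independent of the model; note the game also admits a mixed equilibrium, which this check does not cover:

```python
def pure_nash_equilibria(payoffs):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game.

    payoffs[i][j] = (u1, u2): the payoffs when the row player plays i
    and the column player plays j.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            u1, u2 = payoffs[i][j]
            # A profile is an equilibrium iff neither player has a
            # profitable unilateral deviation.
            row_best = all(payoffs[k][j][0] <= u1 for k in range(n_rows))
            col_best = all(payoffs[i][k][1] <= u2 for k in range(n_cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# The game from the prompt: rows = (Up, Down), columns = (Left, Right).
game = [[(3, 1), (0, 0)],
        [(1, 1), (2, 3)]]
print(pure_nash_equilibria(game))  # → [(0, 0), (1, 1)], i.e. (Up, Left) and (Down, Right)
```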

---

## ๐Ÿ‹๏ธ Training Details

### Base Model

| Parameter | Value |
|---|---|
| **Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| **Total Parameters** | 7.6B |
| **Trainable Parameters** | 161M (2.1% of total) |

### Dataset

| Parameter | Value |
|---|---|
| **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| **Train Split** | 2,767 examples |
| **Eval Split** | 146 examples (5% held out) |

### QLoRA Configuration

| Parameter | Value |
|---|---|
| LoRA rank (`r`) | 64 |
| LoRA alpha (`α`) | 128 |
| LoRA dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
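
The table above corresponds to a `peft.LoraConfig` roughly like the following (reconstructed from the reported values; the original training script may differ in minor details):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```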

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2,048 |
| Packing | Enabled |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |

### Training Results

| Metric | Value |
|---|---|
| **Train loss** | 0.1613 |
| **Eval loss** | 0.0873 |
| **Token accuracy** | 96.1% |
| Total steps | 135 |
| Training runtime | ~2 hours |
| **Hardware** | 2× NVIDIA RTX 3090 (24 GB each) |

---

## โš ๏ธ Limitations

- **Small-matrix regression:** Accuracy on 2×2 and 3×3 normal-form games decreased after fine-tuning (100% → 80% and 80% → 60% respectively). The base model already handled these well; the adapter slightly regresses on simpler subcategories while dramatically improving harder ones.
- **Mixed-strategy precision:** Complex mixed-strategy Nash Equilibria involving irrational numbers may have floating-point precision issues.
- **Context length:** Max sequence length of 2,048 tokens may truncate very large game matrices or extremely detailed solutions.
- **Synthetic training data:** The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.
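
One way to sidestep the floating-point caveat for small games is exact rational arithmetic. Below is a sketch using Python's `fractions` to solve the indifference conditions of the 2×2 game from the usage example; it assumes an interior mixed equilibrium exists (nonzero denominators):

```python
from fractions import Fraction

def mixed_nash_2x2(payoffs):
    """Mixed-strategy NE of a 2x2 bimatrix game via indifference conditions.

    payoffs[i][j] = (u1, u2). Returns (p, q): the row player's probability
    of playing row 0 and the column player's probability of playing
    column 0, as exact fractions.
    """
    (a1, a2), (b1, b2) = payoffs[0]
    (c1, c2), (d1, d2) = payoffs[1]
    # Row player mixes so the column player is indifferent:
    #   p*a2 + (1-p)*c2 = p*b2 + (1-p)*d2
    p = Fraction(d2 - c2, (a2 - c2) - (b2 - d2))
    # Column player mixes so the row player is indifferent:
    #   q*a1 + (1-q)*b1 = q*c1 + (1-q)*d1
    q = Fraction(d1 - b1, (a1 - b1) - (c1 - d1))
    return p, q

# The game from the usage example: rows = (Up, Down), columns = (Left, Right).
game = [[(3, 1), (0, 0)],
        [(1, 1), (2, 3)]]
print(mixed_nash_2x2(game))  # → (Fraction(2, 3), Fraction(1, 2))
```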

---

## 🔗 Links

| Resource | Link |
|---|---|
| 📊 **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| 🎮 **Live Demo** | [GameTheory-Solver-Demo](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) |
| 🏠 **Base Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |

---

## 📄 License

This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## ๐Ÿ“ Citation

```bibtex
@misc{gametheory-solver-2025,
  title   = {GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
  author  = {2reb},
  year    = {2025},
  publisher = {Hugging Face},
  url     = {https://huggingface.co/2reb/GameTheory-Solver}
}
```