---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
language:
- en
tags:
- game-theory
- qwen2.5
- qlora
- fine-tuning
- nash-equilibrium
- economics
- math
- reasoning
- lora
- sft
- transformers
- trl
- 4-bit
- bitsandbytes
datasets:
- 2reb/GameTheory-Bench
model-index:
- name: GameTheory-Solver
results:
- task:
type: text-generation
name: Game Theory Problem Solving
dataset:
name: GameTheory-Bench
type: 2reb/GameTheory-Bench
metrics:
- name: Overall Accuracy
type: accuracy
value: 94.0
verified: false
- name: Hard Problem Accuracy
type: accuracy
value: 94.4
verified: false
---
# 🎯 GameTheory-Solver
**A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.**

[![Dataset](https://img.shields.io/badge/🤗_Dataset-GameTheory--Bench-yellow)](https://huggingface.co/datasets/2reb/GameTheory-Bench)
[![Demo](https://img.shields.io/badge/🎮_Demo-Try_it_Live-blue)](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
---
## 📋 Model Description
GameTheory-Solver is a **LoRA adapter** trained on the [GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) dataset, the first comprehensive, computationally verified game theory dataset for LLM training. The adapter turns Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.

**Key result:** The fine-tuned model reaches **94% overall accuracy** (up from 82% for the base model) and **94.4% on hard problems** (up from 66.7%), an improvement of **+12 pp overall** and **+27.7 pp on hard problems**.
### 🧠 Capabilities
| Capability | Details |
|---|---|
| **Nash Equilibrium Computation** | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| **Dominant Strategy Analysis** | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| **Zero-Sum Game Solving** | Minimax theorem, saddle point detection, mixed strategies |
| **Sequential Game Analysis** | Backward induction, subgame perfect equilibrium (up to 3 stages) |
| **Bayesian Game Equilibria** | Incomplete information, BNE, signaling games |
| **Cooperative Game Theory** | Shapley value computation, core analysis |
| **Auction Theory** | First-price, second-price (Vickrey), all-pay, revenue equivalence |
| **Mechanism Design** | VCG mechanisms, incentive compatibility analysis |
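
As a point of reference for the IESDS capability above, the procedure itself fits in a few lines of plain Python. This is an independent brute-force illustration (pure-strategy strict dominance only), not code from the model or its training pipeline:

```python
def iesds(p1, p2):
    """Iterated elimination of strictly dominated strategies.

    p1[i][j], p2[i][j]: row/column player payoffs for row strategy i
    vs. column strategy j. Only pure-strategy dominance is checked.
    Returns the surviving (row_indices, col_indices).
    """
    rows = list(range(len(p1)))
    cols = list(range(len(p1[0])))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            # Row r is strictly dominated if some other row s beats it everywhere.
            if any(all(p1[s][c] > p1[r][c] for c in cols) for s in rows if s != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:
            # Symmetric check for the column player.
            if any(all(p2[r][t] > p2[r][c] for r in rows) for t in cols if t != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Prisoner's dilemma: Defect (index 1) strictly dominates for both players.
p1 = [[3, 0], [5, 1]]
p2 = [[3, 5], [0, 1]]
print(iesds(p1, p2))  # ([1], [1])
```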
---
## 📊 Benchmark Results
Evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels.
### Overall Performance: Base vs. Solver
| Metric | Base (Qwen2.5-7B) | **Solver (Fine-tuned)** | **Δ Improvement** |
|---|:---:|:---:|:---:|
| **Overall Accuracy** | 82% | **94%** | **+12 pp** ✅ |
| **Hard Problems** | 66.7% | **94.4%** | **+27.7 pp** 🚀 |
### Per-Category Accuracy
| Category | Base | Solver | Δ | Trend |
|---|:---:|:---:|:---:|:---:|
| Normal Form 2×2 | 100% | 80% | −20 pp | 📉 |
| Normal Form 3×3 | 80% | 60% | −20 pp | 📉 |
| Normal Form 3×4 | 100% | 100% | 0 | ➡️ |
| Normal Form 4×4 | 100% | 100% | 0 | ➡️ |
| Zero-Sum | 100% | 100% | 0 | ➡️ |
| Sequential Game | 100% | 100% | 0 | ➡️ |
| Auction Theory | 80% | **100%** | +20 pp | 📈 |
| Bayesian Game | 0% | **100%** | **+100 pp** | 🚀 |
| Cooperative Game | 100% | 100% | 0 | ➡️ |
| Mechanism Design | 60% | **100%** | +40 pp | 📈 |

> **Highlight:** The model achieves its most dramatic gains on the categories the base model struggled with most: **Bayesian Games** (0% → 100%) and **Mechanism Design** (60% → 100%). It maintains perfect scores on zero-sum, sequential, and cooperative games.
---
## 🚀 Usage
### Installation
```bash
pip install transformers peft bitsandbytes accelerate torch
```
### Loading the Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantization config (matches training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
```
### Solving a Game Theory Problem
```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a game theory expert. Solve the given problem "
            "step-by-step, showing all mathematical reasoning. "
            "Provide the final answer clearly."
        ),
    },
    {
        "role": "user",
        "content": (
            "Consider the following game:\n\n"
            "Player 1 \\ Player 2 | Left | Right\n"
            "--- | --- | ---\n"
            "Up | (3,1) | (0,0)\n"
            "Down | (1,1) | (2,3)\n\n"
            "Find all Nash Equilibria."
        ),
    },
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```
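
The model's answer for small games is easy to cross-check. The following standalone sketch (not part of the model card's pipeline) brute-forces the pure-strategy Nash equilibria of the example game above; the mixed equilibrium would need a separate indifference calculation:

```python
from itertools import product

def pure_nash_equilibria(payoffs):
    """Return (row, col) index pairs of all pure-strategy Nash equilibria.

    payoffs[i][j] = (u1, u2): payoffs when row player plays i, column player j.
    """
    rows, cols = len(payoffs), len(payoffs[0])
    found = []
    for i, j in product(range(rows), range(cols)):
        u1, u2 = payoffs[i][j]
        # (i, j) is an equilibrium iff neither player has a profitable deviation.
        if all(payoffs[k][j][0] <= u1 for k in range(rows)) and \
           all(payoffs[i][k][1] <= u2 for k in range(cols)):
            found.append((i, j))
    return found

# Game from the prompt above: rows = (Up, Down), columns = (Left, Right).
game = [[(3, 1), (0, 0)],
        [(1, 1), (2, 3)]]
print(pure_nash_equilibria(game))  # [(0, 0), (1, 1)], i.e. (Up, Left) and (Down, Right)
```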
---
## 🏋️ Training Details
### Base Model
| Parameter | Value |
|---|---|
| **Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| **Total Parameters** | 7.6B |
| **Trainable Parameters** | 161M (2.1% of total) |
### Dataset
| Parameter | Value |
|---|---|
| **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| **Train Split** | 2,767 examples |
| **Eval Split** | 146 examples (5% held out) |
### QLoRA Configuration
| Parameter | Value |
|---|---|
| LoRA rank (`r`) | 64 |
| LoRA alpha (`α`) | 128 |
| LoRA dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
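
The table above maps onto a `peft` `LoraConfig` roughly like the following. The actual training script is not published, so treat this as an approximate reconstruction rather than the exact configuration used:

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the QLoRA table above (reconstruction,
# not the author's original training script).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```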
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2,048 |
| Packing | Enabled |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |
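
These hyperparameters roughly correspond to a TRL `SFTConfig` such as the one below. This is a reconstruction, not the published training script, and some argument names (e.g. `max_seq_length`) vary across TRL versions:

```python
from trl import SFTConfig

# Approximate reconstruction of the hyperparameter table above; values are
# as reported, argument names are TRL-version-dependent.
training_args = SFTConfig(
    output_dir="gametheory-solver",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 2 x 8 = 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_seq_length=2048,
    packing=True,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    bf16=True,
)
```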
### Training Results
| Metric | Value |
|---|---|
| **Train loss** | 0.1613 |
| **Eval loss** | 0.0873 |
| **Token accuracy** | 96.1% |
| Total steps | 135 |
| Training runtime | ~2 hours |
| **Hardware** | 2× NVIDIA RTX 3090 (24 GB each) |
---
## ⚠️ Limitations
- **Small-matrix regression:** Accuracy on 2×2 and 3×3 normal-form games decreased after fine-tuning (100% → 80% and 80% → 60%, respectively). The base model already handled these well; the adapter slightly regresses on the simplest subcategories while dramatically improving the harder ones.
- **Mixed-strategy precision:** Complex mixed-strategy Nash Equilibria involving irrational probabilities may suffer from floating-point precision issues.
- **Context length:** The maximum sequence length of 2,048 tokens may truncate very large game matrices or extremely detailed solutions.
- **Synthetic training data:** The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.
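
When exact probabilities matter, a 2×2 game with rational-valued payoffs can be checked symbolically instead of in floating point. A minimal illustrative sketch (the function name is ours; it assumes an interior mixed equilibrium exists and yields rational probabilities):

```python
from fractions import Fraction

def mixed_ne_2x2(p1, p2):
    """Exact interior mixed-strategy NE of a 2x2 bimatrix game.

    p1, p2: 2x2 payoff matrices for the row and column player.
    Returns (p, q): probability of row 0 and of column 0, as Fractions.
    Assumes the indifference denominators are nonzero.
    """
    # Column player's mix q makes the row player indifferent between rows:
    # q*a + (1-q)*b = q*c + (1-q)*d  =>  q = (d-b) / ((a-c) + (d-b))
    q = Fraction(p1[1][1] - p1[0][1],
                 (p1[0][0] - p1[1][0]) + (p1[1][1] - p1[0][1]))
    # Row player's mix p makes the column player indifferent between columns.
    p = Fraction(p2[1][1] - p2[1][0],
                 (p2[0][0] - p2[0][1]) + (p2[1][1] - p2[1][0]))
    return p, q

# Game from the Usage section: rows = (Up, Down), columns = (Left, Right).
p1 = [[3, 0], [1, 2]]
p2 = [[1, 0], [1, 3]]
print(mixed_ne_2x2(p1, p2))  # (Fraction(2, 3), Fraction(1, 2))
```

Here the mixed equilibrium has Player 1 playing Up with probability 2/3 and Player 2 playing Left with probability 1/2, exactly.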
---
## 🔗 Links
| Resource | Link |
|---|---|
| 📊 **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| 🎮 **Live Demo** | [GameTheory-Solver-Demo](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) |
| 🏠 **Base Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
---
## 📄 License
This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
## 📝 Citation
```bibtex
@misc{gametheory-solver-2025,
  title     = {GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
  author    = {2reb},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/2reb/GameTheory-Solver}
}
```