---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
language:
- en
tags:
- game-theory
- qwen2.5
- qlora
- fine-tuning
- nash-equilibrium
- economics
- math
- reasoning
- lora
- sft
- transformers
- trl
- 4-bit
- bitsandbytes
datasets:
- 2reb/GameTheory-Bench
model-index:
- name: GameTheory-Solver
  results:
  - task:
      type: text-generation
      name: Game Theory Problem Solving
    dataset:
      name: GameTheory-Bench
      type: 2reb/GameTheory-Bench
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: 94.0
      verified: false
    - name: Hard Problem Accuracy
      type: accuracy
      value: 94.4
      verified: false
---

# 🎯 GameTheory-Solver

**A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.**

[![Dataset](https://img.shields.io/badge/🤗_Dataset-GameTheory--Bench-yellow)](https://huggingface.co/datasets/2reb/GameTheory-Bench)
[![Demo](https://img.shields.io/badge/🎮_Demo-Try_it_Live-blue)](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

---

## 📋 Model Description

GameTheory-Solver is a **LoRA adapter** trained on the [GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) dataset — the first comprehensive, computationally verified game theory dataset for LLM training. The adapter transforms Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.

**Key result:** The fine-tuned model achieves **94% overall accuracy** (up from 82% base) and **94.4% on hard problems** (up from 66.7% base), a **+12 pp overall** and **+27.7 pp hard-problem** improvement.

### 🧠 Capabilities

| Capability | Details |
|---|---|
| **Nash Equilibrium Computation** | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| **Dominant Strategy Analysis** | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| **Zero-Sum Game Solving** | Minimax theorem, saddle point detection, mixed strategies |
| **Sequential Game Analysis** | Backward induction, subgame perfect equilibrium (up to 3 stages) |
| **Bayesian Game Equilibria** | Incomplete information, BNE, signaling games |
| **Cooperative Game Theory** | Shapley value computation, core analysis |
| **Auction Theory** | First-price, second-price (Vickrey), all-pay, revenue equivalence |
| **Mechanism Design** | VCG mechanisms, incentive compatibility analysis |
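
As a concrete illustration of the dominant-strategy capability, IESDS fits in a few lines of plain Python. The sketch below is an independent reference implementation a reader can use to cross-check the model's eliminations; it is not part of the model or its training code:

```python
def iesds(payoffs_1, payoffs_2):
    """Iterated Elimination of Strictly Dominated Strategies.

    payoffs_1[i][j] / payoffs_2[i][j] are the row/column player's payoffs
    when the row player picks strategy i and the column player picks j.
    Returns the surviving (row, column) strategy index lists.
    """
    rows = list(range(len(payoffs_1)))
    cols = list(range(len(payoffs_1[0])))
    changed = True
    while changed:
        changed = False
        # Drop any row strictly dominated by another surviving row.
        for r in rows[:]:
            if any(all(payoffs_1[o][c] > payoffs_1[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
        # Drop any column strictly dominated by another surviving column.
        for c in cols[:]:
            if any(all(payoffs_2[r][o] > payoffs_2[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Prisoner's Dilemma: Defect (index 1) strictly dominates Cooperate.
p1 = [[3, 0], [5, 1]]        # row player's payoffs
p2 = [[3, 5], [0, 1]]        # column player's payoffs
print(iesds(p1, p2))         # → ([1], [1]): only (Defect, Defect) survives
```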

---

## 📊 Benchmark Results

Evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels.

### Overall Performance: Base vs. Solver

| Metric | Base (Qwen2.5-7B) | **Solver (Fine-tuned)** | **Δ Improvement** |
|---|:---:|:---:|:---:|
| **Overall Accuracy** | 82% | **94%** | **+12 pp** ✅ |
| **Hard Problems** | 66.7% | **94.4%** | **+27.7 pp** 🚀 |

### Per-Category Accuracy

| Category | Base | Solver | Δ | Trend |
|---|:---:|:---:|:---:|:---:|
| Normal Form 2×2 | 100% | 80% | −20 pp | 📉 |
| Normal Form 3×3 | 80% | 60% | −20 pp | 📉 |
| Normal Form 3×4 | 100% | 100% | — | ➡️ |
| Normal Form 4×4 | 100% | 100% | — | ➡️ |
| Zero-Sum | 100% | 100% | — | ➡️ |
| Sequential Game | 100% | 100% | — | ➡️ |
| Auction Theory | 80% | **100%** | +20 pp | 📈 |
| Bayesian Game | 0% | **100%** | **+100 pp** | 🚀 |
| Cooperative Game | 100% | 100% | — | ➡️ |
| Mechanism Design | 60% | **100%** | +40 pp | 📈 |

> **Highlight:** The gains are largest in the base model's weakest categories — **Bayesian Games** (0% → 100%, previously unsolved) and **Mechanism Design** (60% → 100%) — while perfect scores are maintained on zero-sum, sequential, and cooperative games.

---

## 🚀 Usage

### Installation

```bash
pip install transformers peft bitsandbytes accelerate torch
```

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantization config (matches training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
```

### Solving a Game Theory Problem

```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a game theory expert. Solve the given problem "
            "step-by-step, showing all mathematical reasoning. "
            "Provide the final answer clearly."
        ),
    },
    {
        "role": "user",
        "content": (
            "Consider the following game:\n\n"
            "Player 1 \\ Player 2 | Left | Right\n"
            "--- | --- | ---\n"
            "Up | (3,1) | (0,0)\n"
            "Down | (1,1) | (2,3)\n\n"
            "Find all Nash Equilibria."
        ),
    },
]

# return_dict=True also returns the attention mask, which generate() expects
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```
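
The expected pure-strategy answer for the example prompt can be cross-checked with a brute-force best-response search. This is a plain-Python sketch, independent of the model; note the game also admits a mixed equilibrium, which this check does not cover:

```python
def pure_nash_equilibria(payoffs):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game.

    payoffs[i][j] = (u1, u2): the payoffs when the row player plays i
    and the column player plays j.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            u1, u2 = payoffs[i][j]
            # A profile is an equilibrium iff neither player has a
            # profitable unilateral deviation.
            row_best = all(payoffs[k][j][0] <= u1 for k in range(n_rows))
            col_best = all(payoffs[i][k][1] <= u2 for k in range(n_cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# The game from the prompt: rows = (Up, Down), columns = (Left, Right).
game = [[(3, 1), (0, 0)],
        [(1, 1), (2, 3)]]
print(pure_nash_equilibria(game))  # → [(0, 0), (1, 1)], i.e. (Up, Left) and (Down, Right)
```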

---

## ๐Ÿ‹๏ธ Training Details

### Base Model

| Parameter | Value |
|---|---|
| **Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| **Total Parameters** | 7.6B |
| **Trainable Parameters** | 161M (2.1% of total) |

### Dataset

| Parameter | Value |
|---|---|
| **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| **Train Split** | 2,767 examples |
| **Eval Split** | 146 examples (5% held out) |

### QLoRA Configuration

| Parameter | Value |
|---|---|
| LoRA rank (`r`) | 64 |
| LoRA alpha (`α`) | 128 |
| LoRA dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
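
The table above corresponds to a `peft.LoraConfig` roughly like the following (reconstructed from the reported values; the original training script may differ in minor details):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```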

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2,048 |
| Packing | Enabled |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |

### Training Results

| Metric | Value |
|---|---|
| **Train loss** | 0.1613 |
| **Eval loss** | 0.0873 |
| **Token accuracy** | 96.1% |
| Total steps | 135 |
| Training runtime | ~2 hours |
| **Hardware** | 2× NVIDIA RTX 3090 (24 GB each) |

---

## โš ๏ธ Limitations

- **Small-matrix regression:** Accuracy on 2×2 and 3×3 normal-form games decreased after fine-tuning (100% → 80% and 80% → 60% respectively). The base model already handled these well; the adapter slightly regresses on simpler subcategories while dramatically improving harder ones.
- **Mixed-strategy precision:** Complex mixed-strategy Nash Equilibria involving irrational numbers may have floating-point precision issues.
- **Context length:** Max sequence length of 2,048 tokens may truncate very large game matrices or extremely detailed solutions.
- **Synthetic training data:** The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.
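
One way to sidestep the floating-point caveat for small games is exact rational arithmetic. Below is a sketch using Python's `fractions` to solve the indifference conditions of the 2×2 game from the usage example; it assumes an interior mixed equilibrium exists (nonzero denominators):

```python
from fractions import Fraction

def mixed_nash_2x2(payoffs):
    """Mixed-strategy NE of a 2x2 bimatrix game via indifference conditions.

    payoffs[i][j] = (u1, u2). Returns (p, q): the row player's probability
    of playing row 0 and the column player's probability of playing
    column 0, as exact fractions.
    """
    (a1, a2), (b1, b2) = payoffs[0]
    (c1, c2), (d1, d2) = payoffs[1]
    # Row player mixes so the column player is indifferent:
    #   p*a2 + (1-p)*c2 = p*b2 + (1-p)*d2
    p = Fraction(d2 - c2, (a2 - c2) - (b2 - d2))
    # Column player mixes so the row player is indifferent:
    #   q*a1 + (1-q)*b1 = q*c1 + (1-q)*d1
    q = Fraction(d1 - b1, (a1 - b1) - (c1 - d1))
    return p, q

# The game from the usage example: rows = (Up, Down), columns = (Left, Right).
game = [[(3, 1), (0, 0)],
        [(1, 1), (2, 3)]]
print(mixed_nash_2x2(game))  # → (Fraction(2, 3), Fraction(1, 2))
```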

---

## 🔗 Links

| Resource | Link |
|---|---|
| 📊 **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| 🎮 **Live Demo** | [GameTheory-Solver-Demo](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) |
| 🏠 **Base Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |

---

## 📄 License

This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## ๐Ÿ“ Citation

```bibtex
@misc{gametheory-solver-2025,
  title   = {GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
  author  = {2reb},
  year    = {2025},
  publisher = {Hugging Face},
  url     = {https://huggingface.co/2reb/GameTheory-Solver}
}
```