A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.
## Model Description
GameTheory-Solver is a LoRA adapter trained on the GameTheory-Bench dataset, the first comprehensive, computationally verified game theory dataset for LLM training. The adapter turns Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.
**Key result:** The fine-tuned model achieves 94% overall accuracy (up from 82% for the base model) and 94.4% on hard problems (up from 66.7%), a +12 pp improvement overall and +27.7 pp on hard problems.
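A minimal usage sketch, assuming the standard `transformers` + `peft` loading path. The adapter repo id below is a placeholder (the published id may differ), and `build_prompt`/`solve_with_adapter` are illustrative helper names, not part of the training pipeline:

```python
# Hedged usage sketch: format a payoff matrix as a problem statement and
# query the base model with the LoRA adapter attached. The adapter id is a
# placeholder assumption; replace it with the published repo id.

def build_prompt(payoffs):
    """Format a 2-player payoff matrix as a plain-text problem statement.

    payoffs[i][j] is the (row, column) payoff pair when the row player
    plays action i and the column player plays action j.
    """
    rows = []
    for i, row in enumerate(payoffs):
        cells = ", ".join(f"({a}, {b})" for a, b in row)
        rows.append(f"Row action {i}: {cells}")
    return (
        "Find all Nash equilibria (pure and mixed) of the following "
        "normal-form game. Show your reasoning step by step.\n"
        + "\n".join(rows)
    )

def solve_with_adapter(prompt, adapter_id="GameTheory-Solver"):
    """Load base + adapter and generate a solution.

    Requires a GPU and the published adapter weights; adapter_id here is
    a placeholder.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "Qwen/Qwen2.5-7B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = PeftModel.from_pretrained(
        AutoModelForCausalLM.from_pretrained(base_id, device_map="auto"),
        adapter_id,
    )
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024)
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

# Prisoner's dilemma payoffs: (row payoff, column payoff)
prompt = build_prompt([[(3, 3), (0, 5)], [(5, 0), (1, 1)]])
# print(solve_with_adapter(prompt))  # uncomment once the real adapter id is set
```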
## Capabilities
| Capability | Details |
|---|---|
| Nash Equilibrium Computation | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| Dominant Strategy Analysis | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| Zero-Sum Game Solving | Minimax theorem, saddle-point detection, mixed strategies |
| Sequential Game Analysis | Backward induction, subgame perfect equilibrium (up to 3 stages) |
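To make the dominant-strategy capability concrete, here is a minimal IESDS sketch for two-player games (pure-strategy strict dominance only, no dominance by mixed strategies); the function and matrix layout are my own illustration, not code from the model or dataset:

```python
# Minimal IESDS sketch (assumption: my own illustration, not project code).
# A[i][j] is the row player's payoff, B[i][j] the column player's payoff.

def iesds(A, B):
    """Iteratively eliminate strictly dominated pure strategies.

    Returns the surviving (row, column) action index lists.
    """
    rows = set(range(len(A)))
    cols = set(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # A row i is eliminated if some other surviving row k beats it
        # strictly against every surviving column.
        for i in sorted(rows):
            if i in rows and any(
                all(A[k][j] > A[i][j] for j in cols) for k in rows - {i}
            ):
                rows.discard(i)
                changed = True
        # Symmetrically for columns, using the column player's payoffs.
        for j in sorted(cols):
            if j in cols and any(
                all(B[i][l] > B[i][j] for i in rows) for l in cols - {j}
            ):
                cols.discard(j)
                changed = True
    return sorted(rows), sorted(cols)

# Prisoner's dilemma: Defect (index 1) strictly dominates Cooperate.
A = [[3, 0], [5, 1]]  # row player's payoffs
B = [[3, 5], [0, 1]]  # column player's payoffs
survivors = iesds(A, B)  # ([1], [1]) -> (Defect, Defect)
```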
## Evaluation

The model was evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels.
### Overall Performance: Base vs. Solver

| Metric | Base (Qwen2.5-7B) | Solver (Fine-tuned) | Δ Improvement |
|---|---|---|---|
| Overall Accuracy | 82% | 94% | +12 pp |
| Hard Problems | 66.7% | 94.4% | +27.7 pp |
### Per-Category Accuracy

| Category | Base | Solver | Δ | Trend |
|---|---|---|---|---|
| Normal Form 2×2 | 100% | 80% | -20 pp | ↓ |
| Normal Form 3×3 | 80% | 60% | -20 pp | ↓ |
| Normal Form 3×4 | 100% | 100% | 0 | → |
| Normal Form 4×4 | 100% | 100% | 0 | → |
| Zero-Sum | 100% | 100% | 0 | → |
| Sequential Game | 100% | 100% | 0 | → |
| Auction Theory | 80% | 100% | +20 pp | ↑ |
| Bayesian Game | 0% | 100% | +100 pp | ↑ |
| Cooperative Game | 100% | 100% | 0 | → |
| Mechanism Design | 60% | 100% | +40 pp | ↑ |
**Highlight:** The most dramatic gains come on the categories where the base model was weakest, Bayesian Games (0% → 100%) and Mechanism Design (60% → 100%), while perfect scores are maintained on zero-sum, sequential, and cooperative games.
## Limitations

- **Small-matrix regression:** Accuracy on 2×2 and 3×3 normal-form games decreased after fine-tuning (100% → 80% and 80% → 60%, respectively). The base model already handled these well; the adapter slightly regresses on simpler subcategories while dramatically improving harder ones.
- **Mixed-strategy precision:** Complex mixed-strategy Nash equilibria involving irrational numbers may suffer from floating-point precision issues.
- **Context length:** The maximum sequence length of 2,048 tokens may truncate very large game matrices or extremely detailed solutions.
- **Synthetic training data:** The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.
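One way to sidestep the precision caveat when checking the model's answers is to verify candidate mixed strategies in exact rational arithmetic rather than floats. A minimal sketch, using an illustrative 2×2 game that is not from the benchmark:

```python
# Hedged sketch: exact verification of a 2x2 mixed-strategy indifference
# condition with fractions.Fraction. The game values are illustrative
# assumptions, not benchmark data.
from fractions import Fraction

def indifference_mix(a, b, c, d):
    """Probability q the column player puts on its first action so the row
    player is indifferent between rows, for row payoffs [[a, b], [c, d]].

    Solves a*q + b*(1 - q) = c*q + d*(1 - q) exactly.
    """
    return Fraction(d - b, (a - b) - (c - d))

# Battle-of-the-Sexes-style row payoffs [[2, 0], [0, 1]]:
q = indifference_mix(2, 0, 0, 1)
print(q)  # 1/3, held exactly; as a binary float it would only be approximate
```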