---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
language:
- en
tags:
- game-theory
- qwen2.5
- qlora
- fine-tuning
- nash-equilibrium
- economics
- math
- reasoning
- lora
- sft
- transformers
- trl
- 4-bit
- bitsandbytes
datasets:
- 2reb/GameTheory-Bench
model-index:
- name: GameTheory-Solver
  results:
  - task:
      type: text-generation
      name: Game Theory Problem Solving
    dataset:
      name: GameTheory-Bench
      type: 2reb/GameTheory-Bench
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: 94.0
      verified: false
    - name: Hard Problem Accuracy
      type: accuracy
      value: 94.4
      verified: false
---

# GameTheory-Solver

**A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.**

[Dataset](https://huggingface.co/datasets/2reb/GameTheory-Bench) · [Live Demo](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) · [License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

---

## Model Description

GameTheory-Solver is a **LoRA adapter** trained on the [GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) dataset, the first comprehensive, computationally verified game theory dataset for LLM training. The adapter turns Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.

**Key result:** The fine-tuned model reaches **94% overall accuracy** (up from 82% for the base model) and **94.4% on hard problems** (up from 66.7%), a **+12 pp overall** and **+27.7 pp hard-problem improvement**.

### Capabilities

| Capability | Details |
|---|---|
| **Nash Equilibrium Computation** | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| **Dominant Strategy Analysis** | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| **Zero-Sum Game Solving** | Minimax theorem, saddle point detection, mixed strategies |
| **Sequential Game Analysis** | Backward induction, subgame perfect equilibrium (up to 3 stages) |
| **Bayesian Game Equilibria** | Incomplete information, BNE, signaling games |
| **Cooperative Game Theory** | Shapley value computation, core analysis |
| **Auction Theory** | First-price, second-price (Vickrey), all-pay, revenue equivalence |
| **Mechanism Design** | VCG mechanisms, incentive compatibility analysis |
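One of these capabilities, IESDS, is simple enough to verify mechanically. The sketch below is illustrative only (it is not the model's internal procedure) and handles pure-strategy strict domination on a bimatrix game:

```python
def iesds(A, B):
    """Iterated Elimination of Strictly Dominated Strategies (pure-strategy
    domination only). A[i][j] / B[i][j] are the row / column player's payoffs.
    Returns the indices of the surviving rows and columns."""
    rows, cols = list(range(len(A))), list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:  # drop rows strictly dominated by a surviving row
            if any(all(A[r2][c] > A[r][c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:  # drop cols strictly dominated by a surviving col
            if any(all(B[r][c2] > B[r][c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Prisoner's Dilemma (0 = Cooperate, 1 = Defect): only (Defect, Defect) survives.
A = [[-1, -3], [0, -2]]
B = [[-1, 0], [-3, -2]]
print(iesds(A, B))  # -> ([1], [1])
```

For games with no strictly dominated strategies (e.g. Matching Pennies), all strategies survive and the function simply returns the full index sets.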

---

## Benchmark Results

Evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels.

### Overall Performance: Base vs. Solver

| Metric | Base (Qwen2.5-7B) | **Solver (Fine-tuned)** | **Δ Improvement** |
|---|:---:|:---:|:---:|
| **Overall Accuracy** | 82% | **94%** | **+12 pp** |
| **Hard Problems** | 66.7% | **94.4%** | **+27.7 pp** |

### Per-Category Accuracy

| Category | Base | Solver | Δ | Trend |
|---|:---:|:---:|:---:|:---:|
| Normal Form 2×2 | 100% | 80% | −20 pp | 📉 |
| Normal Form 3×3 | 80% | 60% | −20 pp | 📉 |
| Normal Form 3×4 | 100% | 100% | 0 | ➡️ |
| Normal Form 4×4 | 100% | 100% | 0 | ➡️ |
| Zero-Sum | 100% | 100% | 0 | ➡️ |
| Sequential Game | 100% | 100% | 0 | ➡️ |
| Auction Theory | 80% | **100%** | +20 pp | 📈 |
| Bayesian Game | 0% | **100%** | **+100 pp** | 📈 |
| Cooperative Game | 100% | 100% | 0 | ➡️ |
| Mechanism Design | 60% | **100%** | +40 pp | 📈 |

> **Highlight:** The model achieves its most dramatic gains on previously weak categories, **Bayesian Games** (0% → 100%) and **Mechanism Design** (60% → 100%), while maintaining perfect scores on zero-sum, sequential, and cooperative games.

---

## Usage

### Installation

```bash
pip install transformers peft bitsandbytes accelerate torch
```

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantization config (matches training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
```

### Solving a Game Theory Problem

```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a game theory expert. Solve the given problem "
            "step-by-step, showing all mathematical reasoning. "
            "Provide the final answer clearly."
        ),
    },
    {
        "role": "user",
        "content": (
            "Consider the following game:\n\n"
            "Player 1 \\ Player 2 | Left | Right\n"
            "--- | --- | ---\n"
            "Up | (3,1) | (0,0)\n"
            "Down | (1,1) | (2,3)\n\n"
            "Find all Nash Equilibria."
        ),
    },
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```
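For the 2×2 example above, the model's answer can be sanity-checked with a brute-force best-response enumeration. This is a hedged sketch, independent of the model and the released code:

```python
def pure_nash(A, B):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game:
    (i, j) is an equilibrium iff i is a best response to column j
    and j is a best response to row i."""
    return [
        (i, j)
        for i in range(len(A))
        for j in range(len(A[0]))
        if all(A[i][j] >= A[k][j] for k in range(len(A)))
        and all(B[i][j] >= B[i][l] for l in range(len(A[0])))
    ]

# Payoffs from the prompt: rows = (Up, Down), columns = (Left, Right)
A = [[3, 0], [1, 2]]  # Player 1
B = [[1, 0], [1, 3]]  # Player 2
print(pure_nash(A, B))  # -> [(0, 0), (1, 1)], i.e. (Up, Left) and (Down, Right)
```

A correct model response should report both pure equilibria (plus the mixed equilibrium, which this pure-strategy check does not cover).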

---

## Training Details

### Base Model

| Parameter | Value |
|---|---|
| **Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| **Total Parameters** | 7.6B |
| **Trainable Parameters** | 161M (2.1% of total) |

### Dataset

| Parameter | Value |
|---|---|
| **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| **Train Split** | 2,767 examples |
| **Eval Split** | 146 examples (5% held out) |

### QLoRA Configuration

| Parameter | Value |
|---|---|
| LoRA rank (`r`) | 64 |
| LoRA alpha (`α`) | 128 |
| LoRA dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
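For reproduction with `peft`, the table maps to roughly the following `LoraConfig`. This is a sketch: the field names come from peft's public API, but the exact training script is not published, and `bias`/`task_type` are assumptions not stated in the card.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # scaling alpha
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumption: not stated in the card
    task_type="CAUSAL_LM",  # assumption: standard for SFT of a causal LM
)
```

Pair this with the `BitsAndBytesConfig` shown in the usage section above to reproduce the 4-bit NF4 setup.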

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2,048 |
| Packing | Enabled |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |
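For intuition, the schedule implied by the table (linear warmup over 5% of the 135 steps, then cosine decay to zero) can be sketched as below. The warmup-step rounding is an assumption, and `transformers`' built-in cosine scheduler may differ in minor details:

```python
import math

def lr_at(step, total_steps=135, warmup_ratio=0.05, base_lr=2e-4):
    """Learning rate at a given optimizer step: linear warmup,
    then cosine decay to zero."""
    warmup = max(1, int(warmup_ratio * total_steps))  # 6 steps here
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(6), lr_at(135))  # 0 at start, peak 2e-4, ~0 at the end
```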

### Training Results

| Metric | Value |
|---|---|
| **Train loss** | 0.1613 |
| **Eval loss** | 0.0873 |
| **Token accuracy** | 96.1% |
| Total steps | 135 |
| Training runtime | ~2 hours |
| **Hardware** | 2× NVIDIA RTX 3090 (24 GB each) |

---

## Limitations

- **Small-matrix regression:** Accuracy on 2×2 and 3×3 normal-form games decreased after fine-tuning (100% → 80% and 80% → 60%, respectively). The base model already handled these well; the adapter slightly regresses on simpler subcategories while dramatically improving harder ones.
- **Mixed-strategy precision:** Complex mixed-strategy Nash equilibria involving irrational numbers may suffer from floating-point precision issues.
- **Context length:** The 2,048-token maximum sequence length may truncate very large game matrices or extremely detailed solutions.
- **Synthetic training data:** The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.
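One mitigation for the precision caveat is to verify reported mixed strategies with exact rational arithmetic. A minimal sketch for a fully mixed 2×2 equilibrium via the standard indifference conditions, using the payoffs from the usage example (this assumes an interior equilibrium exists, so the denominators are nonzero):

```python
from fractions import Fraction

def mixed_2x2(A, B):
    """Fully mixed NE of a 2x2 bimatrix game from the indifference
    conditions, computed exactly with rationals. Returns (q, p):
    q = P(row player plays row 0), p = P(column player plays column 0)."""
    # Row player mixes so the column player is indifferent between columns.
    q = Fraction(B[1][1] - B[1][0], B[0][0] - B[1][0] - B[0][1] + B[1][1])
    # Column player mixes so the row player is indifferent between rows.
    p = Fraction(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return q, p

A = [[3, 0], [1, 2]]  # Player 1 payoffs from the usage example
B = [[1, 0], [1, 3]]  # Player 2 payoffs
q, p = mixed_2x2(A, B)
print(q, p)  # 2/3 1/2 -> P1 plays Up w.p. 2/3, P2 plays Left w.p. 1/2
```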

---

## Links

| Resource | Link |
|---|---|
| **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| **Live Demo** | [GameTheory-Solver-Demo](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) |
| **Base Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |

---

## License

This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Citation

```bibtex
@misc{gametheory-solver-2025,
  title     = {GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
  author    = {2reb},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/2reb/GameTheory-Solver}
}
```