---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
language:
- en
tags:
- game-theory
- qwen2.5
- qlora
- fine-tuning
- nash-equilibrium
- economics
- math
- reasoning
- lora
- sft
- transformers
- trl
- 4-bit
- bitsandbytes
datasets:
- 2reb/GameTheory-Bench
model-index:
- name: GameTheory-Solver
  results:
  - task:
      type: text-generation
      name: Game Theory Problem Solving
    dataset:
      name: GameTheory-Bench
      type: 2reb/GameTheory-Bench
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: 94.0
      verified: false
    - name: Hard Problem Accuracy
      type: accuracy
      value: 94.4
      verified: false
---

# 🎯 GameTheory-Solver

**A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.**

[![Dataset](https://img.shields.io/badge/🤗_Dataset-GameTheory--Bench-yellow)](https://huggingface.co/datasets/2reb/GameTheory-Bench) [![Demo](https://img.shields.io/badge/🎮_Demo-Try_it_Live-blue)](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) [![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

---

## 📋 Model Description

GameTheory-Solver is a **LoRA adapter** trained on [GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench), the first comprehensive, computationally verified game theory dataset for LLM training. The adapter transforms Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.

**Key result:** The fine-tuned model achieves **94% overall accuracy** (up from 82% base) and **94.4% on hard problems** (up from 66.7% base), representing a **+12pp overall** and **+27.7pp hard-problem improvement**.

### 🧠 Capabilities

| Capability | Details |
|---|---|
| **Nash Equilibrium Computation** | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| **Dominant Strategy Analysis** | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| **Zero-Sum Game Solving** | Minimax theorem, saddle point detection, mixed strategies |
| **Sequential Game Analysis** | Backward induction, subgame perfect equilibrium (up to 3 stages) |
| **Bayesian Game Equilibria** | Incomplete information, BNE, signaling games |
| **Cooperative Game Theory** | Shapley value computation, core analysis |
| **Auction Theory** | First-price, second-price (Vickrey), all-pay, revenue equivalence |
| **Mechanism Design** | VCG mechanisms, incentive compatibility analysis |

---

## 📊 Benchmark Results

Evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels. Changes (Δ) are reported in percentage points (pp).

### Overall Performance: Base vs. Solver

| Metric | Base (Qwen2.5-7B-Instruct) | **Solver (Fine-tuned)** | **Δ (pp)** |
|---|:---:|:---:|:---:|
| **Overall Accuracy** | 82% | **94%** | **+12** ✅ |
| **Hard Problems** | 66.7% | **94.4%** | **+27.7** 🚀 |

### Per-Category Accuracy

| Category | Base | Solver | Δ (pp) | Trend |
|---|:---:|:---:|:---:|:---:|
| Normal Form 2×2 | 100% | 80% | −20 | 📉 |
| Normal Form 3×3 | 80% | 60% | −20 | 📉 |
| Normal Form 3×4 | 100% | 100% | 0 | ➡️ |
| Normal Form 4×4 | 100% | 100% | 0 | ➡️ |
| Zero-Sum | 100% | 100% | 0 | ➡️ |
| Sequential Game | 100% | 100% | 0 | ➡️ |
| Auction Theory | 80% | **100%** | +20 | 📈 |
| Bayesian Game | 0% | **100%** | **+100** | 🚀 |
| Cooperative Game | 100% | 100% | 0 | ➡️ |
| Mechanism Design | 60% | **100%** | +40 | 📈 |

> **Highlight:** The largest gains come in the categories where the base model was weakest: **Bayesian Games** (0% → 100%, previously unsolved) and **Mechanism Design** (60% → 100%). Perfect scores are maintained across zero-sum, sequential, and cooperative games.
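
The overall figures can be cross-checked against the per-category table. The sketch below assumes every category contributes the same number of problems, so that overall accuracy is the unweighted mean of the ten category scores. The source does not state the category sizes, although scores in multiples of 20% are consistent with equal-sized categories. It also contrasts the absolute change in percentage points with the relative change in percent, which is why the +12 figure is best read as pp. The helper name `unweighted_mean` is illustrative only.

```python
# Per-category accuracies (%) copied from the table above: (base, solver).
per_category = {
    "Normal Form 2x2": (100, 80),
    "Normal Form 3x3": (80, 60),
    "Normal Form 3x4": (100, 100),
    "Normal Form 4x4": (100, 100),
    "Zero-Sum": (100, 100),
    "Sequential Game": (100, 100),
    "Auction Theory": (80, 100),
    "Bayesian Game": (0, 100),
    "Cooperative Game": (100, 100),
    "Mechanism Design": (60, 100),
}


def unweighted_mean(values):
    """Mean over categories; assumes equal-sized categories (not stated in the source)."""
    values = list(values)
    return sum(values) / len(values)


base_overall = unweighted_mean(b for b, _ in per_category.values())
solver_overall = unweighted_mean(s for _, s in per_category.values())

# Absolute change (percentage points) vs. relative change (percent of the base score).
delta_pp = solver_overall - base_overall
relative_gain = 100 * delta_pp / base_overall

print(f"base={base_overall:.1f}%  solver={solver_overall:.1f}%")
print(f"delta={delta_pp:+.1f}pp  relative={relative_gain:+.1f}%")
```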
---

## 🚀 Usage

### Installation

```bash
pip install transformers peft bitsandbytes accelerate torch
```

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantization config (matches training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the 4-bit base model, then attach the LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
```

### Solving a Game Theory Problem

```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a game theory expert. Solve the given problem "
            "step-by-step, showing all mathematical reasoning. "
            "Provide the final answer clearly."
        ),
    },
    {
        "role": "user",
        "content": (
            "Consider the following game:\n\n"
            "Player 1 \\ Player 2 | Left | Right\n"
            "--- | --- | ---\n"
            "Up | (3,1) | (0,0)\n"
            "Down | (1,1) | (2,3)\n\n"
            "Find all Nash Equilibria."
        ),
    },
]

# Build the chat prompt; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding for reproducible answers
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)
```

---

## 🏋️ Training Details

### Base Model

| Parameter | Value |
|---|---|
| **Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| **Total Parameters** | 7.6B |
| **Trainable Parameters** | 161M (2.1% of total) |

### Dataset

| Parameter | Value |
|---|---|
| **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| **Train Split** | 2,767 examples |
| **Eval Split** | 146 examples (5% held out) |

### QLoRA Configuration

| Parameter | Value |
|---|---|
| LoRA rank (`r`) | 64 |
| LoRA alpha (`α`) | 128 |
| LoRA dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2,048 tokens |
| Packing | Enabled |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |

### Training Results

| Metric | Value |
|---|---|
| **Train loss** | 0.1613 |
| **Eval loss** | 0.0873 |
| **Token accuracy** | 96.1% |
| Total steps | 135 |
| Training runtime | ~2 hours |
| **Hardware** | 2× NVIDIA RTX 3090 (24 GB each) |

---

## ⚠️ Limitations

- **Small-matrix regression:** Accuracy on 2×2 and 3×3 normal-form games decreased after fine-tuning (100% → 80% and 80% → 60%, respectively).
  The base model already handled these well; the adapter loses 20pp on each of these simpler subcategories while substantially improving harder ones.
- **Mixed-strategy precision:** Complex mixed-strategy Nash Equilibria involving irrational numbers may have floating-point precision issues.
- **Context length:** The maximum sequence length of 2,048 tokens may truncate very large game matrices or extremely detailed solutions.
- **Synthetic training data:** The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.

---

## 🔗 Links

| Resource | Link |
|---|---|
| 📊 **Dataset** | [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) |
| 🎮 **Live Demo** | [GameTheory-Solver-Demo](https://huggingface.co/spaces/2reb/GameTheory-Solver-Demo) |
| 🏠 **Base Model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |

---

## 📄 License

This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## 📝 Citation

```bibtex
@misc{gametheory-solver-2025,
  title     = {GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
  author    = {2reb},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/2reb/GameTheory-Solver}
}
```
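
Given the regression on small normal-form games and the precision caveat noted under Limitations, it may help to check model answers for 2×2 games independently. The sketch below is one possible checker for the example game from the Usage section: it enumerates pure-strategy equilibria by testing best responses and solves the indifference conditions for an interior mixed equilibrium using exact rational arithmetic. The function names `pure_nash` and `mixed_nash_2x2` are hypothetical helpers written for this illustration; they are not part of the adapter, the dataset, or any library.

```python
from fractions import Fraction
from itertools import product

# Payoffs (Player 1, Player 2) for the 2x2 game in the Usage example.
payoffs = {
    ("Up", "Left"): (3, 1),
    ("Up", "Right"): (0, 0),
    ("Down", "Left"): (1, 1),
    ("Down", "Right"): (2, 3),
}
rows, cols = ["Up", "Down"], ["Left", "Right"]


def pure_nash(payoffs, rows, cols):
    """Return all cells where each player's action is a best response to the other's."""
    equilibria = []
    for r, c in product(rows, cols):
        u1, u2 = payoffs[(r, c)]
        row_best = all(u1 >= payoffs[(r2, c)][0] for r2 in rows)
        col_best = all(u2 >= payoffs[(r, c2)][1] for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


def mixed_nash_2x2(payoffs, rows, cols):
    """Return (p, q) for an interior mixed equilibrium of a 2x2 game, or None.

    p = P(Player 1 plays rows[0]), chosen so Player 2 is indifferent.
    q = P(Player 2 plays cols[0]), chosen so Player 1 is indifferent.
    """
    (a, e), (b, f) = payoffs[(rows[0], cols[0])], payoffs[(rows[0], cols[1])]
    (c, g), (d, h) = payoffs[(rows[1], cols[0])], payoffs[(rows[1], cols[1])]
    # Player 2 indifferent: p*e + (1-p)*g = p*f + (1-p)*h
    den_p = e - g - f + h
    # Player 1 indifferent: q*a + (1-q)*b = q*c + (1-q)*d
    den_q = a - b - c + d
    if den_p == 0 or den_q == 0:
        return None
    p = Fraction(h - g, den_p)
    q = Fraction(d - b, den_q)
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None


pure = pure_nash(payoffs, rows, cols)
mixed = mixed_nash_2x2(payoffs, rows, cols)
print("Pure equilibria:", pure)
print("Mixed equilibrium (P(Up), P(Left)):", mixed)
```

For this game the checker reports the pure equilibria (Up, Left) and (Down, Right), plus a mixed equilibrium in which Player 1 plays Up with probability 2/3 and Player 2 plays Left with probability 1/2. The mixed-strategy solver only covers non-degenerate 2×2 games; larger matrices would need a general method such as support enumeration.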