Commit eb51fe2 · 2reb · verified · 1 Parent(s): 1b82de3

Upload GameTheory-Solver QLoRA adapter with evaluation results

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,215 @@
+ ---
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ library_name: peft
+ license: apache-2.0
+ pipeline_tag: text-generation
+ tags:
+ - game-theory
+ - math
+ - reasoning
+ - lora
+ - qlora
+ - sft
+ - qwen2
+ - transformers
+ - trl
+ - 4-bit
+ - bitsandbytes
+ datasets:
+ - 2reb/GameTheory-Bench
+ model-index:
+ - name: GameTheory-Solver
+   results:
+   - task:
+       type: text-generation
+       name: Game Theory Problem Solving
+     dataset:
+       name: GameTheory-Bench
+       type: 2reb/GameTheory-Bench
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 80.0
+       verified: false
+ ---
+
+ # GameTheory-Solver
+
+ A QLoRA fine-tuned adapter for **Qwen/Qwen2.5-7B-Instruct** specialized in solving game theory problems with step-by-step mathematical reasoning.
+
+ ## Model Description
+
+ GameTheory-Solver is a LoRA adapter trained on the [GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench) dataset, which contains 2,913 diverse game theory problems spanning 10 categories. The model generates detailed, step-by-step solutions with mathematical proofs and clear final answers.
+
+ ### Capabilities
+
+ - **Nash Equilibrium computation** (pure and mixed strategies) for 2x2, 3x3, 3x4, and 4x4 games
+ - **Dominant strategy analysis** and Iterated Elimination of Strictly Dominated Strategies (IESDS)
+ - **Zero-sum game solving** with minimax theorem and saddle point detection
+ - **Sequential game analysis** via backward induction (up to 3 stages)
+ - **Bayesian game equilibria** with incomplete information
+ - **Cooperative game theory** including Shapley value computation
+ - **Auction theory** (first-price, second-price, all-pay, revenue equivalence)
+ - **Mechanism design** and incentive compatibility analysis
+
+ ## Training Details
+
+ ### Base Model
+ - **Model**: [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+ - **Parameters**: 7.6B (base), 161M trainable (LoRA)
+
+ ### Dataset
+ - **Dataset**: [2reb/GameTheory-Bench](https://huggingface.co/datasets/2reb/GameTheory-Bench)
+ - **Train split**: 2,767 examples
+ - **Eval split**: 146 examples (5% held out)
+
+ ### QLoRA Configuration
+ | Parameter | Value |
+ |---|---|
+ | LoRA rank (r) | 64 |
+ | LoRA alpha | 128 |
+ | LoRA dropout | 0.05 |
+ | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+ | Quantization | 4-bit NF4 with double quantization |
+ | Compute dtype | bfloat16 |
+ | Trainable parameters | 161M (2.1% of total) |
+
+ ### Training Hyperparameters
+ | Parameter | Value |
+ |---|---|
+ | Epochs | 3 |
+ | Batch size (per device) | 2 |
+ | Gradient accumulation steps | 8 |
+ | Effective batch size | 16 |
+ | Learning rate | 2e-4 |
+ | LR scheduler | Cosine |
+ | Warmup ratio | 0.05 |
+ | Weight decay | 0.01 |
+ | Max sequence length | 2048 |
+ | Packing | Enabled |
+ | Optimizer | paged_adamw_8bit |
+ | Gradient checkpointing | Enabled |
+ | Precision | bf16 |
+
+ ### Training Results
+ | Metric | Value |
+ |---|---|
+ | Train loss | 0.1613 |
+ | Eval loss | 0.0873 |
+ | Token accuracy | 96.1% |
+ | Total steps | 135 |
+ | Training runtime | 1h 55m |
+
+ ## Evaluation Results
+
+ Evaluated on 15 diverse problems sampled across all 10 categories and 3 difficulty levels.
+
+ ### Overall Performance
+ | Metric | Value |
+ |---|---|
+ | **Overall Accuracy** | **12/15 (80.0%)** |
+ | Avg generation time | 24.7s per problem |
+ | Avg output tokens | 322 tokens |
+
+ ### Per-Category Accuracy
+ | Category | Correct/Total | Accuracy |
+ |---|---|---|
+ | auction_theory | 2/2 | 100.0% |
+ | bayesian_game | 0/1 | 0.0% |
+ | cooperative_game | 0/1 | 0.0% |
+ | mechanism_design | 2/2 | 100.0% |
+ | normal_form_2x2 | 3/3 | 100.0% |
+ | normal_form_3x3 | 1/1 | 100.0% |
+ | normal_form_3x4 | 2/2 | 100.0% |
+ | normal_form_4x4 | 1/1 | 100.0% |
+ | sequential_game | 1/1 | 100.0% |
+ | zero_sum | 0/1 | 0.0% |
+
+ ### Per-Difficulty Accuracy
+ | Difficulty | Correct/Total | Accuracy |
+ |---|---|---|
+ | easy | 3/3 | 100.0% |
+ | medium | 4/6 | 66.7% |
+ | hard | 5/6 | 83.3% |
+
+ ### Sample Results
+ | Category | Subcategory | Difficulty | Result |
+ |---|---|---|---|
+ | normal_form_2x2 | random_extra | easy | CORRECT |
+ | normal_form_3x3 | 3x3_pure_ne | medium | CORRECT |
+ | normal_form_3x4 | 3x4_pure_ne | hard | CORRECT |
+ | normal_form_4x4 | 4x4_iesds | hard | CORRECT |
+ | zero_sum | minimax | medium | INCORRECT |
+
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install transformers peft bitsandbytes accelerate torch
+ ```
+
+ ### Loading the Model
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ from peft import PeftModel
+
+ # Load in 4-bit (same as training)
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=True,
+ )
+
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen2.5-7B-Instruct",
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+
+ # Attach the LoRA adapter to the quantized base model
+ model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
+ tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
+ ```
+
+ ### Solving a Game Theory Problem
+
+ ```python
+ messages = [
+     {"role": "system", "content": "You are a game theory expert. Solve the given problem step-by-step, showing all mathematical reasoning. Provide the final answer clearly."},
+     {"role": "user", "content": "Consider the following game:\n\nPlayer 1 \\ Player 2 | Left | Right\n--- | --- | ---\nUp | (3,1) | (0,0)\nDown | (1,1) | (2,3)\n\nFind all Nash Equilibria."},
+ ]
+
+ # return_dict=True yields input_ids and attention_mask regardless of the library's default
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ with torch.no_grad():
+     outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
+
+ # Decode only the newly generated tokens, not the prompt
+ prompt_len = inputs["input_ids"].shape[1]
+ response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
+ print(response)
+ ```
+
+ ## Limitations
+
+ - The reported accuracy comes from only 15 problems. The single **Bayesian game**, **cooperative game** (Shapley value), and **zero-sum** problems in that evaluation were each answered incorrectly, so performance in these categories may be less reliable than on normal-form games
+ - Complex mixed-strategy Nash Equilibria with irrational numbers may have precision issues
+ - Training used a maximum sequence length of 2048 tokens, so prompts containing very large game matrices may exceed what the adapter saw during training
+ - The model was trained on synthetically generated problems; real-world game theory scenarios may differ
+
+ ## License
+
+ This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
+
+ ## Citation
+
+ ```bibtex
+ @misc{gametheory-solver-2025,
+   title={GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
+   author={2reb},
+   year={2025},
+   publisher={HuggingFace},
+   url={https://huggingface.co/2reb/GameTheory-Solver}
+ }
+ ```
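
Reviewer note: the README's usage prompt asks for all Nash equilibria of a 2x2 game, so a reference answer is handy when checking the model's output. The following is a minimal sketch, not part of the commit; the function names `pure_nash_equilibria` and `mixed_nash_2x2` are hypothetical helpers introduced here, and the mixed-strategy step assumes an interior solution found via the standard indifference conditions.

```python
from fractions import Fraction

# Payoffs copied from the README's example prompt: (row player, column player).
ROWS = ["Up", "Down"]
COLS = ["Left", "Right"]
PAYOFFS = {
    ("Up", "Left"): (3, 1),
    ("Up", "Right"): (0, 0),
    ("Down", "Left"): (1, 1),
    ("Down", "Right"): (2, 3),
}


def pure_nash_equilibria(payoffs, rows, cols):
    """Return every cell where neither player gains by deviating alone."""
    equilibria = []
    for r in rows:
        for c in cols:
            u1, u2 = payoffs[(r, c)]
            row_best = all(u1 >= payoffs[(alt, c)][0] for alt in rows)
            col_best = all(u2 >= payoffs[(r, alt)][1] for alt in cols)
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


def mixed_nash_2x2(payoffs, rows, cols):
    """Interior mixed equilibrium of a 2x2 game via indifference conditions.

    Returns (p, q) with p = P(row player picks rows[0]) and
    q = P(column player picks cols[0]), or None if no interior solution exists.
    """
    a = [[payoffs[(r, c)][0] for c in cols] for r in rows]  # row player's payoffs
    b = [[payoffs[(r, c)][1] for c in cols] for r in rows]  # column player's payoffs
    # p makes the column player indifferent between its two columns.
    p_den = b[0][0] - b[0][1] - b[1][0] + b[1][1]
    # q makes the row player indifferent between its two rows.
    q_den = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    if p_den == 0 or q_den == 0:
        return None
    p = Fraction(b[1][1] - b[1][0], p_den)
    q = Fraction(a[1][1] - a[0][1], q_den)
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None


print(pure_nash_equilibria(PAYOFFS, ROWS, COLS))
print(mixed_nash_2x2(PAYOFFS, ROWS, COLS))
```

For this game the sketch finds two pure equilibria, (Up, Left) and (Down, Right), plus one mixed equilibrium in which Player 1 plays Up with probability 2/3 and Player 2 plays Left with probability 1/2. Exact fractions are used so the comparison with a model answer does not depend on floating-point rounding.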
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 128,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "v_proj",
+     "down_proj",
+     "up_proj",
+     "gate_proj",
+     "q_proj",
+     "k_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
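
Reviewer note: the rank and target modules in this config can be cross-checked against the README's "161M trainable" figure. The sketch below is not part of the commit. It assumes the Qwen2.5-7B layer shapes from the base model's published configuration (hidden size 3584, MLP intermediate size 18944, 28 layers, 4 key/value heads of dimension 128), none of which appear in this diff. It also relies on the fact that a LoRA pair on a linear layer with `in` inputs and `out` outputs adds `r * (in + out)` parameters when no bias is trained.

```python
# Assumed Qwen2.5-7B shapes, taken from the base model's published config
# rather than from this commit.
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128

RANK = 64  # "r" in adapter_config.json

# (in_features, out_features) for each targeted linear layer in one decoder block.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Count LoRA A and B matrix entries: rank * (in + out) per adapted layer."""
    per_block = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_block * num_layers


total = lora_param_count(MODULE_SHAPES, RANK, NUM_LAYERS)
print(f"{total:,} trainable LoRA parameters")
print(f"{total * 4:,} bytes if every weight were stored as float32")
```

Under these assumptions the count is 161,480,704 parameters, which matches the README's rounded 161M. Four bytes per parameter gives 645,922,816 bytes, about 53 KB below the 645,975,704-byte `adapter_model.safetensors` in this commit. That gap would be consistent with float32 storage plus file metadata, though this is an inference, not something the commit states.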
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:681f6ee09f57be855a5cce57a1ddbcee711cf1befc5bc4ac15b695d0942cdab2
+ size 645975704
chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+     {{- '<|im_start|>system\n' }}
+     {%- if messages[0]['role'] == 'system' %}
+         {{- messages[0]['content'] }}
+     {%- else %}
+         {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+     {%- endif %}
+     {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+     {%- for tool in tools %}
+         {{- "\n" }}
+         {{- tool | tojson }}
+     {%- endfor %}
+     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+     {%- if messages[0]['role'] == 'system' %}
+         {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+     {%- else %}
+         {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+     {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+     {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+     {%- elif message.role == "assistant" %}
+         {{- '<|im_start|>' + message.role }}
+         {%- if message.content %}
+             {{- '\n' + message.content }}
+         {%- endif %}
+         {%- for tool_call in message.tool_calls %}
+             {%- if tool_call.function is defined %}
+                 {%- set tool_call = tool_call.function %}
+             {%- endif %}
+             {{- '\n<tool_call>\n{"name": "' }}
+             {{- tool_call.name }}
+             {{- '", "arguments": ' }}
+             {{- tool_call.arguments | tojson }}
+             {{- '}\n</tool_call>' }}
+         {%- endfor %}
+         {{- '<|im_end|>\n' }}
+     {%- elif message.role == "tool" %}
+         {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+             {{- '<|im_start|>user' }}
+         {%- endif %}
+         {{- '\n<tool_response>\n' }}
+         {{- message.content }}
+         {{- '\n</tool_response>' }}
+         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+             {{- '<|im_end|>\n' }}
+         {%- endif %}
+     {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+     {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
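
Reviewer note: to see what prompt string this template produces for the README's system-plus-user example, the no-tools branch can be mirrored in a few lines of plain Python. This is an illustrative sketch, not the tokenizer's implementation. `render_chatml` is a hypothetical helper, and it deliberately rejects tool calls and tool messages, which the real template does handle.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Mimic the template's no-tools branch for plain text messages."""
    parts = []
    # A leading system message is emitted first; otherwise the default is used.
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for index, message in enumerate(messages):
        role = message["role"]
        if role == "system" and index == 0:
            continue  # already emitted above
        if role not in ("system", "user", "assistant") or message.get("tool_calls"):
            raise ValueError("tool calls and tool messages are outside this sketch")
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    # The generation prompt opens an assistant turn for the model to complete.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are a game theory expert."},
    {"role": "user", "content": "Find all Nash Equilibria."},
]
print(render_chatml(example))
```

In practice `tokenizer.apply_chat_template` should be used, since it renders the actual Jinja file; the sketch only makes the turn structure visible when debugging a prompt.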
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
+ size 11421892
tokenizer_config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": null,
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "is_local": false,
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da56877e9478f0b041766a7794d67df1b222b095968deca6b97c805b4609fc25
+ size 5649
training_stats.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "base_model": "Qwen/Qwen2.5-7B-Instruct",
+   "dataset": "2reb/GameTheory-Bench",
+   "train_examples": 2767,
+   "eval_examples": 146,
+   "lora_r": 64,
+   "lora_alpha": 128,
+   "epochs": 3,
+   "batch_size": 2,
+   "grad_accum": 8,
+   "effective_batch": 16,
+   "lr": 0.0002,
+   "train_loss": 0.1613485331888552,
+   "eval_loss": 0.08727391809225082,
+   "runtime_seconds": 6895.8492
+ }
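
Reviewer note: several README figures (effective batch size 16, runtime 1h 55m, 5% held out, 2,913 problems in total) are derived from the raw values in this file. A small sketch of that arithmetic follows, with the relevant fields copied inline so that it runs on its own. The variable names are illustrative, and rounding the runtime to the nearest minute is an assumption about how the README figure was obtained.

```python
import json

# Relevant fields copied from training_stats.json in this commit.
STATS_JSON = """
{
  "train_examples": 2767,
  "eval_examples": 146,
  "batch_size": 2,
  "grad_accum": 8,
  "effective_batch": 16,
  "runtime_seconds": 6895.8492
}
"""

stats = json.loads(STATS_JSON)

# Effective batch size = per-device batch size x gradient accumulation steps
# (this assumes a single device, which the commit does not state).
effective = stats["batch_size"] * stats["grad_accum"]

# Runtime rounded to the nearest minute, then split into hours and minutes.
total_minutes = round(stats["runtime_seconds"] / 60)
hours, minutes = divmod(total_minutes, 60)

# Share of the full dataset held out for evaluation.
total_examples = stats["train_examples"] + stats["eval_examples"]
holdout_pct = 100 * stats["eval_examples"] / total_examples

print(f"effective batch size: {effective}")
print(f"runtime: {hours}h {minutes}m")
print(f"dataset size: {total_examples}, held out: {holdout_pct:.1f}%")
```

All four derived values agree with the README tables, so the model card and the stats file are internally consistent.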