PanzerBread committed on
Commit 4cef487 · verified · 1 Parent(s): 1641c45

Update README.md

Files changed (1)
  1. README.md +256 -124
README.md CHANGED
@@ -1,185 +1,317 @@
  ---
- language: en
- license: apache-2.0
  tags:
- - math
- - reasoning
- - synthetic-data
  - promptcot
  - mathematical-reasoning
- - olympiad-math
- - em-training
- - concept-guided
- inference: false
  ---

- # PromptCoT: Synthetic Dataset Generation for Reasoning Models

- **PromptCoT** is an innovative approach to generating high-quality synthetic datasets for mathematical and coding reasoning models through concept-guided problem synthesis and iterative refinement.

- ## Model Description

- This model collection implements the PromptCoT framework, which consists of two complementary models trained through an Expectation-Maximization (EM) loop:

- ### Rationale Model (qφ)

- - **Purpose**: Generates optimal step-by-step thinking plans (rationales) for solving mathematical problems
- - **Input**: Mathematical concepts + problem statement
- - **Output**: Detailed reasoning strategy/rationale
- - **Architecture**: Qwen2.5-7B base model with LoRA fine-tuning (r=64)

- ### Prompt Model (pθ)

- - **Purpose**: Creates challenging mathematical problems from concepts and rationales
- - **Input**: Mathematical concepts + reasoning strategy
- - **Output**: Olympiad-level mathematical problem
- - **Architecture**: Qwen2.5-7B base model with LoRA fine-tuning (r=64)

- ## Intended Uses & Limitations

- ### Intended Uses

- - **Synthetic Dataset Generation**: Create high-quality training data for mathematical reasoning models
- - **Educational Content**: Generate practice problems for mathematics education
- - **Research**: Study concept-guided problem synthesis and reasoning patterns
- - **Model Training**: Improve mathematical reasoning capabilities of language models

- ### Limitations

- - **Mathematical Focus**: Currently specialized for Olympiad-level mathematics problems
- - **Training Data**: Limited to concepts and problems from AIME competitions
- - **Computational Requirements**: Requires significant GPU resources for training and inference
- - **Quality Dependency**: Output quality depends on the quality of seed data and training iterations

- ## Training Details

- ### Training Data

- - **Seed Dataset**: 253 high-quality (concept, rationale, problem) triples from AIME 2024/2025
- - **Data Source**: American Invitational Mathematics Examination (AIME) problems
- - **Annotation**: GPT-4 assisted extraction of concepts and rationale generation

- ### Training Procedure

- 1. **Cold Start**: Initial fine-tuning on seed triples
- 2. **EM Loop**: Iterative improvement through:
-    - **E-step**: Generate multiple rationales, compute rewards using model likelihood
-    - **M-step**: Fine-tune models on selected high-reward triples
- 3. **Reward Function**: `reward = -loss_rationale(c,z) - loss_prompt(c+z,x)`

- ### Training Hyperparameters

- - **Base Model**: Qwen/Qwen2.5-7B (base, not instruct)
- - **LoRA Rank**: 64
- - **Target Modules**: Attention projection matrices
- - **EM Iterations**: 6
- - **Batch Size**: 16 (effective 160 with gradient accumulation)
- - **Learning Rate**: Adam optimizer with default settings
- - **Sampling**: 7-4 rationales per iteration (decreasing curriculum)

- ## Technical Specifications

- ### Model Architecture

- ```python
- # Rationale Model Input Format
- input_text = f"Concepts: {' | '.join(concepts)}\nProblem: {problem}\nRationale:"
-
- # Prompt Model Input Format
- input_text = f"Concepts: {' | '.join(concepts)}\nRationale: {rationale}\nProblem:"
- ```

- ### Generation Parameters

- - **Temperature**: 0.7
- - **Top-p**: 0.9
- - **Max New Tokens**: 512-1024 (depending on model)
- - **Sampling**: Enabled for diversity

- ## Usage Examples

- ### Using the Rationale Model

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

- # Load model
- model = PeftModel.from_pretrained(
-     AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B"),
-     "PanzerBread/PromptCoT",
-     subfolder="coding-0.1/q/latest"
  )
- tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

- # Generate rationale
- concepts = ["exponents", "modular arithmetic"]
- problem = "Find the smallest odd prime factor of 2019^8 + 1."
- input_text = f"Concepts: {' | '.join(concepts)}\nProblem: {problem}\nRationale:"
-
- inputs = tokenizer(input_text, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9)
- rationale = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Rationale:")[-1]
  ```

- ### Using the Prompt Model

  ```python
- # Load prompt model
- model = PeftModel.from_pretrained(
-     AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B"),
-     "PanzerBread/PromptCoT",
-     subfolder="coding-0.1/p/latest"
- )

- # Generate problem
- concepts = ["combinatorial probability", "divisibility arguments"]
- rationale = "Select lottery scenario... ensure summation techniques..."
- input_text = f"Concepts: {' | '.join(concepts)}\nRationale: {rationale}\nProblem:"

- inputs = tokenizer(input_text, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, top_p=0.9)
- problem = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Problem:")[-1]
  ```

- ## Performance & Evaluation

- ### Quality Metrics

- - **Structure Accuracy**: Models maintain proper (concepts, rationale, problem) format
- - **Reward Improvement**: EM loop increases average rewards across iterations
- - **Diversity**: Multiple rationale generation ensures varied problem types

- ### Benchmark Results

- - Trained on Olympiad-level mathematical problems
- - Generates problems comparable to AIME difficulty
- - Maintains mathematical correctness and coherence

- ## Ethical Considerations

- ### Benefits

- - **Accessibility**: Democratizes access to high-quality mathematical problems
- - **Education**: Provides unlimited practice material for mathematics education
- - **Research**: Accelerates development of mathematical reasoning AI

- ### Risks & Mitigation

- - **Misinformation**: Generated problems may contain subtle errors
-   - _Mitigation_: Extensive validation and human oversight recommended
- - **Over-reliance**: Should complement, not replace, human-created educational content
- - **Bias**: Limited to mathematical domains present in training data
-   - _Mitigation_: Expand training data diversity for broader applicability

- Based on the PromptCoT paper: https://arxiv.org/pdf/2509.19894

- ## Contact & Support

- - **Repository**: https://github.com/PanzerBread/PromptCoT
- - **Issues**: https://github.com/PanzerBread/PromptCoT/issues
- - **Model Hub**: https://huggingface.co/PanzerBread/PromptCoT

- ## License

- This model is released under the Apache 2.0 License. See LICENSE file for details.
  ---
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
  tags:
+ - base_model:adapter:Qwen/Qwen2.5-7B-Instruct
+ - lora
+ - transformers
  - promptcot
+ - chain-of-thought
  - mathematical-reasoning
  ---

+ # PromptCoT 2.0 - Prompt Model (pθ)

+ This is the **Prompt Model (pθ)** from the PromptCoT 2.0 implementation, trained using an Expectation-Maximization (EM) algorithm to generate challenging mathematical problems given concepts and rationales.

+ ## Model Details

+ ### Model Description

+ This model is part of a dual-model system implementing PromptCoT 2.0:

+ - **pθ (Prompt Model)**: Generates problems `x` given concepts `c` and rationale `z` → `p(x|z,c)`
+ - **qφ (Rationale Model)**: Generates rationales `z` given concepts `c` and problem `x` → `q(z|c,x)`

+ The models are trained iteratively using an EM loop (sketched in code below):

+ 1. **E-step**: Generate K=8 rationale candidates, compute rewards, and select the best
+ 2. **M-step**: Fine-tune both models on the selected (concept, rationale, problem) triples

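+ Schematically, one EM iteration can be written as follows. This is a minimal pseudocode sketch; `generate_rationales`, `reward`, and `finetune` are hypothetical helpers standing in for the actual training code, not functions shipped with this repository:

+ ```python
+ # One EM iteration (illustrative sketch; helper functions are hypothetical)
+ selected_triples = []
+ for concepts, problem in dataset:
+     # E-step: sample K=8 candidate rationales from q_phi, keep the highest-reward one
+     candidates = generate_rationales(q_phi, concepts, problem, k=8)
+     best = max(candidates, key=lambda z: reward(p_theta, q_phi, concepts, problem, z))
+     selected_triples.append((concepts, best, problem))
+
+ # M-step: fine-tune both models on the selected triples
+ finetune(p_theta, selected_triples)  # improves p(x|z,c)
+ finetune(q_phi, selected_triples)    # improves q(z|c,x)
+ ```
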
+ - **Developed by:** [Your Name/Organization]
+ - **Model type:** LoRA fine-tuned Causal Language Model
+ - **Language(s):** English (mathematical reasoning)
+ - **License:** Apache 2.0 (inherited from Qwen2.5-7B-Instruct)
+ - **Finetuned from:** Qwen/Qwen2.5-7B-Instruct

+ ### Model Sources

+ - **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+ - **Paper:** [PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning](https://arxiv.org/abs/2509.19894) (arXiv:2509.19894)
+ - **Authors:** Xueliang Zhao, Wei Wu, Jian Guan, Zhuocheng Gong, Lingpeng Kong
+ - **Related Model:** [PromptCoT Rationale Model (qφ)](https://huggingface.co/PanzerBread/promptcot-q)

+ ## Uses

+ ### Direct Use

+ This model is designed to generate challenging mathematical problems given:

+ - **Input format**: `Concepts: c1 | c2 | ...\nRationale: [rationale text]\nProblem:`
+ - **Output**: Mathematical problem text

+ **Example:**

+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the base model and attach the LoRA adapters
+ base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+ model = PeftModel.from_pretrained(base_model, "PanzerBread/promptcot-p")
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+
+ # Build the input in the model's expected format
+ concepts = "algebra | quadratic equations"
+ rationale = "We need to find the roots of a quadratic equation..."
+ prompt = f"Concepts: {concepts}\nRationale: {rationale}\nProblem:"
+
+ # Generate; the problem text follows the final "Problem:" marker
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ problem = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Problem:")[-1].strip()
+ ```

 
+ ### Downstream Use

+ This model is part of the PromptCoT 2.0 EM training loop. Use it together with the rationale model (qφ) to:

+ - Generate synthetic training data for mathematical reasoning
+ - Improve problem-solving capabilities through iterative refinement
+ - Create challenging problem sets for educational purposes

+ ### Out-of-Scope Use
+
+ This model is specialized for mathematical reasoning and may not perform well for:
+
+ - General conversational tasks
+ - Non-mathematical problem generation
+ - Tasks requiring external knowledge beyond mathematical concepts
+
+ ## Bias, Risks, and Limitations
+
+ ### Known Limitations
+
+ - **Domain Specificity**: This model is trained specifically for mathematical reasoning and may not generalize well to other domains
+ - **Training Data Bias**: The model inherits biases from the seed dataset (AIME, GSM8K, Math500), which may reflect specific mathematical problem styles
+ - **EM Convergence**: The EM algorithm may converge to local optima, depending on initialization and hyperparameters
+ - **Generated Quality**: Generated problems may require manual validation for correctness and appropriateness
+
+ ### Technical Limitations
+
+ - **Context Length**: Limited to 512 tokens during EM training (2048 for cold start)
+ - **Sampling**: Uses temperature sampling (T=0.7), which may produce diverse but sometimes inconsistent outputs
+ - **Reward Function**: The reward is based on log probabilities, which may not perfectly correlate with problem quality
+
+ ### Recommendations

+ Users should:

+ 1. **Validate Outputs**: Always verify generated problems for mathematical correctness
+ 2. **Use with Rationale Model**: This model works best when paired with the rationale model (qφ) in the full EM loop
+ 3. **Monitor Training**: Check WandB logs for reward trends and training stability
+ 4. **Iterative Refinement**: The EM process requires multiple iterations for best results

+ ## How to Get Started with the Model

+ ### Installation
+
+ ```bash
+ pip install transformers peft torch
+ ```
+
+ ### Loading the Model

  ```python
+ import torch
  from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer

+ # Load base model
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen2.5-7B-Instruct",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
  )

+ # Load LoRA adapters
+ model = PeftModel.from_pretrained(base_model, "PanzerBread/promptcot-p")
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+ tokenizer.pad_token = tokenizer.eos_token
  ```
 
+ ### Generating Problems

  ```python
+ concepts = "algebra | quadratic equations | factoring"
+ rationale = "To solve this problem, we need to factor the quadratic equation and find its roots..."

+ prompt = f"Concepts: {concepts}\nRationale: {rationale}\nProblem:"
+
+ # Sample with temperature for more diverse problem phrasings
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=256,
+     temperature=0.7,
+     do_sample=True
+ )

+ problem = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(problem.split("Problem:")[-1].strip())
  ```
 
+ ## Training Details
+
+ ### Training Data
+
+ **Seed Dataset:**
+
+ - 253 concept-rationale-problem triples from:
+   - AIME 2024/2025
+   - GSM8K
+   - Math500
+ - Format: `(concepts: List[str], rationale: str, problem: str)` (an example follows below)
+
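+ For illustration, a single seed triple might look like this (hypothetical values; only the format above is prescribed):
+
+ ```python
+ # A hypothetical seed triple in (concepts, rationale, problem) form
+ triple = {
+     "concepts": ["modular arithmetic", "exponents"],
+     "rationale": "Reduce the expression modulo small primes and look for a usable pattern...",
+     "problem": "Find the smallest odd prime factor of 2019^8 + 1.",
+ }
+ ```
+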
+ **Training Process:**
+
+ 1. **Cold Start**: Warm-start both models via Maximum Likelihood Estimation (MLE) on the seed dataset
+ 2. **EM Loop**: Iterative refinement through 10 EM iterations
+    - Each iteration generates K=8 rationale candidates per problem
+    - Selects the best candidate based on the reward function
+    - Fine-tunes both models on the selected triples
+
+ ### Training Procedure
+
+ #### Preprocessing
+
+ - Tokenization: Left-padding, max_length=512 (EM loop) / 2048 (cold start)
+ - Format: `Concepts: c1 | c2 | ...\nRationale: z\nProblem: x`
+ - Masked cross-entropy loss (only tokens after the "Problem:" keyword; see the sketch below)
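+
+ A minimal sketch of that label masking, assuming the standard Hugging Face convention that label `-100` is ignored by the loss (padding omitted for brevity; variable names are illustrative):
+
+ ```python
+ # Compute labels so cross-entropy covers only the tokens after "Problem:".
+ # -100 is the Hugging Face ignore index for loss computation.
+ text = f"Concepts: {concepts}\nRationale: {rationale}\nProblem: {problem}"
+ prefix = text.rsplit("Problem:", 1)[0] + "Problem:"
+
+ input_ids = tokenizer(text, max_length=512, truncation=True)["input_ids"]
+ prefix_len = len(tokenizer(prefix)["input_ids"])
+ labels = [-100] * prefix_len + input_ids[prefix_len:]
+ ```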
+
+ #### Training Hyperparameters
+
+ - **Training regime:** bfloat16 mixed precision
+ - **LoRA Configuration:**
+   - `r=64` (rank)
+   - `lora_alpha=16`
+   - `lora_dropout=0.05`
+   - Target modules: `["q_proj", "k_proj", "v_proj", "o_proj"]`
+ - **EM Loop:**
+   - Batch size: 16
+   - K samples: 8 rationale candidates per problem
+   - Learning rate: 2e-5 (inferred from Trainer defaults)
+   - Epochs per M-step: 1
+ - **Reward Function:**
+   ```
+   R(c,x,z) = log p(x|z,c) + log p(z|c)
+   ```
+   where the log probabilities are computed as the negative cross-entropy loss (a code sketch follows below).
+
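+ In code, that reward can be sketched as a sum of negative cross-entropy losses from the two models. This is a simplified illustration; `prompt_loss` and `rationale_loss` are hypothetical helpers returning the mean cross-entropy over the masked target tokens:
+
+ ```python
+ # R(c,x,z) = log p(x|z,c) + log p(z|c), via negative cross-entropy
+ def compute_reward(p_theta, q_phi, concepts, rationale, problem):
+     log_p_x_given_zc = -prompt_loss(p_theta, concepts, rationale, problem)  # log p(x|z,c)
+     log_p_z_given_c = -rationale_loss(q_phi, concepts, rationale)           # log p(z|c)
+     return log_p_x_given_zc + log_p_z_given_c
+ ```
+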
+ #### Speeds, Sizes, Times
+
+ - **Model Size:** ~7B parameters (base) + ~0.02B (LoRA adapters)
+ - **Hardware:** H200 GPU (141 GB VRAM)
+ - **Training Time:** ~X hours per EM iteration (depending on dataset size)
+
+ ## Evaluation
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ - Seed dataset: 253 triples (training/validation split if applicable)
+ - Generated data: Synthetic problems created during EM iterations
+
+ #### Metrics
+
+ - **Reward Score**: Average reward per iteration, R(c,x,z) = log p(x|z,c) + log p(z|c)
+ - **Training Loss**: Cross-entropy loss on selected triples
+ - **Rationale Quality**: Measured through reward-based selection
+
+ ### Results
+
+ Training progress is monitored via WandB:
+
+ - E-step reward statistics (avg, max, min)
+ - M-step training losses for both models
+ - Number of triples selected per iteration
+
+ **Note:** This is an ongoing training process. Final evaluation results will be updated upon completion of all EM iterations.
+
+ #### Summary
+
+ The model is trained using PromptCoT 2.0's EM algorithm, which iteratively improves both problem generation (pθ) and rationale generation (qφ) through reward-based selection.
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ - **Base Architecture:** Qwen2.5-7B-Instruct (Transformer decoder)
+ - **Fine-tuning Method:** LoRA (Low-Rank Adaptation; a configuration sketch follows below)
+ - **Objective:** Causal language modeling with masked cross-entropy
+ - **Task:** Generate problems `x` given concepts `c` and rationale `z`
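+
+ For reference, this LoRA setup corresponds to a `peft` configuration along these lines (a sketch using `peft`'s standard `LoraConfig`; values are copied from the hyperparameters listed earlier):
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+
+ lora_config = LoraConfig(
+     r=64,
+     lora_alpha=16,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(base_model, lora_config)  # base_model as loaded above
+ ```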
+
+ ### Compute Infrastructure
+
+ #### Hardware
+
+ - **Training:** NVIDIA H200 GPU (141 GB VRAM)
+ - **Inference:** Compatible with any GPU supporting bfloat16
+
+ #### Software
+
+ - **Framework:** PyTorch 2.0+
+ - **Libraries:**
+   - transformers
+   - peft (v0.17.1+)
+   - datasets
+   - wandb (for logging)
+ - **CUDA:** Compatible with CUDA 11.8+
+
+ ## Citation
+
+ If you use this model, please cite the PromptCoT 2.0 paper:
+
+ **BibTeX:**
+
+ ```bibtex
+ @article{zhao2025promptcot2,
+   title={PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
+   author={Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
+   journal={arXiv preprint arXiv:2509.19894},
+   year={2025}
+ }
+ ```

+ **APA:**
+ Zhao, X., Wu, W., Guan, J., Gong, Z., & Kong, L. (2025). PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning. _arXiv preprint arXiv:2509.19894_.

+ **Paper Link:** [https://arxiv.org/abs/2509.19894](https://arxiv.org/abs/2509.19894)

+ ## Glossary [optional]

+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors

+ [Your Name/Organization]

+ ## Model Card Contact

+ [Your Email/Contact]

+ ### Framework versions

+ - PEFT 0.17.1
+ - transformers 4.40.0+
+ - torch 2.0+