---
language: en
license: apache-2.0
tags:
- math
- reasoning
- synthetic-data
- promptcot
- mathematical-reasoning
- olympiad-math
- em-training
- concept-guided
inference: false
---

# PromptCoT: Synthetic Dataset Generation for Reasoning Models

**PromptCoT** is a framework for generating high-quality synthetic datasets for mathematical and coding reasoning models through concept-guided problem synthesis and iterative refinement.

## Model Description

This model collection implements the PromptCoT framework, which consists of two complementary models trained through an Expectation-Maximization (EM) loop:

### Rationale Model (qφ)

- **Purpose**: Generates step-by-step thinking plans (rationales) for solving mathematical problems
- **Input**: Mathematical concepts + problem statement
- **Output**: Detailed reasoning strategy/rationale
- **Architecture**: Qwen2.5-7B base model with LoRA fine-tuning (r=64)

### Prompt Model (pθ)

- **Purpose**: Creates challenging mathematical problems from concepts and rationales
- **Input**: Mathematical concepts + reasoning strategy
- **Output**: Olympiad-level mathematical problem
- **Architecture**: Qwen2.5-7B base model with LoRA fine-tuning (r=64)

## Intended Uses & Limitations

### Intended Uses

- **Synthetic Dataset Generation**: Create high-quality training data for mathematical reasoning models
- **Educational Content**: Generate practice problems for mathematics education
- **Research**: Study concept-guided problem synthesis and reasoning patterns
- **Model Training**: Improve the mathematical reasoning capabilities of language models

### Limitations

- **Mathematical Focus**: Currently specialized for Olympiad-level mathematics problems
- **Training Data**: Limited to concepts and problems from AIME competitions
- **Computational Requirements**: Requires significant GPU resources for training and inference
- **Quality Dependency**: Output quality depends on the quality of the seed data and the number of training iterations

## Training Details

### Training Data

- **Seed Dataset**: 253 high-quality (concept, rationale, problem) triples from AIME 2024/2025 (an illustrative record is sketched below)
- **Data Source**: American Invitational Mathematics Examination (AIME) problems
- **Annotation**: GPT-4-assisted concept extraction and rationale generation

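For illustration only, a seed record could be represented as a Python dict along these lines; the field names and the sample rationale text are assumptions rather than the repository's actual schema (the problem shown is the one reused in the usage example below):

```python
# Hypothetical seed triple; field names and rationale text are illustrative only.
seed_example = {
    "concepts": ["exponents", "modular arithmetic"],
    "rationale": "Examine the order of 2019 modulo candidate primes to factor 2019^8 + 1 ...",
    "problem": "Find the smallest odd prime factor of 2019^8 + 1.",
}
```
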
### Training Procedure

1. **Cold Start**: Initial fine-tuning on the seed triples
2. **EM Loop**: Iterative improvement through:
   - **E-step**: Generate multiple rationales and compute rewards using model likelihood
   - **M-step**: Fine-tune both models on the selected high-reward triples
3. **Reward Function**: `reward = -loss_rationale(c,z) - loss_prompt(c+z,x)` (see the sketch below)

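A minimal sketch of how this reward could be computed with Hugging Face causal LMs, conditioning each loss on the input formats given under Technical Specifications. The helper names (`sequence_loss`, `em_reward`) and the label-masking details are assumptions for illustration, not the repository's training code:

```python
import torch

def sequence_loss(model, tokenizer, prompt: str, target: str) -> float:
    """Mean cross-entropy of the target tokens conditioned on the prompt (prompt tokens masked out)."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt positions in the loss
    with torch.no_grad():
        out = model(input_ids=full_ids, labels=labels)
    return out.loss.item()

def em_reward(rationale_model, prompt_model, tokenizer, concepts, rationale, problem) -> float:
    c = " | ".join(concepts)
    # -loss_rationale(c, z): likelihood of the rationale under q_phi
    loss_z = sequence_loss(rationale_model, tokenizer,
                           f"Concepts: {c}\nProblem: {problem}\nRationale:", rationale)
    # -loss_prompt(c + z, x): likelihood of the problem under p_theta
    loss_x = sequence_loss(prompt_model, tokenizer,
                           f"Concepts: {c}\nRationale: {rationale}\nProblem:", problem)
    return -loss_z - loss_x
```
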
### Training Hyperparameters

- **Base Model**: Qwen/Qwen2.5-7B (base, not instruct)
- **LoRA Rank**: 64
- **Target Modules**: Attention projection matrices (see the configuration sketch below)
- **EM Iterations**: 6
- **Batch Size**: 16 (effective 160 with gradient accumulation)
- **Optimizer**: Adam with default settings
- **Sampling**: 7 rationales per iteration at the start, decreasing to 4 over the EM iterations (decreasing curriculum)

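A minimal PEFT configuration consistent with these settings might look like the sketch below. The exact Qwen2.5 target module names and the `lora_alpha`/`lora_dropout` values are assumptions; only the rank (64) and the attention-projection targets are stated above:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

# Rank-64 LoRA on the attention projection matrices, as listed above.
# lora_alpha and lora_dropout are illustrative choices, not documented values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```
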
## Technical Specifications

### Model Architecture

```python
# Rationale Model Input Format
input_text = f"Concepts: {' | '.join(concepts)}\nProblem: {problem}\nRationale:"

# Prompt Model Input Format
input_text = f"Concepts: {' | '.join(concepts)}\nRationale: {rationale}\nProblem:"
```

### Generation Parameters

- **Temperature**: 0.7
- **Top-p**: 0.9
- **Max New Tokens**: 512-1024 (depending on model)
- **Sampling**: Enabled for diversity (see the config sketch below)

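These defaults can be collected into a single reusable `GenerationConfig`; this is a convenience sketch, not a configuration shipped with the repository:

```python
from transformers import GenerationConfig

# Shared sampling settings from the list above; max_new_tokens is set per model
# (512 for the rationale model, 1024 for the prompt model).
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    max_new_tokens=512,
)

# outputs = model.generate(**inputs, generation_config=generation_config)
```
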
## Usage Examples

### Using the Rationale Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B"),
    "PanzerBread/PromptCoT",
    subfolder="coding-0.1/q/latest",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# Generate a rationale
concepts = ["exponents", "modular arithmetic"]
problem = "Find the smallest odd prime factor of 2019^8 + 1."
input_text = f"Concepts: {' | '.join(concepts)}\nProblem: {problem}\nRationale:"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
rationale = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Rationale:")[-1]
```

### Using the Prompt Model

```python
# Load the prompt-model adapter (reuses the imports and tokenizer from the previous example)
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B"),
    "PanzerBread/PromptCoT",
    subfolder="coding-0.1/p/latest",
)

# Generate a problem
concepts = ["combinatorial probability", "divisibility arguments"]
rationale = "Select lottery scenario... ensure summation techniques..."
input_text = f"Concepts: {' | '.join(concepts)}\nRationale: {rationale}\nProblem:"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)
problem = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Problem:")[-1]
```

## Performance & Evaluation

### Quality Metrics

- **Structure Accuracy**: Models maintain the (concepts, rationale, problem) output format
- **Reward Improvement**: The EM loop increases average rewards across iterations
- **Diversity**: Sampling multiple rationales per input yields varied problem types

### Benchmark Results

- Trained on Olympiad-level mathematical problems
- Generates problems of difficulty comparable to AIME problems
- Maintains mathematical correctness and coherence

## Ethical Considerations

### Benefits

- **Accessibility**: Democratizes access to high-quality mathematical problems
- **Education**: Provides unlimited practice material for mathematics education
- **Research**: Accelerates development of mathematical reasoning AI

### Risks & Mitigation

- **Misinformation**: Generated problems may contain subtle errors
  - _Mitigation_: Extensive validation and human oversight recommended
- **Over-reliance**: Should complement, not replace, human-created educational content
- **Bias**: Limited to mathematical domains present in training data
  - _Mitigation_: Expand training data diversity for broader applicability

Based on the PromptCoT paper: https://arxiv.org/pdf/2509.19894

## Contact & Support

- **Repository**: https://github.com/PanzerBread/PromptCoT
- **Issues**: https://github.com/PanzerBread/PromptCoT/issues
- **Model Hub**: https://huggingface.co/PanzerBread/PromptCoT

## License

This model is released under the Apache 2.0 License. See the LICENSE file for details.