---
language: en
license: apache-2.0
tags:
- math
- reasoning
- synthetic-data
- promptcot
- mathematical-reasoning
- olympiad-math
- em-training
- concept-guided
inference: false
---

# PromptCoT: Synthetic Dataset Generation for Reasoning Models

**PromptCoT** is a framework for generating high-quality synthetic datasets for mathematical and coding reasoning models through concept-guided problem synthesis and iterative refinement.

## Model Description

This model collection implements the PromptCoT framework, which consists of two complementary models trained through an Expectation-Maximization (EM) loop:

### Rationale Model (qφ)

- **Purpose**: Generates step-by-step thinking plans (rationales) for solving mathematical problems
- **Input**: Mathematical concepts + problem statement
- **Output**: Detailed reasoning strategy/rationale
- **Architecture**: Qwen2.5-7B base model with LoRA fine-tuning (r=64)

### Prompt Model (pθ)

- **Purpose**: Creates challenging mathematical problems from concepts and rationales
- **Input**: Mathematical concepts + reasoning strategy
- **Output**: Olympiad-level mathematical problem
- **Architecture**: Qwen2.5-7B base model with LoRA fine-tuning (r=64)

## Intended Uses & Limitations

### Intended Uses

- **Synthetic Dataset Generation**: Create high-quality training data for mathematical reasoning models
- **Educational Content**: Generate practice problems for mathematics education
- **Research**: Study concept-guided problem synthesis and reasoning patterns
- **Model Training**: Improve the mathematical reasoning capabilities of language models

### Limitations

- **Mathematical Focus**: Currently specialized for Olympiad-level mathematics problems
- **Training Data**: Limited to concepts and problems from AIME competitions
- **Computational Requirements**: Requires significant GPU resources for training and inference
- **Quality Dependency**: Output quality depends on the quality of the seed data and the number of training iterations

## Training Details

### Training Data

- **Seed Dataset**: 253 high-quality (concept, rationale, problem) triples from AIME 2024/2025 (an illustrative record is sketched below)
- **Data Source**: American Invitational Mathematics Examination (AIME) problems
- **Annotation**: GPT-4-assisted concept extraction and rationale generation

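For illustration only, a seed record could be represented as a Python dict along these lines; the field names and the sample rationale text are assumptions rather than the repository's actual schema (the problem shown is the one reused in the usage example below):

```python
# Hypothetical seed triple; field names and rationale text are illustrative only.
seed_example = {
    "concepts": ["exponents", "modular arithmetic"],
    "rationale": "Examine the order of 2019 modulo candidate primes to factor 2019^8 + 1 ...",
    "problem": "Find the smallest odd prime factor of 2019^8 + 1.",
}
```
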
### Training Procedure

1. **Cold Start**: Initial fine-tuning on the seed triples
2. **EM Loop**: Iterative improvement through:
   - **E-step**: Generate multiple rationales and compute rewards using model likelihood
   - **M-step**: Fine-tune both models on the selected high-reward triples
3. **Reward Function**: `reward = -loss_rationale(c,z) - loss_prompt(c+z,x)` (see the sketch below)

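A minimal sketch of how this reward could be computed with Hugging Face causal LMs, conditioning each loss on the input formats given under Technical Specifications. The helper names (`sequence_loss`, `em_reward`) and the label-masking details are assumptions for illustration, not the repository's training code:

```python
import torch

def sequence_loss(model, tokenizer, prompt: str, target: str) -> float:
    """Mean cross-entropy of the target tokens conditioned on the prompt (prompt tokens masked out)."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt positions in the loss
    with torch.no_grad():
        out = model(input_ids=full_ids, labels=labels)
    return out.loss.item()

def em_reward(rationale_model, prompt_model, tokenizer, concepts, rationale, problem) -> float:
    c = " | ".join(concepts)
    # -loss_rationale(c, z): likelihood of the rationale under q_phi
    loss_z = sequence_loss(rationale_model, tokenizer,
                           f"Concepts: {c}\nProblem: {problem}\nRationale:", rationale)
    # -loss_prompt(c + z, x): likelihood of the problem under p_theta
    loss_x = sequence_loss(prompt_model, tokenizer,
                           f"Concepts: {c}\nRationale: {rationale}\nProblem:", problem)
    return -loss_z - loss_x
```
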
### Training Hyperparameters

- **Base Model**: Qwen/Qwen2.5-7B (base, not instruct)
- **LoRA Rank**: 64
- **Target Modules**: Attention projection matrices (see the configuration sketch below)
- **EM Iterations**: 6
- **Batch Size**: 16 (effective 160 with gradient accumulation)
- **Optimizer**: Adam with default settings
- **Sampling**: 7 rationales per iteration at the start, decreasing to 4 over the EM iterations (decreasing curriculum)

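A minimal PEFT configuration consistent with these settings might look like the sketch below. The exact Qwen2.5 target module names and the `lora_alpha`/`lora_dropout` values are assumptions; only the rank (64) and the attention-projection targets are stated above:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

# Rank-64 LoRA on the attention projection matrices, as listed above.
# lora_alpha and lora_dropout are illustrative choices, not documented values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```
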
## Technical Specifications

### Model Architecture

```python
# Rationale Model Input Format
input_text = f"Concepts: {' | '.join(concepts)}\nProblem: {problem}\nRationale:"

# Prompt Model Input Format
input_text = f"Concepts: {' | '.join(concepts)}\nRationale: {rationale}\nProblem:"
```

### Generation Parameters

- **Temperature**: 0.7
- **Top-p**: 0.9
- **Max New Tokens**: 512-1024 (depending on model)
- **Sampling**: Enabled for diversity (see the config sketch below)

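These defaults can be collected into a single reusable `GenerationConfig`; this is a convenience sketch, not a configuration shipped with the repository:

```python
from transformers import GenerationConfig

# Shared sampling settings from the list above; max_new_tokens is set per model
# (512 for the rationale model, 1024 for the prompt model).
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    max_new_tokens=512,
)

# outputs = model.generate(**inputs, generation_config=generation_config)
```
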
## Usage Examples

### Using the Rationale Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B"),
    "PanzerBread/PromptCoT",
    subfolder="coding-0.1/q/latest",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# Generate a rationale
concepts = ["exponents", "modular arithmetic"]
problem = "Find the smallest odd prime factor of 2019^8 + 1."
input_text = f"Concepts: {' | '.join(concepts)}\nProblem: {problem}\nRationale:"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
rationale = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Rationale:")[-1]
```

### Using the Prompt Model

```python
# Load the prompt-model adapter (reuses the imports and tokenizer from the previous example)
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B"),
    "PanzerBread/PromptCoT",
    subfolder="coding-0.1/p/latest",
)

# Generate a problem
concepts = ["combinatorial probability", "divisibility arguments"]
rationale = "Select lottery scenario... ensure summation techniques..."
input_text = f"Concepts: {' | '.join(concepts)}\nRationale: {rationale}\nProblem:"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)
problem = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Problem:")[-1]
```

## Performance & Evaluation

### Quality Metrics

- **Structure Accuracy**: Models maintain the (concepts, rationale, problem) output format
- **Reward Improvement**: The EM loop increases average rewards across iterations
- **Diversity**: Sampling multiple rationales per input yields varied problem types

### Benchmark Results

- Trained on Olympiad-level mathematical problems
- Generates problems of difficulty comparable to AIME problems
- Maintains mathematical correctness and coherence

## Ethical Considerations

### Benefits

- **Accessibility**: Democratizes access to high-quality mathematical problems
- **Education**: Provides unlimited practice material for mathematics education
- **Research**: Accelerates development of mathematical reasoning AI

### Risks & Mitigation

- **Misinformation**: Generated problems may contain subtle errors
  - _Mitigation_: Extensive validation and human oversight recommended
- **Over-reliance**: Should complement, not replace, human-created educational content
- **Bias**: Limited to mathematical domains present in training data
  - _Mitigation_: Expand training data diversity for broader applicability

Based on the PromptCoT paper: https://arxiv.org/pdf/2509.19894

## Contact & Support

- **Repository**: https://github.com/PanzerBread/PromptCoT
- **Issues**: https://github.com/PanzerBread/PromptCoT/issues
- **Model Hub**: https://huggingface.co/PanzerBread/PromptCoT

## License

This model is released under the Apache 2.0 License. See the LICENSE file for details.