PanzerBread committed on
Commit 4cef487 · verified · 1 Parent(s): 1641c45

Update README.md

Files changed (1)
  1. README.md +256 -124
README.md CHANGED
@@ -1,185 +1,317 @@
  ---
- language: en
- license: apache-2.0
  tags:
- - math
- - reasoning
- - synthetic-data
  - promptcot
  - mathematical-reasoning
- - olympiad-math
- - em-training
- - concept-guided
- inference: false
  ---

- # PromptCoT: Synthetic Dataset Generation for Reasoning Models

- **PromptCoT** is an innovative approach to generating high-quality synthetic datasets for mathematical and coding reasoning models through concept-guided problem synthesis and iterative refinement.

- ## Model Description

- This model collection implements the PromptCoT framework, which consists of two complementary models trained through an Expectation-Maximization (EM) loop:

- ### Rationale Model (qφ)

- - **Purpose**: Generates optimal step-by-step thinking plans (rationales) for solving mathematical problems
- - **Input**: Mathematical concepts + problem statement
- - **Output**: Detailed reasoning strategy/rationale
- - **Architecture**: Qwen2.5-7B base model with LoRA fine-tuning (r=64)

- ### Prompt Model (pθ)

- - **Purpose**: Creates challenging mathematical problems from concepts and rationales
- - **Input**: Mathematical concepts + reasoning strategy
- - **Output**: Olympiad-level mathematical problem
- - **Architecture**: Qwen2.5-7B base model with LoRA fine-tuning (r=64)

- ## Intended Uses & Limitations

- ### Intended Uses

- - **Synthetic Dataset Generation**: Create high-quality training data for mathematical reasoning models
- - **Educational Content**: Generate practice problems for mathematics education
- - **Research**: Study concept-guided problem synthesis and reasoning patterns
- - **Model Training**: Improve mathematical reasoning capabilities of language models

- ### Limitations

- - **Mathematical Focus**: Currently specialized for Olympiad-level mathematics problems
- - **Training Data**: Limited to concepts and problems from AIME competitions
- - **Computational Requirements**: Requires significant GPU resources for training and inference
- - **Quality Dependency**: Output quality depends on the quality of seed data and training iterations

- ## Training Details

- ### Training Data

- - **Seed Dataset**: 253 high-quality (concept, rationale, problem) triples from AIME 2024/2025
- - **Data Source**: American Invitational Mathematics Examination (AIME) problems
- - **Annotation**: GPT-4 assisted extraction of concepts and rationale generation

- ### Training Procedure

- 1. **Cold Start**: Initial fine-tuning on seed triples
- 2. **EM Loop**: Iterative improvement through:
-    - **E-step**: Generate multiple rationales, compute rewards using model likelihood
-    - **M-step**: Fine-tune models on selected high-reward triples
- 3. **Reward Function**: `reward = -loss_rationale(c,z) - loss_prompt(c+z,x)`

- ### Training Hyperparameters

- - **Base Model**: Qwen/Qwen2.5-7B (base, not instruct)
- - **LoRA Rank**: 64
- - **Target Modules**: Attention projection matrices
- - **EM Iterations**: 6
- - **Batch Size**: 16 (effective 160 with gradient accumulation)
- - **Learning Rate**: Adam optimizer with default settings
- - **Sampling**: 7-4 rationales per iteration (decreasing curriculum)

- ## Technical Specifications

- ### Model Architecture

- ```python
- # Rationale Model Input Format
- input_text = f"Concepts: {' | '.join(concepts)}\nProblem: {problem}\nRationale:"
-
- # Prompt Model Input Format
- input_text = f"Concepts: {' | '.join(concepts)}\nRationale: {rationale}\nProblem:"
- ```

- ### Generation Parameters

- - **Temperature**: 0.7
- - **Top-p**: 0.9
- - **Max New Tokens**: 512-1024 (depending on model)
- - **Sampling**: Enabled for diversity

- ## Usage Examples

- ### Using the Rationale Model

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

- # Load model
- model = PeftModel.from_pretrained(
-     AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B"),
-     "PanzerBread/PromptCoT",
-     subfolder="coding-0.1/q/latest"
  )
- tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

- # Generate rationale
- concepts = ["exponents", "modular arithmetic"]
- problem = "Find the smallest odd prime factor of 2019^8 + 1."
- input_text = f"Concepts: {' | '.join(concepts)}\nProblem: {problem}\nRationale:"
-
- inputs = tokenizer(input_text, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9)
- rationale = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Rationale:")[-1]
  ```

- ### Using the Prompt Model

  ```python
- # Load prompt model
- model = PeftModel.from_pretrained(
-     AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B"),
-     "PanzerBread/PromptCoT",
-     subfolder="coding-0.1/p/latest"
- )

- # Generate problem
- concepts = ["combinatorial probability", "divisibility arguments"]
- rationale = "Select lottery scenario... ensure summation techniques..."
- input_text = f"Concepts: {' | '.join(concepts)}\nRationale: {rationale}\nProblem:"

- inputs = tokenizer(input_text, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, top_p=0.9)
- problem = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Problem:")[-1]
  ```

- ## Performance & Evaluation

- ### Quality Metrics

- - **Structure Accuracy**: Models maintain proper (concepts, rationale, problem) format
- - **Reward Improvement**: EM loop increases average rewards across iterations
- - **Diversity**: Multiple rationale generation ensures varied problem types

- ### Benchmark Results

- - Trained on Olympiad-level mathematical problems
- - Generates problems comparable to AIME difficulty
- - Maintains mathematical correctness and coherence

- ## Ethical Considerations

- ### Benefits

- - **Accessibility**: Democratizes access to high-quality mathematical problems
- - **Education**: Provides unlimited practice material for mathematics education
- - **Research**: Accelerates development of mathematical reasoning AI

- ### Risks & Mitigation

- - **Misinformation**: Generated problems may contain subtle errors
-   - _Mitigation_: Extensive validation and human oversight recommended
- - **Over-reliance**: Should complement, not replace, human-created educational content
- - **Bias**: Limited to mathematical domains present in training data
-   - _Mitigation_: Expand training data diversity for broader applicability

- Based on the PromptCoT paper: https://arxiv.org/pdf/2509.19894

- ## Contact & Support

- - **Repository**: https://github.com/PanzerBread/PromptCoT
- - **Issues**: https://github.com/PanzerBread/PromptCoT/issues
- - **Model Hub**: https://huggingface.co/PanzerBread/PromptCoT

- ## License

- This model is released under the Apache 2.0 License. See LICENSE file for details.
  ---
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
  tags:
+ - base_model:adapter:Qwen/Qwen2.5-7B-Instruct
+ - lora
+ - transformers
  - promptcot
+ - chain-of-thought
  - mathematical-reasoning
  ---

+ # PromptCoT 2.0 - Prompt Model (pθ)

+ This is the **Prompt Model (pθ)** from the PromptCoT 2.0 implementation, trained using an Expectation-Maximization (EM) algorithm to generate challenging mathematical problems given concepts and rationales.

+ ## Model Details

+ ### Model Description

+ This model is part of a dual-model system implementing PromptCoT 2.0:

+ - **pθ (Prompt Model)**: Generates problems `x` given concepts `c` and rationale `z` → `p(x|z,c)`
+ - **qφ (Rationale Model)**: Generates rationales `z` given concepts `c` and problem `x` → `q(z|c,x)`

+ The models are trained iteratively using an EM loop (sketched in code below):

+ 1. **E-step**: Generate K=8 rationale candidates, compute rewards, and select the best
+ 2. **M-step**: Fine-tune both models on the selected (concept, rationale, problem) triples

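+ Schematically, one EM iteration can be written as follows. This is a minimal pseudocode sketch; `generate_rationales`, `reward`, and `finetune` are hypothetical helpers standing in for the actual training code, not functions shipped with this repository:

+ ```python
+ # One EM iteration (illustrative sketch; helper functions are hypothetical)
+ selected_triples = []
+ for concepts, problem in dataset:
+     # E-step: sample K=8 candidate rationales from q_phi, keep the highest-reward one
+     candidates = generate_rationales(q_phi, concepts, problem, k=8)
+     best = max(candidates, key=lambda z: reward(p_theta, q_phi, concepts, problem, z))
+     selected_triples.append((concepts, best, problem))
+
+ # M-step: fine-tune both models on the selected triples
+ finetune(p_theta, selected_triples)  # improves p(x|z,c)
+ finetune(q_phi, selected_triples)    # improves q(z|c,x)
+ ```
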
+ - **Developed by:** [Your Name/Organization]
+ - **Model type:** LoRA fine-tuned Causal Language Model
+ - **Language(s):** English (mathematical reasoning)
+ - **License:** Apache 2.0 (inherited from Qwen2.5-7B-Instruct)
+ - **Finetuned from:** Qwen/Qwen2.5-7B-Instruct

+ ### Model Sources

+ - **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+ - **Paper:** [PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning](https://arxiv.org/abs/2509.19894) (arXiv:2509.19894)
+ - **Authors:** Xueliang Zhao, Wei Wu, Jian Guan, Zhuocheng Gong, Lingpeng Kong
+ - **Related Model:** [PromptCoT Rationale Model (qφ)](https://huggingface.co/PanzerBread/promptcot-q)

+ ## Uses

+ ### Direct Use

+ This model is designed to generate challenging mathematical problems given:

+ - **Input format**: `Concepts: c1 | c2 | ...\nRationale: [rationale text]\nProblem:`
+ - **Output**: Mathematical problem text

+ **Example:**

+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the base model and attach the LoRA adapters
+ base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+ model = PeftModel.from_pretrained(base_model, "PanzerBread/promptcot-p")
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+
+ # Build the input in the model's expected format
+ concepts = "algebra | quadratic equations"
+ rationale = "We need to find the roots of a quadratic equation..."
+ prompt = f"Concepts: {concepts}\nRationale: {rationale}\nProblem:"
+
+ # Generate; the problem text follows the final "Problem:" marker
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ problem = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Problem:")[-1].strip()
+ ```

 
+ ### Downstream Use

+ This model is part of the PromptCoT 2.0 EM training loop. Use it together with the rationale model (qφ) to:

+ - Generate synthetic training data for mathematical reasoning
+ - Improve problem-solving capabilities through iterative refinement
+ - Create challenging problem sets for educational purposes

+ ### Out-of-Scope Use
+
+ This model is specialized for mathematical reasoning and may not perform well for:
+
+ - General conversational tasks
+ - Non-mathematical problem generation
+ - Tasks requiring external knowledge beyond mathematical concepts
+
+ ## Bias, Risks, and Limitations
+
+ ### Known Limitations
+
+ - **Domain Specificity**: This model is trained specifically for mathematical reasoning and may not generalize well to other domains
+ - **Training Data Bias**: The model inherits biases from the seed dataset (AIME, GSM8K, Math500), which may reflect specific mathematical problem styles
+ - **EM Convergence**: The EM algorithm may converge to local optima, depending on initialization and hyperparameters
+ - **Generated Quality**: Generated problems may require manual validation for correctness and appropriateness
+
+ ### Technical Limitations
+
+ - **Context Length**: Limited to 512 tokens during EM training (2048 for cold start)
+ - **Sampling**: Uses temperature sampling (T=0.7), which may produce diverse but sometimes inconsistent outputs
+ - **Reward Function**: The reward is based on log probabilities, which may not perfectly correlate with problem quality
+
+ ### Recommendations

+ Users should:

+ 1. **Validate Outputs**: Always verify generated problems for mathematical correctness
+ 2. **Use with Rationale Model**: This model works best when paired with the rationale model (qφ) in the full EM loop
+ 3. **Monitor Training**: Check WandB logs for reward trends and training stability
+ 4. **Iterative Refinement**: The EM process requires multiple iterations for best results

+ ## How to Get Started with the Model

+ ### Installation
+
+ ```bash
+ pip install transformers peft torch
+ ```
+
+ ### Loading the Model

  ```python
+ import torch
  from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer

+ # Load base model
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen2.5-7B-Instruct",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
  )

+ # Load LoRA adapters
+ model = PeftModel.from_pretrained(base_model, "PanzerBread/promptcot-p")
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+ tokenizer.pad_token = tokenizer.eos_token
  ```
 
+ ### Generating Problems

  ```python
+ concepts = "algebra | quadratic equations | factoring"
+ rationale = "To solve this problem, we need to factor the quadratic equation and find its roots..."

+ prompt = f"Concepts: {concepts}\nRationale: {rationale}\nProblem:"
+
+ # Sample with temperature for more diverse problem phrasings
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=256,
+     temperature=0.7,
+     do_sample=True
+ )

+ problem = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(problem.split("Problem:")[-1].strip())
  ```
 
+ ## Training Details
+
+ ### Training Data
+
+ **Seed Dataset:**
+
+ - 253 concept-rationale-problem triples from:
+   - AIME 2024/2025
+   - GSM8K
+   - Math500
+ - Format: `(concepts: List[str], rationale: str, problem: str)` (an example follows below)
+
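+ For illustration, a single seed triple might look like this (hypothetical values; only the format above is prescribed):
+
+ ```python
+ # A hypothetical seed triple in (concepts, rationale, problem) form
+ triple = {
+     "concepts": ["modular arithmetic", "exponents"],
+     "rationale": "Reduce the expression modulo small primes and look for a usable pattern...",
+     "problem": "Find the smallest odd prime factor of 2019^8 + 1.",
+ }
+ ```
+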
+ **Training Process:**
+
+ 1. **Cold Start**: Warm-start both models via Maximum Likelihood Estimation (MLE) on the seed dataset
+ 2. **EM Loop**: Iterative refinement through 10 EM iterations
+    - Each iteration generates K=8 rationale candidates per problem
+    - Selects the best candidate based on the reward function
+    - Fine-tunes both models on the selected triples
+
+ ### Training Procedure
+
+ #### Preprocessing
+
+ - Tokenization: Left-padding, max_length=512 (EM loop) / 2048 (cold start)
+ - Format: `Concepts: c1 | c2 | ...\nRationale: z\nProblem: x`
+ - Masked cross-entropy loss (only tokens after the "Problem:" keyword; see the sketch below)
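+
+ A minimal sketch of that label masking, assuming the standard Hugging Face convention that label `-100` is ignored by the loss (padding omitted for brevity; variable names are illustrative):
+
+ ```python
+ # Compute labels so cross-entropy covers only the tokens after "Problem:".
+ # -100 is the Hugging Face ignore index for loss computation.
+ text = f"Concepts: {concepts}\nRationale: {rationale}\nProblem: {problem}"
+ prefix = text.rsplit("Problem:", 1)[0] + "Problem:"
+
+ input_ids = tokenizer(text, max_length=512, truncation=True)["input_ids"]
+ prefix_len = len(tokenizer(prefix)["input_ids"])
+ labels = [-100] * prefix_len + input_ids[prefix_len:]
+ ```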
+
+ #### Training Hyperparameters
+
+ - **Training regime:** bfloat16 mixed precision
+ - **LoRA Configuration:**
+   - `r=64` (rank)
+   - `lora_alpha=16`
+   - `lora_dropout=0.05`
+   - Target modules: `["q_proj", "k_proj", "v_proj", "o_proj"]`
+ - **EM Loop:**
+   - Batch size: 16
+   - K samples: 8 rationale candidates per problem
+   - Learning rate: 2e-5 (inferred from Trainer defaults)
+   - Epochs per M-step: 1
+ - **Reward Function:**
+   ```
+   R(c,x,z) = log p(x|z,c) + log p(z|c)
+   ```
+   where the log probabilities are computed as the negative cross-entropy loss (a code sketch follows below).
+
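+ In code, that reward can be sketched as a sum of negative cross-entropy losses from the two models. This is a simplified illustration; `prompt_loss` and `rationale_loss` are hypothetical helpers returning the mean cross-entropy over the masked target tokens:
+
+ ```python
+ # R(c,x,z) = log p(x|z,c) + log p(z|c), via negative cross-entropy
+ def compute_reward(p_theta, q_phi, concepts, rationale, problem):
+     log_p_x_given_zc = -prompt_loss(p_theta, concepts, rationale, problem)  # log p(x|z,c)
+     log_p_z_given_c = -rationale_loss(q_phi, concepts, rationale)           # log p(z|c)
+     return log_p_x_given_zc + log_p_z_given_c
+ ```
+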
+ #### Speeds, Sizes, Times
+
+ - **Model Size:** ~7B parameters (base) + ~0.02B (LoRA adapters)
+ - **Hardware:** H200 GPU (141 GB VRAM)
+ - **Training Time:** ~X hours per EM iteration (depending on dataset size)
+
+ ## Evaluation
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ - Seed dataset: 253 triples (training/validation split if applicable)
+ - Generated data: Synthetic problems created during EM iterations
+
+ #### Metrics
+
+ - **Reward Score**: Average reward per iteration, R(c,x,z) = log p(x|z,c) + log p(z|c)
+ - **Training Loss**: Cross-entropy loss on selected triples
+ - **Rationale Quality**: Measured through reward-based selection
+
+ ### Results
+
+ Training progress is monitored via WandB:
+
+ - E-step reward statistics (avg, max, min)
+ - M-step training losses for both models
+ - Number of triples selected per iteration
+
+ **Note:** This is an ongoing training process. Final evaluation results will be updated upon completion of all EM iterations.
+
+ #### Summary
+
+ The model is trained using PromptCoT 2.0's EM algorithm, which iteratively improves both problem generation (pθ) and rationale generation (qφ) through reward-based selection.
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ - **Base Architecture:** Qwen2.5-7B-Instruct (Transformer decoder)
+ - **Fine-tuning Method:** LoRA (Low-Rank Adaptation; a configuration sketch follows below)
+ - **Objective:** Causal language modeling with masked cross-entropy
+ - **Task:** Generate problems `x` given concepts `c` and rationale `z`
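+
+ For reference, this LoRA setup corresponds to a `peft` configuration along these lines (a sketch using `peft`'s standard `LoraConfig`; values are copied from the hyperparameters listed earlier):
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+
+ lora_config = LoraConfig(
+     r=64,
+     lora_alpha=16,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(base_model, lora_config)  # base_model as loaded above
+ ```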
+
+ ### Compute Infrastructure
+
+ #### Hardware
+
+ - **Training:** NVIDIA H200 GPU (141 GB VRAM)
+ - **Inference:** Compatible with any GPU supporting bfloat16
+
+ #### Software
+
+ - **Framework:** PyTorch 2.0+
+ - **Libraries:**
+   - transformers
+   - peft (v0.17.1+)
+   - datasets
+   - wandb (for logging)
+ - **CUDA:** Compatible with CUDA 11.8+
+
+ ## Citation
+
+ If you use this model, please cite the PromptCoT 2.0 paper:
+
+ **BibTeX:**
+
+ ```bibtex
+ @article{zhao2025promptcot2,
+   title={PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
+   author={Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
+   journal={arXiv preprint arXiv:2509.19894},
+   year={2025}
+ }
+ ```

+ **APA:**
+ Zhao, X., Wu, W., Guan, J., Gong, Z., & Kong, L. (2025). PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning. _arXiv preprint arXiv:2509.19894_.

+ **Paper Link:** [https://arxiv.org/abs/2509.19894](https://arxiv.org/abs/2509.19894)

+ ## Glossary [optional]

+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors

+ [Your Name/Organization]

+ ## Model Card Contact

+ [Your Email/Contact]

+ ### Framework versions

+ - PEFT 0.17.1
+ - transformers 4.40.0+
+ - torch 2.0+