Update README.md
The models are trained iteratively using an EM loop:

1. **E-step**: Generate K=8 rationale candidates, compute rewards, select best
2. **M-step**: Fine-tune both models on selected (concept, rationale, problem) triples
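The loop above can be sketched in a few lines of Python. This is a toy illustration, not the actual training code: `generate` and `score` are hypothetical stand-ins for the rationale-generation model and the reward function.

```python
def em_step(concepts, generate, score, k=8):
    """One EM iteration (E-step): for each concept, sample k rationale
    candidates, score each one, and keep the highest-reward candidate.
    The returned (concept, rationale) pairs are what the M-step would
    fine-tune both models on."""
    selected = []
    for concept in concepts:
        candidates = [generate(concept) for _ in range(k)]
        rewards = [score(concept, cand) for cand in candidates]
        best = candidates[rewards.index(max(rewards))]
        selected.append((concept, best))
    return selected
```

In the actual pipeline, `generate` corresponds to temperature sampling from the rationale model and `score` to the log-probability-based reward described under limitations.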
- **Developed by:** Krzysztof Staroń
- **Model type:** LoRA fine-tuned Causal Language Model
- **Language(s):** English (mathematical reasoning)
- **License:** Apache 2.0 (inherited from Qwen2.5-7B)
- **Finetuned from:** Qwen/Qwen2.5-7B
### Model Sources

- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **Paper:** [PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning](https://arxiv.org/abs/2509.19894) (arXiv:2509.19894)
- **Authors:** Xueliang Zhao, Wei Wu, Jian Guan, Zhuocheng Gong, Lingpeng Kong
- **Related Model:** [PromptCoT2.0](https://huggingface.co/xl-zhao/PromptCoT-2.0-Prompt-Generation-Model)

## Uses
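A minimal loading sketch with PEFT, assuming the standard `AutoPeftModelForCausalLM` workflow; the adapter path is a placeholder for this repository's id, and the prompt format is illustrative rather than the exact training template.

```python
def load_model(adapter_path):
    """Load the base model together with the LoRA adapter via PEFT.
    `adapter_path` is a placeholder for this adapter's repo id or local dir."""
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoPeftModelForCausalLM.from_pretrained(adapter_path, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
    return model, tokenizer


def build_prompt(concepts):
    """Illustrative prompt: ask for a problem grounded in the given concepts."""
    return ("Generate a challenging math problem that combines the following "
            "concepts: " + ", ".join(concepts))
```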
This model is specialized for mathematical reasoning and may not perform well for other tasks.

- **EM Convergence**: The EM algorithm may converge to local optima, depending on initialization and hyperparameters
- **Generated Quality**: Generated problems may require manual validation for correctness and appropriateness

### Technical Limitations

- **Context Length**: Limited to 512 tokens during EM training (2048 for cold start)
- **Sampling**: Uses temperature sampling (T=0.7), which may produce diverse but sometimes inconsistent outputs
- **Reward Function**: The reward is based on log probabilities, which may not perfectly correlate with problem quality
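The log-probability reward can be illustrated with a small sketch; the exact normalization used in training is not specified here, so a length-normalized mean token log-probability is assumed.

```python
def logprob_reward(token_logprobs):
    """Mean per-token log probability of a candidate; higher (closer to 0)
    means the scoring model considers the candidate more likely."""
    return sum(token_logprobs) / len(token_logprobs)


def select_best(candidates):
    """Rank candidates by reward and return the best one.
    Each candidate is a dict carrying its per-token log probabilities."""
    return max(candidates, key=lambda c: logprob_reward(c["logprobs"]))
```

Because this reward only tracks likelihood under the scoring model, a fluent but incorrect problem can still score well, which is why manual validation is recommended.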
### Recommendations

Users should:
Zhao, X., Wu, W., Guan, J., Gong, Z., & Kong, L. (2025). PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning. arXiv:2509.19894.

**Paper Link:** [https://arxiv.org/abs/2509.19894](https://arxiv.org/abs/2509.19894)
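For convenience, the reference above can be cited with a BibTeX entry assembled from the same fields (the entry key is arbitrary):

```bibtex
@article{zhao2025promptcot2,
  title   = {PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
  author  = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2509.19894},
  year    = {2025}
}
```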
### Framework versions

- PEFT 0.17.1