PanzerBread committed · verified
Commit 8278bb4 · Parent: 4cef487

Update README.md

Files changed (1): README.md (+4 −28)

README.md CHANGED
@@ -29,18 +29,18 @@ The models are trained iteratively using an EM loop:
 1. **E-step**: Generate K=8 rationale candidates, compute rewards, select best
 2. **M-step**: Fine-tune both models on selected (concept, rationale, problem) triples
 
-- **Developed by:** [Your Name/Organization]
+- **Developed by:** Krzysztof Staroń
 - **Model type:** LoRA fine-tuned Causal Language Model
 - **Language(s):** English (mathematical reasoning)
-- **License:** Apache 2.0 (inherited from Qwen2.5-7B-Instruct)
-- **Finetuned from:** Qwen/Qwen2.5-7B-Instruct
+- **License:** Apache 2.0 (inherited from Qwen2.5-7B)
+- **Finetuned from:** Qwen/Qwen2.5-7B
 
 ### Model Sources
 
 - **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
 - **Paper:** [PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning](https://arxiv.org/abs/2509.19894) (arXiv:2509.19894)
 - **Authors:** Xueliang Zhao, Wei Wu, Jian Guan, Zhuocheng Gong, Lingpeng Kong
-- **Related Model:** [PromptCoT Rationale Model (qφ)](https://huggingface.co/PanzerBread/promptcot-q)
+- **Related Model:** [PromptCoT2.0](https://huggingface.co/xl-zhao/PromptCoT-2.0-Prompt-Generation-Model)
 
 ## Uses
 
@@ -95,12 +95,6 @@ This model is specialized for mathematical reasoning and may not perform well fo
 - **EM Convergence**: The EM algorithm may converge to local optima, depending on initialization and hyperparameters
 - **Generated Quality**: Generated problems may require manual validation for correctness and appropriateness
 
-### Technical Limitations
-
-- **Context Length**: Limited to 512 tokens during EM training (2048 for cold start)
-- **Sampling**: Uses temperature sampling (T=0.7) which may produce diverse but sometimes inconsistent outputs
-- **Reward Function**: The reward is based on log probabilities, which may not perfectly correlate with problem quality
-
 ### Recommendations
 
 Users should:
@@ -292,24 +286,6 @@ Zhao, X., Wu, W., Guan, J., Gong, Z., & Kong, L. (2025). PromptCoT 2.0: Scaling
 
 **Paper Link:** [https://arxiv.org/abs/2509.19894](https://arxiv.org/abs/2509.19894)
 
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors
-
-[Your Name/Organization]
-
-## Model Card Contact
-
-[Your Email/Contact]
-
 ### Framework versions
 
 - PEFT 0.17.1
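The EM loop described in the README context above (E-step: generate K=8 rationale candidates, compute rewards, select the best) can be sketched in plain Python. This is a minimal illustration, not the repository's code: `generate_rationales` and `reward` are hypothetical stand-ins for sampling from the rationale model at T=0.7 and for the log-probability reward, respectively.

```python
import math
import random

random.seed(0)


def generate_rationales(concept: str, k: int = 8) -> list[str]:
    # Hypothetical stand-in for sampling K rationale candidates at T=0.7;
    # real code would decode from the rationale model.
    verbs = ["decompose", "bound", "count", "construct"]
    return [f"{random.choice(verbs)} the {concept} (candidate {i})" for i in range(k)]


def reward(rationale: str, problem: str) -> float:
    # Hypothetical proxy for the log-probability reward
    # log p(problem | concept, rationale); real code would score the
    # problem under the problem-generation model.
    overlap = len(set(rationale.split()) & set(problem.split()))
    return math.log1p(overlap)


def e_step(concept: str, problem: str, k: int = 8) -> tuple[str, float]:
    # E-step: sample K candidates, compute rewards, select the best one.
    candidates = generate_rationales(concept, k)
    best = max(candidates, key=lambda r: reward(r, problem))
    return best, reward(best, problem)


best_rationale, best_reward = e_step(
    "modular arithmetic", "Find the remainder of 2**10 divided by 7."
)
```

The selected (concept, rationale, problem) triples would then feed the M-step, which fine-tunes both LoRA adapters on them.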