- Online rejection-sampling metrics can flexibly direct how the reference flow is constituted for reward calculation.

- 📈 **Reward Behavior** Flow rewards allow arbitrary expert off-policy data to serve as the reference for constituting the reward signal. Additionally, flow rewards capture context dependence efficiently: context is natively compressed in the latent space rather than represented token by token for context comprehension.
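The flow-reward idea above can be sketched in a minimal toy form. This is an illustrative assumption, not the repository's actual implementation: we assume a closed-form rectified-flow velocity field for reference (expert) data concentrated at a single point `mu`, and score a candidate sample by the negative flow-matching discrepancy along its noise-to-sample path — samples closer to the reference flow receive higher reward. The names `reference_velocity` and `flow_reward` are hypothetical.

```python
import numpy as np

def reference_velocity(x_t, t, mu):
    # Closed-form rectified-flow velocity when all reference (expert)
    # data sits at the single point mu: v*(x_t, t) = (mu - x_t) / (1 - t)
    return (mu - x_t) / (1.0 - t)

def flow_reward(sample, mu, rng, n_steps=32):
    # Score `sample` by how well the reference flow's velocity explains
    # the straight-line interpolant noise -> sample; the reward is the
    # negative flow-matching discrepancy (0 is the best possible score).
    losses = []
    for t in np.linspace(0.0, 0.9, n_steps):
        x0 = rng.standard_normal(sample.shape)   # fresh noise draw
        x_t = (1.0 - t) * x0 + t * sample        # interpolant point
        v_ref = reference_velocity(x_t, t, mu)
        target = sample - x0                     # rectified-flow target velocity
        losses.append(np.mean((v_ref - target) ** 2))
    return -float(np.mean(losses))

rng = np.random.default_rng(0)
mu = np.zeros(3)                                 # toy "expert" reference
r_on = flow_reward(mu, mu, rng)                  # sample on the reference flow
r_off = flow_reward(np.full(3, 5.0), mu, rng)    # sample far from it
```

In this toy setting the on-reference sample scores near zero while the off-reference sample is penalized in proportion to its squared distance from `mu`, mirroring how a reference flow built from expert data can rank policy samples.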

### Model Description
- Trained from model: [Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)