Commit 2d87914 (verified) · committed by JingHaoZ · 1 Parent(s): 3f0f173

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -22,7 +22,7 @@ base_model:
   - Metrics of online rejection-sampling are flexible to direct the constitution of reference flow for reward calculation.
   - 📈 **Reward Behavior** Flow rewards enable arbitrary expert off-policy data as reference for constituting reward signal. Additionally, flow rewards rely on efficient context dependence that natively compressed in the latent space rather than individual denotation in the token space for context comprehending.
 
- ![Language](https://cdn-uploads.huggingface.co/production/uploads/64673258fc6f6da8b119cab8/tGPvzsjIb_vsGqesMFwHB.png
+ ![Language](https://cdn-uploads.huggingface.co/production/uploads/64673258fc6f6da8b119cab8/tGPvzsjIb_vsGqesMFwHB.png)
 
   ### Model Description
   - Trained from model:[Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)