dwenlong committed on
Commit aef9fb2 · verified · 1 parent: 29300e7

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -29,6 +29,7 @@ It prevents training collapse by regularizing **only when** the likelihood of (g
  - We identify **Lazy Likelihood Displacement (LLD)** as a key mechanism behind collapse in tool-integrated GRPO training.
  - LLDS activates **selectively**: it penalizes likelihood reduction on a *preserving set* (e.g., non-negative-advantage actions).
  - We release our **LLDS-tuned Qwen2.5-3B-Base** checkpoint for search-integrated reasoning and QA.
+ - **A** refers to the action-level gate and **R** to the response-level gate; the **action-level (A) gate achieves the best performance**.
 
 
  ## 🔍 Tool-Integrated Search Inference (Search-R1 style)
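The README bullets describe two ideas: a penalty on likelihood reduction within a preserving set, and action-level (A) versus response-level (R) gating. A small sketch can illustrate both. This is not the released LLDS code. It makes the following assumptions:

- Per-action log-probabilities under the old and current policy are available.
- The preserving set is the actions with advantage >= 0.
- The penalty is a hinge on each preserved action's log-likelihood drop.
- The A gate fires per action, while the R gate fires only when the response's summed preserved log-likelihood falls.

All function names are hypothetical.

```python
from typing import Sequence

# Hypothetical sketch, not the authors' implementation.
# Assumptions: preserving set = actions with advantage >= 0;
# penalty = sum of per-action log-likelihood drops (hinge at zero).


def _hinge_drops(old_logps: Sequence[float], new_logps: Sequence[float],
                 advantages: Sequence[float]) -> float:
    """Sum of log-likelihood reductions over the preserving set."""
    total = 0.0
    for old, new, adv in zip(old_logps, new_logps, advantages):
        if adv >= 0 and new < old:  # preserved action whose likelihood fell
            total += old - new
    return total


def action_gate_penalty(old_logps: Sequence[float], new_logps: Sequence[float],
                        advantages: Sequence[float]) -> float:
    """Assumed A gate: each preserved action is checked on its own."""
    return _hinge_drops(old_logps, new_logps, advantages)


def response_gate_penalty(old_logps: Sequence[float], new_logps: Sequence[float],
                          advantages: Sequence[float]) -> float:
    """Assumed R gate: penalty is switched on only if the summed
    log-likelihood of the preserved actions in the response dropped."""
    kept = [(o, n) for o, n, a in zip(old_logps, new_logps, advantages) if a >= 0]
    old_total = sum(o for o, _ in kept)
    new_total = sum(n for _, n in kept)
    if new_total >= old_total:  # response-level likelihood did not fall: gate closed
        return 0.0
    return _hinge_drops(old_logps, new_logps, advantages)


if __name__ == "__main__":
    old = [-1.0, -2.0, -0.5]
    new = [-1.5, -1.0, -0.5]  # action 0 drops, action 1 rises by more
    adv = [0.3, 0.3, 0.3]
    print(action_gate_penalty(old, new, adv))    # 0.5
    print(response_gate_penalty(old, new, adv))  # 0.0
```

In this toy response the first action loses likelihood while the second gains more. The hypothetical R gate therefore stays closed and misses a displacement that the A gate penalizes. Under these assumptions, this suggests one intuition for why a finer-grained gate could help.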