Commit 24d19fb by ZhaoyangWei (verified) · 1 Parent(s): 87e92d1

Update README.md

Files changed (1)
  1. README.md +11 -6
README.md CHANGED
@@ -35,17 +35,22 @@ The core of GRiP lies in its **Policy Refinement** stage, which addresses the "C
  $$ R_{\text{total}} = R_{\text{acc}} + R_{\text{fmt}} + R_{\text{sw-IoU}} + R_{\text{MHR}} $$

  Where:
- **Salience-Weighted IoU Reward ($R_{\text{sw-IoU}}$):** Incentivizes the model to prioritize mission-critical objects over trivial distractors. It weights the recall component by an object's salience score $$s_k$$:
- $$R_{\text{recall}} = \frac{1}{\sum s_k} \sum_{k=1}^{M} s_k \cdot \max_{i} \text{IoU}(p_i, g_k)$$
- **Multi-Heuristic Reward ($R_{\text{MHR}}$):** Encourages cognitive flexibility by rewarding diverse valid reasoning pathways (e.g., Bottom-Up, Top-Down, Deductive Verification). The model is rewarded based on similarity to the best-matching reference trajectory:
- $$R_{\text{MHR}} = \max_{j \in \{1,2,3\}} \text{sim}(\tau_{\text{gen}}, \tau_{\text{ref}}^j)$$
+ * **Salience-Weighted IoU Reward ($R_{\text{sw-IoU}}$):** Incentivizes the model to prioritize mission-critical objects over trivial distractors. It weights the recall component by an object's salience score $s_k$:
+ $$
+ R_{\text{recall}} = \frac{1}{\sum s_k} \sum_{k=1}^{M} s_k \cdot \max_{i} \text{IoU}(p_i, g_k)
+ $$
+
+ * **Multi-Heuristic Reward ($R_{\text{MHR}}$):** Encourages cognitive flexibility by rewarding diverse valid reasoning pathways (e.g., Bottom-Up, Top-Down, Deductive Verification). The model is rewarded based on similarity to the best-matching reference trajectory:
+ $$
+ R_{\text{MHR}} = \max_{j \in \{1,2,3\}} \text{sim}(\tau_{\text{gen}}, \tau_{\text{ref}}^j)
+ $$
+


- ![Methodology](

  ![image](https://cdn-uploads.huggingface.co/production/uploads/66daf60cbb6e7331f46ea070/uhChByMJIAHaSC6HeeYjy.png)

- )
+

  ## Performance
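
To make the two reward terms in the README text above concrete, here is a minimal Python sketch of $R_{\text{recall}}$ and $R_{\text{MHR}}$. It is an illustration under stated assumptions, not the GRiP implementation: the box layout `(x1, y1, x2, y2)`, the function names, and the token-level Jaccard similarity standing in for $\text{sim}$ are all hypothetical, since the README does not specify them. Only the recall component of $R_{\text{sw-IoU}}$ is covered, because that is the only part the README spells out.

```python
from typing import Callable, Sequence, Tuple

# Assumed box layout: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def salience_weighted_recall(
    preds: Sequence[Box], gts: Sequence[Box], salience: Sequence[float]
) -> float:
    """R_recall = (1 / sum_k s_k) * sum_k s_k * max_i IoU(p_i, g_k)."""
    total = sum(salience)
    if not preds or total <= 0:
        return 0.0  # convention chosen here; the README does not define this case
    weighted = sum(
        s * max(iou(p, g) for p in preds) for g, s in zip(gts, salience)
    )
    return weighted / total


def token_jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for sim(): Jaccard overlap of whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def multi_heuristic_reward(
    generated: str,
    references: Sequence[str],
    sim: Callable[[str, str], float] = token_jaccard,
) -> float:
    """R_MHR = max_j sim(tau_gen, tau_ref^j): score against the best-matching reference."""
    return max((sim(generated, ref) for ref in references), default=0.0)


# Two ground-truth objects; the first is three times as salient as the second.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
salience = [3.0, 1.0]
print(salience_weighted_recall([(0, 0, 10, 10)], gts, salience))    # 0.75
print(salience_weighted_recall([(20, 20, 30, 30)], gts, salience))  # 0.25
```

Detecting only the high-salience object scores 0.75, while detecting only the low-salience one scores 0.25, which is the prioritization the salience weighting is meant to produce. Because `multi_heuristic_reward` takes the maximum over the reference trajectories, a generated trajectory is not penalized for following one valid reasoning pathway instead of another.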