$$ R_{\text{total}} = R_{\text{acc}} + R_{\text{fmt}} + R_{\text{sw-IoU}} + R_{\text{MHR}} $$

Where:

* **Salience-Weighted IoU Reward ($R_{\text{sw-IoU}}$):** Incentivizes the model to prioritize mission-critical objects over trivial distractors. It weights the recall component by an object's salience score $s_k$:

    $$
    R_{\text{recall}} = \frac{1}{\sum s_k} \sum_{k=1}^{M} s_k \cdot \max_{i} \text{IoU}(p_i, g_k)
    $$
* **Multi-Heuristic Reward ($R_{\text{MHR}}$):** Encourages cognitive flexibility by rewarding diverse valid reasoning pathways (e.g., Bottom-Up, Top-Down, Deductive Verification). The model is rewarded based on similarity to the best-matching reference trajectory:

    $$
    R_{\text{MHR}} = \max_{j \in \{1,2,3\}} \text{sim}(\tau_{\text{gen}}, \tau_{\text{ref}}^j)
    $$
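The two refinement rewards above can be sketched in a few lines of Python. This is an illustrative sketch only: the `[x1, y1, x2, y2]` box format, the `iou` helper, and the token-level Jaccard overlap standing in for $\text{sim}(\cdot,\cdot)$ are assumptions, not the repository's actual implementation.

```python
def iou(box_a, box_b):
    """IoU of two [x1, y1, x2, y2] boxes (format assumed for illustration)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def salience_weighted_recall(preds, gts, salience):
    """R_recall: each ground-truth box g_k is matched to its best-overlapping
    prediction p_i, and that match is weighted by the salience score s_k."""
    weighted = sum(
        s_k * max((iou(p, g_k) for p in preds), default=0.0)
        for g_k, s_k in zip(gts, salience)
    )
    return weighted / sum(salience)

def multi_heuristic_reward(tau_gen, tau_refs):
    """R_MHR: score the generated trace against each reference trajectory
    (e.g., Bottom-Up, Top-Down, Deductive Verification) and keep the best
    match. Token-level Jaccard overlap is a stand-in for sim(., .)."""
    def sim(a, b):
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    return max(sim(tau_gen, ref) for ref in tau_refs)
```

Note how the salience weighting changes the incentive: with one salient object ($s=3$) detected perfectly and one trivial object ($s=1$) missed entirely, $R_{\text{recall}} = 3/4$ rather than the unweighted $1/2$.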
## Performance