Update README.md
README.md
CHANGED
@@ -52,8 +52,8 @@ $$
 
 Where:
 - $\theta_s$ and $\theta_t$ represent student (trainable) and teacher (frozen) model parameters
--
--
+- $P_{\theta}^{(i)}$ denotes the probability distribution at reasoning step $i$
+- $\lambda(t) = \lambda_0 \cdot (1 + \gamma \cdot \text{complexity}(x_t))$ is the dynamic weight function
 - $\alpha_i = \exp(-\delta \cdot i/T)$ implements exponential decay for later reasoning steps
 - $\mathcal{L}_{\text{QS}}$ is the quality scoring loss ensuring reasoning coherence
 
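The two weighting terms defined in the updated list, the step decay $\alpha_i = \exp(-\delta \cdot i/T)$ and the dynamic weight $\lambda(t) = \lambda_0 \cdot (1 + \gamma \cdot \text{complexity}(x_t))$, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the repository's implementation. The loss equation these terms plug into sits above this hunk and is not shown, so the sketch only computes the weights. The function names are hypothetical. Step indices are assumed to start at 0, and $\text{complexity}(x_t)$ is passed in as a precomputed number because the README does not define how it is computed.

```python
import math
from typing import List


def step_decay_weights(num_steps: int, delta: float, T: float) -> List[float]:
    """Return alpha_i = exp(-delta * i / T) for reasoning steps i = 0..num_steps-1.

    Assumption: steps are indexed from 0 (the README does not say whether i
    starts at 0 or 1). With delta > 0, later steps receive smaller weights.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    return [math.exp(-delta * i / T) for i in range(num_steps)]


def dynamic_weight(lambda0: float, gamma: float, complexity: float) -> float:
    """Return lambda(t) = lambda0 * (1 + gamma * complexity(x_t)).

    Assumption: `complexity` is the already-computed score complexity(x_t);
    the README does not specify how that score is obtained.
    """
    return lambda0 * (1.0 + gamma * complexity)


# Usage: four reasoning steps with delta = 1 and T = 4.
weights = step_decay_weights(num_steps=4, delta=1.0, T=4.0)
# lambda0 = 0.5, gamma = 2.0, complexity score 0.25 -> 0.5 * (1 + 0.5)
lam = dynamic_weight(lambda0=0.5, gamma=2.0, complexity=0.25)
print(weights, lam)
```

With $\delta = 1$ and $T = 4$, the weights fall from 1.0 at the first step to $e^{-0.75} \approx 0.47$ at the fourth, so earlier reasoning steps dominate the weighted sum. A larger $\gamma$ makes $\lambda(t)$ more sensitive to the complexity score.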