Upload README.md
README.md
CHANGED
@@ -41,6 +41,8 @@ $$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \frac{\lambda}{B \cdot
 
 with **λ = 0.1**. This soft regularization reduces divergence errors at inference time at zero architectural cost.
 
+
+
 ---
 
 ## Training Details