Update README.md
README.md
CHANGED
@@ -28,11 +28,11 @@ During early experiments, we observed that Whisper Tiny often produced invalid o
 
 The training loss combined the standard ASR loss with KD loss:
 
-
-
-
+$$
+L_t = \lambda_{lm} \, \text{CE}(\text{asr}, \text{true token}) + (1 - \lambda_{lm}) \, \text{KLD}(\text{asr distribution}, \text{mlm prediction})
+$$
 
-where
+where $\lambda_{lm}$ balances the two components.
 
 ### Hyperparameters
 
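For readers of this change, the per-token loss added to the README can be sketched in plain Python as below. This is a minimal illustration and not the repository's training code. The function names (`softmax`, `cross_entropy`, `kl_divergence`, `combined_loss`) are hypothetical, and the README does not state the direction of the KL term. The sketch assumes KL(mlm || asr), the usual knowledge-distillation convention in which the teacher (MLM) distribution is the reference.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """CE between a predicted distribution and a one-hot true token."""
    return -math.log(probs[true_index])


def kl_divergence(p, q):
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def combined_loss(asr_logits, mlm_probs, true_index, lambda_lm):
    """L_t = lambda_lm * CE + (1 - lambda_lm) * KLD for one token.

    Assumption: the KL direction is KL(mlm || asr); the README does not
    specify it.
    """
    asr_probs = softmax(asr_logits)
    ce = cross_entropy(asr_probs, true_index)
    kld = kl_divergence(mlm_probs, asr_probs)
    return lambda_lm * ce + (1.0 - lambda_lm) * kld
```

With `lambda_lm = 1.0` the sketch reduces to the plain ASR cross-entropy, and with `lambda_lm = 0.0` only the distillation term remains, which is the balancing role the README assigns to $\lambda_{lm}$.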