Update README.md
To transfer knowledge from the English model to Czech, we developed a simple method.
Figure 4 (ablation): Test perplexity over the course of training for the vocabulary-swap method on TinyLLAMA. Our method (green curve) vs. TinyLLAMA trained from scratch (blue curve).
The vocabulary swap was done in the same way as for our [Czech-GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) model (see its model card for a comprehensive description).
For CSMPT7b, we managed to align 4,177 English tokens with corresponding Czech tokens.
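A vocabulary swap of this kind can be sketched as follows. This is a hypothetical illustration, not the authors' actual code: for every token whose string form appears in both the source (English) and target (Czech) tokenizer vocabularies, the trained source embedding is reused; all other target rows get a fresh random initialization. The function name and the toy vocabularies are invented for the example.

```python
import numpy as np

def swap_vocab_embeddings(src_vocab, src_emb, tgt_vocab, seed=0):
    """Initialize a target embedding matrix from a trained source one.

    src_vocab / tgt_vocab: dict mapping token string -> row id.
    src_emb: trained source embedding matrix of shape (V_src, d).
    Returns (tgt_emb, n_aligned).
    """
    rng = np.random.default_rng(seed)
    d = src_emb.shape[1]
    # Fresh random init for the whole target embedding matrix.
    tgt_emb = rng.normal(0.0, 0.02, size=(len(tgt_vocab), d))
    n_aligned = 0
    for tok, tgt_id in tgt_vocab.items():
        src_id = src_vocab.get(tok)
        if src_id is not None:
            # Token exists in both vocabularies: carry over the trained row.
            tgt_emb[tgt_id] = src_emb[src_id]
            n_aligned += 1
    return tgt_emb, n_aligned

# Toy example: "the" and "dog" overlap, so 2 rows are carried over.
src_vocab = {"the": 0, "dog": 1, "cat": 2}
src_emb = np.arange(9, dtype=float).reshape(3, 3)
tgt_vocab = {"pes": 0, "the": 1, "dog": 2}
tgt_emb, n_aligned = swap_vocab_embeddings(src_vocab, src_emb, tgt_vocab)
print(n_aligned)  # 2
```

In practice the alignment would be run over the full tokenizers (yielding counts like the 4,177 tokens above), and the resulting matrix would replace the model's input (and tied output) embedding before continued pretraining.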
## Hyperparameters
Hyperparameters not mentioned here were kept the same as for MPT.