Minor changes

README.md CHANGED

@@ -38,5 +38,5 @@ With MosaicBERT and FlashAttention 2, we can increase the throughput from 190,00
 ## Model variations
 For the creation of BERTchen we tested different datasets and training setups. Two notable variants are:
 
-- [`BERTchen-v0.1-C4`](https://huggingface.co/frederic-sadrieh/BERTchen-v0.1-C4) Same
-- [`hybrid_BERTchen-v0.1`](https://huggingface.co/frederic-sadrieh/hybrid_BERTchen-v0.1)
+- [`BERTchen-v0.1-C4`](https://huggingface.co/frederic-sadrieh/BERTchen-v0.1-C4) Same pretraining setup and hyperparameters, just trained on the [C4](https://huggingface.co/datasets/allenai/c4) dataset.
+- [`hybrid_BERTchen-v0.1`](https://huggingface.co/frederic-sadrieh/hybrid_BERTchen-v0.1) Pretrained on [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) with our own hybrid sequence-length-changing approach (for more information, see the model card or paper).