Minor changes

README.md CHANGED

@@ -38,5 +38,5 @@ With MosaicBERT and FlashAttention 2, we can increase the throughput from 190,00
 ## Model variations
 For the creation of BERTchen we tested different datasets and training setups. Two notable variants are:
 
-- [`BERTchen-v0.1-C4`](https://huggingface.co/frederic-sadrieh/BERTchen-v0.1-C4) Same
-- [`hybrid_BERTchen-v0.1`](https://huggingface.co/frederic-sadrieh/hybrid_BERTchen-v0.1)
+- [`BERTchen-v0.1-C4`](https://huggingface.co/frederic-sadrieh/BERTchen-v0.1-C4) Same pretraining setup and hyperparameters, just trained on the [C4](https://huggingface.co/datasets/allenai/c4) dataset.
+- [`hybrid_BERTchen-v0.1`](https://huggingface.co/frederic-sadrieh/hybrid_BERTchen-v0.1) Pretrained on [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) with our own hybrid sequence-length-changing approach (for more information, see the model card or paper).