readme: emojify more headers π

README.md CHANGED
@@ -17,7 +17,7 @@ The following LMs are currently supported:
* [Token Dropping for efficient BERT Pretraining](https://aclanthology.org/2022.acl-long.262/) - see [pretraining instructions](https://github.com/stefan-it/model-garden-lms/tree/main/token-dropping-bert)
* [Training ELECTRA Augmented with Multi-word Selection](https://aclanthology.org/2021.findings-acl.219/) (TEAMS) - see [pretraining instructions](https://github.com/stefan-it/model-garden-lms/tree/main/teams)

-# FineWeb-LMs
+# 🍷 FineWeb-LMs

The following LMs were pretrained on the 10BT subsets of the [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) and [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) datasets:

@@ -25,7 +25,7 @@ The following LMs were pretrained on the 10BT subsets of the [FineWeb](https
* Token Dropping BERT-based - find the [best model checkpoint here](https://huggingface.co/model-garden-lms/bert-base-token-dropping-finewebs-901k)
* TEAMS-based - find the [best model checkpoint here](https://huggingface.co/model-garden-lms/teams-base-finewebs-1m)

-# ScandEval Evaluation
+# 🏆 ScandEval Evaluation

To find the best checkpoints and to compare our FineWeb-LMs with other models (BERT, ELECTRA and RoBERTa), we run an evaluation using the [ScandEval](https://github.com/ScandEval/ScandEval) library.
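For readers who want to try the checkpoints referenced in the diff above, here is a minimal sketch. The model ids are taken from the README; the `hub_url` helper is a hypothetical convenience function, and the commented loading calls are the standard `transformers` `from_pretrained` pattern, not something this commit ships:

```python
# Hub ids of the best checkpoints listed in the README above.
CHECKPOINTS = {
    "token-dropping-bert": "model-garden-lms/bert-base-token-dropping-finewebs-901k",
    "teams": "model-garden-lms/teams-base-finewebs-1m",
}

def hub_url(model_id: str) -> str:
    """Build the model page URL on the Hugging Face Hub (hypothetical helper)."""
    return f"https://huggingface.co/{model_id}"

for name, model_id in CHECKPOINTS.items():
    print(f"{name}: {hub_url(model_id)}")

# Loading a checkpoint would then look roughly like this
# (requires the `transformers` package; a sketch, not run here):
#
# from transformers import AutoTokenizer, AutoModel
# tokenizer = AutoTokenizer.from_pretrained(CHECKPOINTS["teams"])
# model = AutoModel.from_pretrained(CHECKPOINTS["teams"])
```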