<img src="https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/GfEHK3X6bD_5u4etaJY8c.png" alt="drawing" width="400"/>

</div>

This page hosts the models trained and used in the paper "[Mission: Impossible Language Models](https://arxiv.org/abs/2401.06416)" (Kallini et al., 2024).
If you use our code or models, please cite our ACL paper:
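
```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie and Papadimitriou, Isabel and Futrell, Richard and Mahowald, Kyle and Potts, Christopher",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics"
}
```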
## Models

For each language, we provide two models:

1. A [**standard GPT-2 Small model**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-67270160d99170620f5a27f6).
2. A [**GPT-2 Small model trained without positional encodings**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-no-positional-encodings-6727286b3d1650b1b374fdeb).

Each model is trained *from scratch* exclusively on data from one impossible language, for a total of 30 models: 15 standard GPT-2 models and 15 GPT-2 models without positional encodings. We separate these models into two collections below for easier navigation.

Model names match the pattern `<language_name>-<model_architecture>`, where `language_name` is the name of an impossible language from the table above, converted from PascalCase to kebab-case (e.g., NoShuffle -> `no-shuffle`), and `model_architecture` is one of `gpt2` (for the standard GPT-2 architecture) or `gpt2-no-pos` (for the GPT-2 architecture without positional encodings).
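
As a minimal sketch, the models can be loaded with the `transformers` library; the repo id below is an assumed example that combines the `mission-impossible-lms` organization with the naming pattern above, and any other language/architecture pair works the same way:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed example repo id: the NoShuffle language paired with the
# standard GPT-2 architecture, following the naming pattern above.
repo_id = "mission-impossible-lms/no-shuffle-gpt2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation as a sanity check.
inputs = tokenizer("The children are playing", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```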

### Model Checkpoints

On the main revision of each model, we provide the final model artefact we trained (checkpoint 3000). We also provide 29 intermediate checkpoints taken over the course of training, every 100 steps from checkpoint 100 through checkpoint 2900. These checkpoints can help you replicate the experiments we show in the paper and are provided in each model repo as separate revisions.
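
A sketch of loading an intermediate checkpoint, assuming the revisions are named like `checkpoint-100` (the exact revision names are an assumption here; each model repo lists its actual branches, which can also be queried as shown below):

```python
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

repo_id = "mission-impossible-lms/no-shuffle-gpt2"  # assumed example repo id

# List the revisions (branches) the repo actually provides.
refs = list_repo_refs(repo_id)
print([branch.name for branch in refs.branches])

# Load one intermediate training checkpoint by revision name.
# "checkpoint-100" is an assumed name; substitute one printed above.
model = AutoModelForCausalLM.from_pretrained(repo_id, revision="checkpoint-100")
```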