<img src="https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/GfEHK3X6bD_5u4etaJY8c.png" alt="drawing" width="400"/>

</div>

This page hosts the models trained and used in the paper "[Mission: Impossible Language Models](https://arxiv.org/abs/2401.06416)" (Kallini et al., 2024).
If you use our code or models, please cite our ACL paper:
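
```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie and Papadimitriou, Isabel and Futrell, Richard and Mahowald, Kyle and Potts, Christopher",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics"
}
```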
## Models

For each language, we provide two models:

1. A [**standard GPT-2 Small model**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-67270160d99170620f5a27f6).
2. A [**GPT-2 Small model trained without positional encodings**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-no-positional-encodings-6727286b3d1650b1b374fdeb).

Each model is trained *from scratch* exclusively on data from one impossible language, for a total of 30 models: 15 standard GPT-2 models and 15 GPT-2 models without positional encodings. We separate these models into two collections below for easier navigation.

Model names match the pattern `<language_name>-<model_architecture>`, where `language_name` is the name of an impossible language from the table above, converted from PascalCase to kebab-case (e.g., NoShuffle -> `no-shuffle`), and `model_architecture` is one of `gpt2` (for the standard GPT-2 architecture) or `gpt2-no-pos` (for the GPT-2 architecture without positional encodings).
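
As a minimal sketch, the models can be loaded with the `transformers` library; the repo id below is an assumed example that combines the `mission-impossible-lms` organization with the naming pattern above, and any other language/architecture pair works the same way:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed example repo id: the NoShuffle language paired with the
# standard GPT-2 architecture, following the naming pattern above.
repo_id = "mission-impossible-lms/no-shuffle-gpt2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation as a sanity check.
inputs = tokenizer("The children are playing", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```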

### Model Checkpoints

On the main revision of each model, we provide the final model artefact we trained (checkpoint 3000). We also provide 29 intermediate checkpoints taken over the course of training, every 100 steps from checkpoint 100 through checkpoint 2900. These checkpoints can help you replicate the experiments we show in the paper and are provided in each model repo as separate revisions.
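
A sketch of loading an intermediate checkpoint, assuming the revisions are named like `checkpoint-100` (the exact revision names are an assumption here; each model repo lists its actual branches, which can also be queried as shown below):

```python
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

repo_id = "mission-impossible-lms/no-shuffle-gpt2"  # assumed example repo id

# List the revisions (branches) the repo actually provides.
refs = list_repo_refs(repo_id)
print([branch.name for branch in refs.branches])

# Load one intermediate training checkpoint by revision name.
# "checkpoint-100" is an assumed name; substitute one printed above.
model = AutoModelForCausalLM.from_pretrained(repo_id, revision="checkpoint-100")
```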