juliekallini committed
Commit 61983a8 · verified · 1 Parent(s): dc712b8

Update README.md

Files changed (1):
  1. README.md +21 -10
README.md CHANGED
@@ -14,9 +14,7 @@ pinned: false
 <img src="https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/GfEHK3X6bD_5u4etaJY8c.png" alt="drawing" width="400"/>
 
 </div>
-<!--
-![language-continuum.drawio.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/GfEHK3X6bD_5u4etaJY8c.png)
--->
+
 This page hosts the models trained and used in the paper "[Mission: Impossible Language Models](https://arxiv.org/abs/2401.06416)" (Kallini et al., 2024).
 If you use our code or models, please cite our ACL paper:
 
@@ -57,12 +55,15 @@ sentences.
 
 ## Models
 
-For each language, we provide one **standard GPT-2 model** as well as one
-**GPT-2 model trained without positional encodings**. Each model is trained
-*from scratch* exclusively on data from one impossible language. This makes
-a total of 30 models: 15 standard GPT-2 models and 15 GPT-2 models without
-positional encodings. We separate these models into two collections
-below for ease of navigation.
+For each language, we provide two models:
+1. A [**standard GPT-2 Small model**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-67270160d99170620f5a27f6).
+2. A [**GPT-2 Small model trained without positional encodings**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-no-positional-encodings-6727286b3d1650b1b374fdeb).
+
+Each model is trained *from scratch* exclusively on data from
+one impossible language. This makes a total of 30 models:
+15 standard GPT-2 models and 15 GPT-2 models without
+positional encodings. We separate these models into two
+collections below for ease of navigation.
 
 Model names match the following pattern:
 
@@ -71,4 +72,14 @@ Model names match the following pattern:
 where `language_name` is the name of an impossible language from the table above,
 converted from PascalCase to kebab-case (e.g., NoShuffle -> `no-shuffle`), and
 `model_architecture` is one of `gpt2` (for the standard GPT-2 architecture)
-or `gpt2-no-pos` (for the GPT-2 architecture without positional encodings).
+or `gpt2-no-pos` (for the GPT-2 architecture without positional encodings).
+
+### Model Checkpoints
+
+On the main revision of each model, we provide the final
+model artefact we trained (checkpoint 3000). We also provide
+29 intermediate checkpoints over the course of training,
+from checkpoint 100 to 2900 in increments of 100 steps.
+These checkpoints can help you replicate the experiments
+we show in the paper and are provided in each model repo as
+separate revisions.
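
As a minimal sketch of how the models and checkpoint revisions described in this commit could be loaded with 🤗 Transformers: the org id below comes from the collection links in the diff, but the repo name `no-shuffle-gpt2` is only inferred from the naming description (the pattern line itself falls outside the shown hunks), and the revision name `checkpoint-1500` is an assumption; check the collections and each repo's branches for the exact identifiers.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: org taken from the collection links above; the
# "<language_name>-<model_architecture>" shape is inferred from the prose,
# since the pattern line itself is not shown in this diff.
model_id = "mission-impossible-lms/no-shuffle-gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Main revision: the final trained artefact (checkpoint 3000).
model = AutoModelForCausalLM.from_pretrained(model_id)

# Intermediate checkpoints are separate revisions in the repo. The revision
# name "checkpoint-1500" is an assumption; list the repo's branches/tags on
# the Hub to see the actual names.
model_mid = AutoModelForCausalLM.from_pretrained(model_id, revision="checkpoint-1500")
```

Since `revision` resolves to any branch or tag in the model repo, the same call pattern should work for each of the intermediate checkpoints, whatever their actual revision names turn out to be.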