Update README.md
README.md CHANGED
@@ -12,7 +12,7 @@ This ***89.6M*** parameters model is based on a custom RNN architecture loosely
It is named **eMG-RNN** in reference to the closest computational implementation of the core Minimalist Grammar: [expectation-based Minimalist Grammar](https://github.com/cristianochesi/e-MGs).

The model implements two pathways, similar to those in an ***LSTM***: one to manage “continuations” (the **Merge** gate) and another for “holding” (the **Move** gate). The specific “forget gating” system, inspired by ***GRUs***, is designed to bias information flow in a way that may mimic C-command.

-The base model **eMG-RNN-base**
+The base model **eMG-RNN-base** uses 650 units for both the embedding and the hidden layer (Gulordava et al., 2018). Only one hidden layer is adopted in this base model so that the effect of each gating system can be assessed in isolation.

It employs a ***BPE*** tokenizer with `min_freq=3`, producing a lexicon of 67,572 tokens using the [BabyLM 2024 10M dataset](https://osf.io/5mk3x) (***Small-strict*** track) as the training corpus.
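For readers skimming the diff, here is a minimal PyTorch sketch of what a single-layer, 650-unit recurrent cell with LSTM-style **Merge**/**Move** gates and a GRU-style forget gate could look like. The commit does not publish the model's equations, so the class name, gate wiring, and update rule below are illustrative assumptions, not the released eMG-RNN implementation.

```python
# Illustrative sketch only: the gate equations are NOT given in the README,
# so this cell simply combines LSTM-style gating (a hypothetical "Merge" gate
# for continuations, a "Move" gate for held material) with a GRU-style forget
# gate, at the stated base-model size (650-d embeddings, one 650-d hidden layer).
import torch
import torch.nn as nn


class EMGRNNCellSketch(nn.Module):
    """Hypothetical cell: Merge/Move gates plus a GRU-inspired forget gate."""

    def __init__(self, emb_dim: int = 650, hid_dim: int = 650):
        super().__init__()
        # One linear map per gate over the concatenated [input, hidden] vector.
        self.merge_gate = nn.Linear(emb_dim + hid_dim, hid_dim)   # "continuations"
        self.move_gate = nn.Linear(emb_dim + hid_dim, hid_dim)    # "holding"
        self.forget_gate = nn.Linear(emb_dim + hid_dim, hid_dim)  # GRU-inspired
        self.candidate = nn.Linear(emb_dim + hid_dim, hid_dim)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        m = torch.sigmoid(self.merge_gate(xh))   # how much new material to merge in
        v = torch.sigmoid(self.move_gate(xh))    # how much held material to carry over
        f = torch.sigmoid(self.forget_gate(xh))  # GRU-style reset biasing the history
        cand = torch.tanh(self.candidate(torch.cat([x, f * h], dim=-1)))
        return v * h + m * cand


# Single hidden layer, 650-d embeddings, vocabulary size from the README.
emb = nn.Embedding(67_572, 650)
cell = EMGRNNCellSketch()
h = torch.zeros(1, 650)
for tok in torch.tensor([[5, 42, 7]]).unbind(dim=1):  # toy token ids
    h = cell(emb(tok), h)
```

Using a single hidden layer, as the added line notes, means any change in behaviour can be attributed to a specific gate rather than to depth.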
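Similarly, a short sketch of how a BPE tokenizer with the stated `min_freq=3` could be trained on the BabyLM corpus. The README does not name the tooling, so the use of the Hugging Face `tokenizers` library, the file path, and the special tokens below are all assumptions.

```python
# Sketch under assumptions: library choice, corpus path, and special tokens
# are placeholders; only BPE and min_freq=3 come from the README.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()

# min_frequency=3 mirrors the README's `min_freq=3`: merges seen fewer than
# three times in the 10M-word corpus are not added to the vocabulary.
trainer = BpeTrainer(min_frequency=3, special_tokens=["<unk>", "<pad>", "<eos>"])
tokenizer.train(files=["babylm_10M/train.txt"], trainer=trainer)  # hypothetical path

print(tokenizer.get_vocab_size())  # the README reports 67,572 tokens
```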