Update README.md
pipeline_tag: text-classification
---

This ***89.6M***-parameter model is based on a custom RNN architecture loosely inspired by an unorthodox interpretation of Minimalism (Chomsky et al., 2023).

It is named **eMG-RNN** in reference to the closest computational implementation of the core Minimalist Grammar: [expectation-based Minimalist Grammar](https://github.com/cristianochesi/e-MGs).

The model implements two pathways, similar to those in an ***LSTM***: one to manage “continuations” (the **Merge** gate) and another for “holding” (the **Move** gate). The specific “forget gating” system, inspired by ***GRUs***, is designed to bias information flow in a way that may mimic C-command.

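The actual gating equations live in the repository linked below; purely as an illustration, a cell in this family could look like the following PyTorch sketch, where the gate names, equations, and class name are assumptions of this sketch, not the published architecture:

```python
import torch
import torch.nn as nn


class EMGRNNCellSketch(nn.Module):
    """Illustrative gated cell; NOT the released eMG-RNN code.

    The "Merge" gate is cast as a GRU-style update gate (how much of the
    candidate continuation enters the state); the "Move" gate is cast as
    a reset-like gate over the held state.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.merge = nn.Linear(input_size + hidden_size, hidden_size)  # continuation pathway
        self.move = nn.Linear(input_size + hidden_size, hidden_size)   # holding pathway
        self.cand = nn.Linear(input_size + hidden_size, hidden_size)   # candidate state

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        m = torch.sigmoid(self.merge(xh))                         # Merge gate
        v = torch.sigmoid(self.move(xh))                          # Move gate
        c = torch.tanh(self.cand(torch.cat([x, v * h], dim=-1)))  # new material
        # GRU-style interpolation acts as the "forget" mechanism, biasing
        # the state toward newly merged (roughly: c-commanding) material.
        return (1.0 - m) * h + m * c
```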
The base model **eMG-RNN-base** uses 650 units for both the embedding and hidden layers (Gulordava et al., 2018).

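Continuing the sketch above, those sizes would correspond to something like the following (the 67,572-token vocabulary is the BPE lexicon described next):

```python
# Hypothetical instantiation matching the reported sizes.
vocab_size = 67_572                                        # BPE lexicon size (see below)
embedding = nn.Embedding(vocab_size, 650)                  # 650-unit embedding layer
cell = EMGRNNCellSketch(input_size=650, hidden_size=650)   # 650-unit hidden layer
```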
It employs a ***BPE*** tokenizer with `min_freq=3`, producing a lexicon of 67,572 tokens using the [BabyLM 2024 10M dataset](https://osf.io/5mk3x) (***Small-strict*** track) as the training corpus.

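The actual preprocessing scripts are in the repository linked below; as a rough reconstruction, an equivalent tokenizer could be trained with the Hugging Face `tokenizers` library (the file name and special tokens here are placeholders, not the model’s actual configuration):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a plain BPE tokenizer with the reported frequency cutoff.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=100_000,  # set high so min_frequency is the binding constraint
    min_frequency=3,     # the reported cutoff
    special_tokens=["<unk>", "<pad>"],
)
tokenizer.train(files=["babylm_10M.train"], trainer=trainer)  # placeholder file name

print(tokenizer.get_vocab_size())  # ~67,572 on the BabyLM 10M corpus
tokenizer.save("emg-rnn-bpe.json")
```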
The model’s architecture, preprocessing routines, lm-eval modules for evaluation, and an alternative tokenization procedure (***MorPiece***, not used here for English) are all available on GitHub at: [cristianochesi/babylm-2024](https://github.com/cristianochesi/babylm-2024)