---
license: cc-by-sa-4.0
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---

This ***89.6M***-parameter model is based on a custom RNN architecture loosely inspired by an unorthodox interpretation of Minimalism (Chomsky et al., 2023). It is named **eMG-RNN** in reference to the closest computational implementation of the core Minimalist Grammar: [expectation-based Minimalist Grammar](https://github.com/cristianochesi/e-MGs).

The model implements two pathways, similar to those in an ***LSTM***: one to manage “continuations” (the **Merge** gate) and another for “holding” (the **Move** gate). The specific “forget gating” system, inspired by ***GRUs***, is designed to bias information flow in a way that may mimic C-command.

The base model, **eMG-RNN-base**, uses 650 units for both the embedding and the hidden layer (Gulordava et al., 2018). Only one hidden layer is adopted in this base model, so that the effect of each individual gating system can be assessed in isolation. It employs a ***BPE*** tokenizer with `min_freq=3`, producing a lexicon of 67,572 tokens from the [BabyLM 2024 10M dataset](https://osf.io/5mk3x) (***Small-strict*** track) used as the training corpus.

The model’s architecture, preprocessing routines, lm-eval modules for evaluation, and an alternative tokenization procedure (***MorPiece***, not used here for English) are all available on GitHub: [cristianochesi/babylm-2024](https://github.com/cristianochesi/babylm-2024)
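
The exact cell equations are defined in the repository linked above. Purely as a rough, hypothetical sketch of the gating scheme described here (a **Merge** gate for continuations, a **Move** gate for holding, and a GRU-style forget gate), assuming standard sigmoid gates and a single extra “holding” vector, one recurrent step might look like this:

```python
# Hypothetical sketch only: the real eMG-RNN cell equations live in the
# cristianochesi/babylm-2024 repo. Gate wiring and update rules below are
# assumptions made for illustration, not the published architecture.
import torch
import torch.nn as nn


class EMGRNNCellSketch(nn.Module):
    """One recurrent step with a Merge gate ("continuations"),
    a Move gate ("holding"), and a GRU-style forget gate."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Each gate reads the current input and the previous hidden state.
        self.merge_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.move_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.forget_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h_prev, m_prev):
        """x: (batch, input_size); h_prev, m_prev: (batch, hidden_size).
        m_prev is an assumed "holding" buffer for moved material."""
        z = torch.cat([x, h_prev], dim=-1)
        merge = torch.sigmoid(self.merge_gate(z))    # how much new material to integrate
        move = torch.sigmoid(self.move_gate(z))      # how much to keep on hold for later
        forget = torch.sigmoid(self.forget_gate(z))  # GRU-style erasure of past context
        h_tilde = torch.tanh(self.candidate(z))      # candidate continuation

        m = move * m_prev + (1.0 - move) * h_tilde   # update the holding buffer
        h = forget * h_prev + merge * (h_tilde + m)  # biased information flow
        return h, m


if __name__ == "__main__":
    # 650 units for input (embedding) and hidden state, as in eMG-RNN-base.
    cell = EMGRNNCellSketch(input_size=650, hidden_size=650)
    x = torch.randn(4, 650)
    h = torch.zeros(4, 650)
    m = torch.zeros(4, 650)
    h, m = cell(x, h, m)
    print(h.shape, m.shape)  # torch.Size([4, 650]) torch.Size([4, 650])
```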
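
Likewise, the actual preprocessing pipeline is in the repository; a minimal sketch of training a BPE tokenizer with a minimum merge frequency of 3 via the Hugging Face `tokenizers` library (the training file path and special tokens below are assumptions) could be:

```python
# Minimal sketch of BPE training with min_frequency=3 using the Hugging Face
# `tokenizers` library; the real preprocessing is in cristianochesi/babylm-2024.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(
    min_frequency=3,                    # keep only merges seen at least 3 times
    special_tokens=["<unk>", "<pad>"],  # special tokens here are assumptions
)

# Train on the BabyLM 2024 10M training data (hypothetical file name).
tokenizer.train(files=["babylm_10M.train"], trainer=trainer)
tokenizer.save("emg-rnn-bpe.json")
print(tokenizer.get_vocab_size())  # 67,572 on the actual corpus
```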