Update README.md
pipeline_tag: text-classification
---

This ***89.6M***-parameter model is based on a custom RNN architecture loosely inspired by an unorthodox interpretation of Minimalism (Chomsky et al., 2023).

It is named **eMG-RNN** in reference to the closest computational implementation of the core Minimalist Grammar: [expectation-based Minimalist Grammar](https://github.com/cristianochesi/e-MGs).

The model implements two pathways, similar to those in an ***LSTM***: one to manage “continuations” (the **Merge** gate) and another for “holding” (the **Move** gate). The specific “forget gating” system, inspired by ***GRUs***, is designed to bias information flow in a way that may mimic C-command.

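The actual gating equations live in the repository linked below; purely as an illustration, a cell in this family could look like the following PyTorch sketch, where the gate names, equations, and class name are assumptions of this sketch, not the published architecture:

```python
import torch
import torch.nn as nn


class EMGRNNCellSketch(nn.Module):
    """Illustrative gated cell; NOT the released eMG-RNN code.

    The "Merge" gate is cast as a GRU-style update gate (how much of the
    candidate continuation enters the state); the "Move" gate is cast as
    a reset-like gate over the held state.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.merge = nn.Linear(input_size + hidden_size, hidden_size)  # continuation pathway
        self.move = nn.Linear(input_size + hidden_size, hidden_size)   # holding pathway
        self.cand = nn.Linear(input_size + hidden_size, hidden_size)   # candidate state

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        m = torch.sigmoid(self.merge(xh))                         # Merge gate
        v = torch.sigmoid(self.move(xh))                          # Move gate
        c = torch.tanh(self.cand(torch.cat([x, v * h], dim=-1)))  # new material
        # GRU-style interpolation acts as the "forget" mechanism, biasing
        # the state toward newly merged (roughly: c-commanding) material.
        return (1.0 - m) * h + m * c
```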
The base model **eMG-RNN-base** uses 650 units for both the embedding and hidden layers (Gulordava et al., 2018).

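Continuing the sketch above, those sizes would correspond to something like the following (the 67,572-token vocabulary is the BPE lexicon described next):

```python
# Hypothetical instantiation matching the reported sizes.
vocab_size = 67_572                                        # BPE lexicon size (see below)
embedding = nn.Embedding(vocab_size, 650)                  # 650-unit embedding layer
cell = EMGRNNCellSketch(input_size=650, hidden_size=650)   # 650-unit hidden layer
```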
It employs a ***BPE*** tokenizer with `min_freq=3`, producing a lexicon of 67,572 tokens using the [BabyLM 2024 10M dataset](https://osf.io/5mk3x) (***Small-strict*** track) as the training corpus.

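The actual preprocessing scripts are in the repository linked below; as a rough reconstruction, an equivalent tokenizer could be trained with the Hugging Face `tokenizers` library (the file name and special tokens here are placeholders, not the model’s actual configuration):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a plain BPE tokenizer with the reported frequency cutoff.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=100_000,  # set high so min_frequency is the binding constraint
    min_frequency=3,     # the reported cutoff
    special_tokens=["<unk>", "<pad>"],
)
tokenizer.train(files=["babylm_10M.train"], trainer=trainer)  # placeholder file name

print(tokenizer.get_vocab_size())  # ~67,572 on the BabyLM 10M corpus
tokenizer.save("emg-rnn-bpe.json")
```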
The model’s architecture, preprocessing routines, lm-eval modules for evaluation, and an alternative tokenization procedure (***MorPiece***, not used here for English) are all available on GitHub at: [cristianochesi/babylm-2024](https://github.com/cristianochesi/babylm-2024)