pipeline_tag: text-classification
---

This ***89.6M***-parameter model is based on a custom RNN architecture loosely inspired by an unorthodox interpretation of Minimalism (Chomsky et al., 2023).

It is named **eMG-RNN** in reference to the closest computational implementation of the core Minimalist Grammar: the [expectation-based Minimalist Grammar](https://github.com/cristianochesi/e-MGs).

The model implements two pathways, similar to those in an ***LSTM***: one to manage “continuations” (the **Merge** gate) and another for “holding” (the **Move** gate). The specific “forget gating” system, inspired by ***GRUs***, is designed to bias information flow in a way that may mimic C-command.
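This is not the authors' implementation, but a minimal NumPy sketch of how a recurrent cell with separate “Merge” (continuation) and “Move” (holding) gates and a GRU-style interpolating update might look; all names, shapes, and equations here are illustrative assumptions, and the toy hidden size stands in for the model's actual 650 units.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size; the actual model uses 650 units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical parameters: each gate mixes the current input
# embedding x_t with the previous hidden state h_prev.
W_merge, U_merge = rng.standard_normal((d, d)), rng.standard_normal((d, d))
W_move,  U_move  = rng.standard_normal((d, d)), rng.standard_normal((d, d))
W_cand,  U_cand  = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def emg_rnn_step(x_t, h_prev):
    """One step of a toy two-gate cell: a Merge gate admitting the
    'continuation' and a Move gate modulating how much of the held
    state feeds the candidate (a GRU-like interpolation)."""
    merge = sigmoid(W_merge @ x_t + U_merge @ h_prev)   # continuation gate
    move = sigmoid(W_move @ x_t + U_move @ h_prev)      # holding gate
    cand = np.tanh(W_cand @ x_t + U_cand @ (move * h_prev))
    return (1.0 - merge) * h_prev + merge * cand        # GRU-style update

h = np.zeros(d)
for _ in range(5):                 # feed a few random "token" vectors
    h = emg_rnn_step(rng.standard_normal(d), h)
print(h.shape)  # (8,)
```

In this sketch the Move gate plays the role a GRU reset gate plays (filtering the held state before it contributes to the candidate), while the Merge gate plays the update-gate role; the actual gating scheme in eMG-RNN may differ.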
The base model **eMG-RNN-base** uses 650 units for both the embedding and hidden layers (Gulordava et al., 2018).

It employs a ***BPE*** tokenizer with `min_freq=3`, producing a lexicon of 67,572 tokens using the [BabyLM 2024 10M dataset](https://osf.io/5mk3x) (***Small-strict*** track) as the training corpus.

The model’s architecture, preprocessing routines, lm-eval modules for evaluation, and an alternative tokenization procedure (***MorPiece***, unused here for English) are all available on GitHub: [cristianochesi/babylm-2024](https://github.com/cristianochesi/babylm-2024)