---
license: mit
language:
- en
tags:
- fill-mask
- roformer
- babylm
pipeline_tag: fill-mask
---
# BabyLM RoFormer (10M tokens)

A BERT-style masked language model trained from scratch on the BabyLM 10M-token dataset.

## Model Details

- **Architecture**: RoFormer (BERT with rotary position embeddings)
- **Parameters**: ~10M
- **Training Data**: BabyLM Strict-Small track (10M tokens)
- **Vocabulary**: 16,384 tokens (WordPiece)
- **Context Length**: 128 tokens
## Usage

```python
import torch
from transformers import RoFormerForMaskedLM, RoFormerTokenizer

model = RoFormerForMaskedLM.from_pretrained("bean4259/babylm-roformer")
tokenizer = RoFormerTokenizer.from_pretrained("bean4259/babylm-roformer")

# Fill-mask example: predict the most likely token at the [MASK] position
text = "The cat sat on the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] position and decode the highest-scoring token
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```
## Training

Trained using a custom training loop with:

- Sequence packing (8.26x compression)
- AdamW optimizer (lr=1e-4)
- Linear warmup + decay learning-rate schedule
- 10 epochs
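The sequence-packing step can be sketched as follows. This is an illustrative sketch only, not the model's actual training code: it assumes a greedy concatenate-and-chunk strategy over hypothetical token-id lists, padding only the tail of the final chunk. Avoiding per-sentence padding in this way is where a compression factor like the reported 8.26x would come from on short BabyLM sentences.

```python
CONTEXT_LEN = 128  # matches the model's context length

def pack_sequences(sequences, context_len=CONTEXT_LEN, pad_id=0):
    """Concatenate tokenized sequences into one stream, then split it
    into fixed-length chunks, padding only the final partial chunk."""
    stream = [tok for seq in sequences for tok in seq]
    chunks = []
    for i in range(0, len(stream), context_len):
        chunk = stream[i:i + context_len]
        chunk += [pad_id] * (context_len - len(chunk))  # pad the tail only
        chunks.append(chunk)
    return chunks

# Toy example: ten 16-token "sentences" pack into 2 chunks instead of
# 10 padded rows, a 5x reduction in rows fed to the model.
toy = [[1] * 16 for _ in range(10)]
packed = pack_sequences(toy)
print(len(packed), len(toy) / len(packed))
```

In practice a packing implementation would also track document boundaries (e.g. with separator tokens or an attention mask), which this sketch omits for brevity.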