---
license: mit
language:
- en
tags:
- fill-mask
- roformer
- babylm
pipeline_tag: fill-mask
---

# BabyLM RoFormer (10M tokens)

A BERT-style masked language model trained from scratch on the BabyLM 10M-token dataset.

## Model Details

- **Architecture**: RoFormer (BERT + Rotary Position Embeddings)
- **Parameters**: ~10M
- **Training Data**: BabyLM Strict-Small (10M tokens)
- **Vocabulary**: 16,384 tokens (WordPiece)
- **Context Length**: 128 tokens

## Usage

```python
import torch
from transformers import RoFormerForMaskedLM, RoFormerTokenizer

model = RoFormerForMaskedLM.from_pretrained("bean4259/babylm-roformer")
tokenizer = RoFormerTokenizer.from_pretrained("bean4259/babylm-roformer")

# Fill-mask example
text = "The cat sat on the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the top prediction for the masked position
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

## Training

Trained using a custom training loop with:

- Sequence packing (8.26x compression)
- AdamW optimizer (lr=1e-4)
- Linear warmup + decay schedule
- 10 epochs
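For readers unfamiliar with sequence packing: the idea is to concatenate multiple short tokenized examples (separated by a special token) into fixed-length sequences, so that far fewer positions are wasted on padding. The sketch below illustrates a simple greedy version under stated assumptions; the actual training code, the separator ID, and any truncation policy used for this model are not published here, so treat the details (`pack_sequences`, `sep_id`) as hypothetical.

```python
def pack_sequences(token_lists, max_len=128, sep_id=3):
    """Greedily pack tokenized examples into sequences of at most max_len tokens.

    Each example is followed by a separator token (sep_id is an assumed,
    illustrative ID). When the next example would overflow the current pack,
    the pack is emitted and a new one is started.
    """
    packed, current = [], []
    for tokens in token_lists:
        # Truncate examples that could never fit in a single pack
        tokens = tokens[:max_len - 1]
        if len(current) + len(tokens) + 1 > max_len:
            packed.append(current)
            current = []
        current = current + tokens + [sep_id]
    if current:
        packed.append(current)
    return packed
```

With a 128-token context, packing many short child-directed utterances this way is what makes a high compression ratio (fewer, denser batches) possible.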