bean4259/babylm-roformer

+---
+license: mit
+language:
+- en
+tags:
+- fill-mask
+- roformer
+- babylm
+pipeline_tag: fill-mask
+---
+# BabyLM RoFormer (10M tokens)
+A BERT-style masked language model with rotary position embeddings (RoFormer), trained from scratch on the BabyLM Strict-Small dataset (10M tokens).
+## Model Details
+- **Architecture**: RoFormer (BERT + Rotary Position Embeddings)
+- **Parameters**: ~10M
+- **Training Data**: BabyLM Strict-Small (10M tokens)
+- **Vocabulary**: 16,384 tokens (WordPiece)
+- **Context Length**: 128 tokens
+## Usage
+```python
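+import torch
+from transformers import RoFormerForMaskedLM, RoFormerTokenizer
+model = RoFormerForMaskedLM.from_pretrained("bean4259/babylm-roformer")
+tokenizer = RoFormerTokenizer.from_pretrained("bean4259/babylm-roformer")
+# Fill-mask example
+text = "The cat sat on the [MASK]."
+inputs = tokenizer(text, return_tensors="pt")
+with torch.no_grad():  # inference only, no gradients needed
+    outputs = model(**inputs)
+# Locate the [MASK] position and decode the highest-scoring token
+mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
+predicted_id = outputs.logits[0, mask_pos].argmax(dim=-1)
+print(tokenizer.decode(predicted_id))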
+```
+## Training
+Trained using a custom training loop with:
+- Sequence packing (8.26x compression)
+- AdamW optimizer (lr=1e-4)
+- Linear warmup + decay schedule
+- 10 epochs
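
The linear warmup + decay schedule listed above can be sketched in plain Python. This is a minimal illustration, not the training loop used for this model: the function name `linear_warmup_decay`, the warmup length, the total step count, and the decay to zero are all assumptions, since the card only states the peak learning rate (1e-4) and the schedule shape.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int,
                        peak_lr: float = 1e-4) -> float:
    """Learning rate at `step`: linear ramp from 0 to peak_lr, then linear decay to 0.

    warmup_steps and total_steps are hypothetical; the model card does not state them.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Example with assumed values: 100 warmup steps out of 1000 total
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_decay(step, warmup_steps=100, total_steps=1000))
```

In a PyTorch training loop, the same shape is typically obtained by passing a multiplier function like this one (with `peak_lr=1.0`) to `torch.optim.lr_scheduler.LambdaLR` on top of an `AdamW` optimizer created with `lr=1e-4`.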