metadata
license: mit
language:
  - en
tags:
  - fill-mask
  - roformer
  - babylm
pipeline_tag: fill-mask

BabyLM RoFormer (10M tokens)

A BERT-style masked language model with rotary position embeddings (RoFormer), trained from scratch on the BabyLM Strict-Small dataset (10M tokens).

Model Details

  • Architecture: RoFormer (BERT + Rotary Position Embeddings)
  • Parameters: ~10M
  • Training Data: BabyLM Strict-Small (10M tokens)
  • Vocabulary: 16,384 tokens (WordPiece)
  • Context Length: 128 tokens
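
To illustrate what rotary position embeddings do, here is a minimal pure-Python sketch. It is not taken from this model's code. The function names, the base of 10000, and the adjacent-pair layout are assumptions chosen for readability. Each pair of dimensions is rotated by an angle proportional to the token position, so the attention score between a query and a key depends only on their relative offset.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to one even-length vector.

    Hypothetical helper: adjacent pairs (vec[i], vec[i + 1]) are rotated by
    position * base ** (-i / dim), following the usual RoPE formulation.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Illustrative query/key vectors (made-up values).
query = [0.5, -1.0, 2.0, 0.25]
key = [1.5, 0.75, -0.5, 1.0]

# Same relative offset (2 positions apart) at two different absolute positions.
score_near_start = dot(rotate(query, 3), rotate(key, 1))
score_shifted = dot(rotate(query, 103), rotate(key, 101))
```

The two scores agree up to floating-point error, which is the property that lets the model encode relative position without a learned absolute position table.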

Usage

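import torch
from transformers import RoFormerForMaskedLM, RoFormerTokenizer

model = RoFormerForMaskedLM.from_pretrained("bean4259/babylm-roformer")
tokenizer = RoFormerTokenizer.from_pretrained("bean4259/babylm-roformer")
model.eval()  # disable dropout for inference

# Fill-mask example
text = "The cat sat on the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] position and decode the highest-scoring token
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))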

Training

Trained using a custom training loop with:

  • Sequence packing (8.26x compression)
  • AdamW optimizer (lr=1e-4)
  • Linear warmup + decay schedule
  • 10 epochs
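
The training loop itself is not published here, so the following is only a sketch of two of the ingredients listed above, under stated assumptions. `pack_sequences` is a hypothetical greedy packer that concatenates tokenized sequences into blocks of at most 128 tokens without splitting them. `linear_warmup_decay` is a hypothetical schedule that ramps linearly to the peak learning rate of 1e-4 and then decays linearly to zero. The `warmup_steps` and `total_steps` values are illustrative, not the ones used for this model.

```python
def pack_sequences(sequences, max_len=128):
    """Greedily pack token-id lists into blocks of at most max_len tokens.

    Assumption: sequences are kept whole and in order; anything longer
    than max_len is truncated.
    """
    blocks = []
    current = []
    for seq in sequences:
        seq = seq[:max_len]
        # Start a new block when the next sequence would overflow this one.
        if current and len(current) + len(seq) > max_len:
            blocks.append(current)
            current = []
        current = current + seq
    if current:
        blocks.append(current)
    return blocks


def linear_warmup_decay(step, warmup_steps, total_steps, peak_lr=1e-4):
    """Learning rate at `step`: linear ramp to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative inputs: five "tokenized" sequences of varying length.
example_sequences = [[1] * 50, [2] * 60, [3] * 30, [4] * 128, [5] * 10]
packed = pack_sequences(example_sequences)
block_lengths = [len(block) for block in packed]

# Illustrative schedule: 100 warmup steps out of 1000 total.
lr_mid_warmup = linear_warmup_decay(50, warmup_steps=100, total_steps=1000)
lr_peak = linear_warmup_decay(100, warmup_steps=100, total_steps=1000)
lr_end = linear_warmup_decay(1000, warmup_steps=100, total_steps=1000)
```

Packing reduces the number of padded positions per batch, which matters when most sequences are far shorter than the 128-token context.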