---
license: mit
language:
- en
tags:
- fill-mask
- roformer
- babylm
pipeline_tag: fill-mask
---
# BabyLM RoFormer (10M tokens)

A BERT-style masked language model trained from scratch on the BabyLM 10M-token dataset.

## Model Details

- **Architecture**: RoFormer (BERT with rotary position embeddings)
- **Parameters**: ~10M
- **Training Data**: BabyLM Strict-Small track (10M tokens)
- **Vocabulary**: 16,384 tokens (WordPiece)
- **Context Length**: 128 tokens
## Usage

```python
import torch
from transformers import RoFormerForMaskedLM, RoFormerTokenizer

model = RoFormerForMaskedLM.from_pretrained("bean4259/babylm-roformer")
tokenizer = RoFormerTokenizer.from_pretrained("bean4259/babylm-roformer")

# Fill-mask example: predict the most likely token at the [MASK] position
text = "The cat sat on the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] position and decode the highest-scoring token
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```
## Training

Trained using a custom training loop with:

- Sequence packing (8.26x compression)
- AdamW optimizer (lr=1e-4)
- Linear warmup + decay learning-rate schedule
- 10 epochs
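The sequence-packing step can be sketched as follows. This is an illustrative sketch only, not the model's actual training code: it assumes a greedy concatenate-and-chunk strategy over hypothetical token-id lists, padding only the tail of the final chunk. Avoiding per-sentence padding in this way is where a compression factor like the reported 8.26x would come from on short BabyLM sentences.

```python
CONTEXT_LEN = 128  # matches the model's context length

def pack_sequences(sequences, context_len=CONTEXT_LEN, pad_id=0):
    """Concatenate tokenized sequences into one stream, then split it
    into fixed-length chunks, padding only the final partial chunk."""
    stream = [tok for seq in sequences for tok in seq]
    chunks = []
    for i in range(0, len(stream), context_len):
        chunk = stream[i:i + context_len]
        chunk += [pad_id] * (context_len - len(chunk))  # pad the tail only
        chunks.append(chunk)
    return chunks

# Toy example: ten 16-token "sentences" pack into 2 chunks instead of
# 10 padded rows, a 5x reduction in rows fed to the model.
toy = [[1] * 16 for _ in range(10)]
packed = pack_sequences(toy)
print(len(packed), len(toy) / len(packed))
```

In practice a packing implementation would also track document boundaries (e.g. with separator tokens or an attention mask), which this sketch omits for brevity.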