---
license: mit
language:
- en
tags:
- fill-mask
- roformer
- babylm
pipeline_tag: fill-mask
---

# BabyLM RoFormer (10M tokens)

A BERT-style masked language model with rotary position embeddings (RoFormer), trained from scratch on the BabyLM Strict-Small dataset (10M tokens).

## Model Details

- **Architecture**: RoFormer (BERT + Rotary Position Embeddings)
- **Parameters**: ~10M
- **Training Data**: BabyLM Strict-Small (10M tokens)
- **Vocabulary**: 16,384 tokens (WordPiece)
- **Context Length**: 128 tokens
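
As a rough illustration of the rotary position embeddings mentioned above, the sketch below rotates consecutive pairs of a vector by a position-dependent angle and checks that the attention score depends only on the relative offset between query and key. It is a toy pure-Python version, not this model's code: the `rotate` and `dot` helpers, the base of 10000, and the example vectors are all assumptions made for illustration.

```python
import math

def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / d), where d = len(vec) and i is even.
    The base value is an assumed default, not taken from this model.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.75, -0.5, 1.0]

# The same offset (3) at different absolute positions gives the same score.
score_a = dot(rotate(q, 5), rotate(k, 2))
score_b = dot(rotate(q, 103), rotate(k, 100))
print(abs(score_a - score_b) < 1e-9)
```

Because the rotation encodes position directly in the query and key vectors, no separate learned position embedding table is needed in this scheme.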

## Usage

```python
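import torch
from transformers import RoFormerForMaskedLM, RoFormerTokenizer

model = RoFormerForMaskedLM.from_pretrained("bean4259/babylm-roformer")
tokenizer = RoFormerTokenizer.from_pretrained("bean4259/babylm-roformer")
model.eval()

# Fill-mask example
text = "The cat sat on the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and print the five highest-scoring tokens
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))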
```

## Training

Trained using a custom training loop with:
- Sequence packing (8.26x compression)
- AdamW optimizer (lr=1e-4)
- Linear warmup + decay schedule
- 10 epochs
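
The training loop itself is not published here, so the following is only a minimal sketch of two of the ingredients listed above: sequence packing and a linear warmup + decay learning-rate schedule. The function names, the toy token IDs, the block size of 32, and the warmup length are hypothetical. Only the peak learning rate of 1e-4 comes from this card, and how the 8.26x packing figure was measured is not stated, so the ratio printed below is just one plausible way to compute such a number.

```python
def pack_sequences(sequences, block_size=128, sep_id=None):
    """Concatenate tokenized sequences and cut the stream into full blocks.

    Tokens left over after the last full block are dropped in this sketch.
    """
    stream = []
    for seq in sequences:
        stream.extend(seq)
        if sep_id is not None:
            stream.append(sep_id)  # optional separator between documents
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr=1e-4):
    """Learning rate that rises linearly to peak_lr, then falls linearly to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Toy data: 10 short "sentences" of 16 token IDs each (hypothetical values).
sequences = [list(range(16)) for _ in range(10)]
blocks = pack_sequences(sequences, block_size=32)

# Rows needed without packing (one padded row per sentence) vs. with packing.
print(len(sequences) / len(blocks))

# Sample the schedule at a few steps (warmup length is an assumption).
for step in (0, 5, 10, 55, 100):
    print(step, linear_warmup_decay(step, total_steps=100, warmup_steps=10))
```

Packing fills each training row with real tokens instead of padding, which matters most when the corpus is dominated by short sentences relative to the context length.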