bean4259 committed (verified) · Commit 7aa57ae · Parent: 06188dd

Upload README.md with huggingface_hub

Files changed (1): README.md (+44 −0)
---
license: mit
language:
- en
tags:
- fill-mask
- roformer
- babylm
pipeline_tag: fill-mask
---

# BabyLM RoFormer (10M tokens)

A BERT-style masked language model trained from scratch on the BabyLM Strict-Small dataset (10M tokens).

## Model Details

- **Architecture**: RoFormer (BERT + Rotary Position Embeddings)
- **Parameters**: ~10M
- **Training Data**: BabyLM Strict-Small (10M tokens)
- **Vocabulary**: 16,384 tokens (WordPiece)
- **Context Length**: 128 tokens

## Usage

```python
import torch
from transformers import RoFormerForMaskedLM, RoFormerTokenizer

model = RoFormerForMaskedLM.from_pretrained("bean4259/babylm-roformer")
tokenizer = RoFormerTokenizer.from_pretrained("bean4259/babylm-roformer")

# Fill-mask example
text = "The cat sat on the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the top prediction for the masked position
mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_idx].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```
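The same task can also be run through the `transformers` `pipeline` API, which handles masking, decoding, and ranking for you. A convenience sketch (it assumes the checkpoint loads with the default RoFormer classes and downloads the model on first use):

```python
from transformers import pipeline

# Fill-mask pipeline; returns the top candidate tokens for [MASK]
fill = pipeline("fill-mask", model="bean4259/babylm-roformer")
for pred in fill("The cat sat on the [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```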
## Training

Trained using a custom training loop with:

- Sequence packing (8.26x compression)
- AdamW optimizer (lr=1e-4)
- Linear warmup + decay schedule
- 10 epochs
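The sequence-packing step above can be sketched as follows. This is an illustrative toy, not the repo's actual implementation: `pack_sequences` and the `sep_id=102` separator token are assumptions; the idea is simply to concatenate tokenized examples into one stream and slice it into fixed-length chunks, so almost no positions are wasted on padding.

```python
def pack_sequences(tokenized_examples, seq_len=128, sep_id=102):
    """Concatenate token-ID lists, separated by sep_id, then slice the
    stream into seq_len-sized chunks; a trailing partial chunk is dropped."""
    stream = []
    for ids in tokenized_examples:
        stream.extend(ids)
        stream.append(sep_id)
    return [stream[i:i + seq_len]
            for i in range(0, len(stream) - seq_len + 1, seq_len)]

# Toy example with seq_len=4: three short examples pack into
# three full chunks with zero padding tokens.
examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
chunks = pack_sequences(examples, seq_len=4)
```

With real data the payoff is the compression ratio the list quotes: instead of padding every short example out to the 128-token context length, packed batches are nearly 100% real tokens.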