bakirgrbic committed · verified · Commit e9b4790 · Parent(s): 0ae61fe
Files changed (1): README.md (+48 −3)
---
license: apache-2.0
language:
- en
base_model:
- bsu-slim/electra-tiny
- lgcharpe/ELC_BERT_small_baby_10M
pipeline_tag: text-classification
library_name: transformers
---
# This model is currently experimental and broken!

A pretrained [ELECTRA-Tiny](https://huggingface.co/bsu-slim/electra-tiny/tree/main) model modified to implement the zero-initialization transformer layer weighting described in
[Not all layers are equally as important: Every Layer Counts BERT](https://aclanthology.org/2023.conll-babylm.20.pdf).
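The layer-weighting idea can be sketched in PyTorch. This is a minimal illustration under the assumption that each block's input is a learned scalar-weighted sum of all previous layers' outputs, with weights starting at zero except for the immediately preceding layer; the class name `ZeroInitLayerWeighting` is hypothetical and this is not the exact implementation used in this model.

```python
import torch
from torch import nn


class ZeroInitLayerWeighting(nn.Module):
    """Learned scalar weights over previous layers' outputs.

    Weights start at zero except for the immediately preceding layer,
    so at initialization the stack behaves like a standard transformer
    (a sketch of the ELC BERT zero-init idea, not this model's code).
    """

    def __init__(self, num_prev_layers: int):
        super().__init__()
        weights = torch.zeros(num_prev_layers)
        weights[-1] = 1.0  # only the previous layer contributes at init
        self.weights = nn.Parameter(weights)

    def forward(self, prev_outputs: torch.Tensor) -> torch.Tensor:
        # prev_outputs: (num_prev_layers, batch, seq_len, hidden)
        # Weighted sum over the layer dimension.
        return torch.einsum("l,lbsh->bsh", self.weights, prev_outputs)
```

At initialization the module simply passes through the previous layer's output; training can then learn to mix in earlier layers.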


# Training
Pretrained using the pipeline defined in this [repository](https://github.com/bakirgrbic/bblm).

## Hyperparameters
- Epochs: 9
- Batch size: 8
- Learning rate: 1e-4
- Optimizer: AdamW

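With these hyperparameters, the optimizer and training loop would be set up roughly as below. This is a sketch only: the stand-in `nn.Linear` model and the random placeholder batches are assumptions for illustration; the actual pretraining code lives in the linked repository.

```python
import torch
from torch import nn
from torch.optim import AdamW

# Stand-in model; the real run trains the modified ELECTRA-Tiny.
model = nn.Linear(16, 2)

# Hyperparameters from the list above.
EPOCHS = 9
BATCH_SIZE = 8
optimizer = AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    # Placeholder batch; the real pipeline streams pretraining text.
    inputs = torch.randn(BATCH_SIZE, 16)
    targets = torch.randint(0, 2, (BATCH_SIZE,))

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```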
## Resources Used
- Compute: AWS SageMaker ml.g4dn.xlarge
- Time: about 63 hours


# Evaluation

## BLiMP
Evaluated with the BLiMP tasks from the [2024 BabyLM evaluation pipeline repository](https://github.com/babylm/evaluation-pipeline-2024).

### Results
- blimp_supplement accuracy: 47.54%
- blimp_filtered accuracy: 51.79%
- See [blimp_results](./blimp_results) for a detailed breakdown by subtask.

### Hyperparameters
- Epochs: 1
- Evaluation script modified for masked LMs

### Resources Used
- Compute: arm64 macOS
- Time: about 30 minutes