---
license: apache-2.0
language:
- en
base_model:
- bsu-slim/electra-tiny
- lgcharpe/ELC_BERT_small_baby_10M
pipeline_tag: text-classification
library_name: transformers
---
**This model is currently experimental and broken!**
A pretrained ELECTRA-Tiny model modified to implement zero-initialized transformer layer weighting, as described in *Not all layers are equally important: Every Layer Counts BERT*.
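The layer weighting works roughly as follows: the input to each transformer layer is a learned weighted sum of the outputs of all earlier layers (including the embeddings), and under zero initialization the weight on the immediately preceding layer starts at 1 while all earlier weights start at 0, so training begins as a plain residual stack. A minimal PyTorch sketch of the idea; the class name and wiring are illustrative, not the actual implementation:

```python
import torch
import torch.nn as nn

class ZeroInitLayerWeighting(nn.Module):
    """Learned weighted sum over all previous hidden states (ELC-BERT style).

    Illustrative only: zero initialization means only the immediately
    preceding layer contributes at the start of training.
    """

    def __init__(self, layer_index: int):
        super().__init__()
        # One scalar weight per available hidden state
        # (the embedding output counts as state 0).
        weights = torch.zeros(layer_index + 1)
        weights[-1] = 1.0  # previous layer starts at weight 1, all others at 0
        self.weights = nn.Parameter(weights)

    def forward(self, hidden_states: list[torch.Tensor]) -> torch.Tensor:
        # hidden_states: embedding output plus outputs of layers 0..layer_index-1,
        # each of shape (batch, seq, hidden)
        stacked = torch.stack(hidden_states, dim=0)  # (L+1, batch, seq, hidden)
        w = self.weights.view(-1, 1, 1, 1)
        return (w * stacked).sum(dim=0)              # (batch, seq, hidden)
```

Each layer then consumes this weighted sum instead of only the previous layer's output, letting the model learn how much each earlier layer should count.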
## Training
Used the pretraining pipeline defined in this repository.
### Hyperparameters
- Epochs: 9
- Batch size: 8
- Learning rate: 1e-4
- Optimizer: AdamW
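A hedged sketch of how these settings map onto a plain PyTorch training loop; the toy batch and MLM-style labels are placeholders standing in for the actual pipeline in the linked repository:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("bsu-slim/electra-tiny")  # base checkpoint
tokenizer = AutoTokenizer.from_pretrained("bsu-slim/electra-tiny")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # AdamW, lr 1e-4

# Stand-in for the real DataLoader: one batch of 8 examples (batch size 8).
batch = tokenizer(["a dummy sentence"] * 8, return_tensors="pt", padding=True)
batch["labels"] = batch["input_ids"].clone()  # toy labels just to produce a loss
train_dataloader = [batch]

for epoch in range(9):  # 9 epochs
    for step_batch in train_dataloader:
        loss = model(**step_batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```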
### Resources Used
- Compute: AWS SageMaker ml.g4dn.xlarge
- Time: About 63 hours
## Evaluation
### BLiMP
Used the BLiMP evaluation from the 2024 BabyLM evaluation pipeline repository.
#### Results
- `blimp_supplement` accuracy: 47.54%
- `blimp_filtered` accuracy: 51.79%
- See `blimp_results` for a detailed breakdown by subtask.
#### Hyperparameters
- Epochs: 1
- Script modified for masked LMs; see the sketch below
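BLiMP scores a minimal pair as correct when the model assigns higher probability to the grammatical sentence, and for a masked LM the usual adaptation is pseudo-log-likelihood scoring: mask one token at a time and sum its log-probability. A hedged sketch of that scoring, not the actual evaluation script:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bsu-slim/electra-tiny")  # placeholder checkpoint
model = AutoModelForMaskedLM.from_pretrained("bsu-slim/electra-tiny").eval()

@torch.no_grad()
def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of each token's log-probability when it alone is masked."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids[0]
    total = 0.0
    for pos in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        logits = model(masked.unsqueeze(0)).logits[0, pos]
        total += torch.log_softmax(logits, dim=-1)[ids[pos]].item()
    return total

# A BLiMP item counts as correct when the grammatical sentence scores higher.
good, bad = "The cats sleep.", "The cats sleeps."
correct = pseudo_log_likelihood(good) > pseudo_log_likelihood(bad)
```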
#### Resources Used
- Compute: arm64 macOS
- Time: About 30 minutes