---
license: apache-2.0
language:
- en
base_model:
- bsu-slim/electra-tiny
- lgcharpe/ELC_BERT_small_baby_10M
pipeline_tag: text-classification
library_name: transformers
---
# This model is currently experimental and broken!

A pretrained [ELECTRA-Tiny](https://huggingface.co/bsu-slim/electra-tiny/tree/main) model modified to implement zero-initialization transformer layer weighting as described in [Not all layers are equally as important: Every Layer Counts BERT](https://aclanthology.org/2023.conll-babylm.20.pdf).

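To illustrate the idea behind the layer weighting described in the paper, here is a minimal NumPy sketch. This is not the model's actual implementation; the class name and shapes are illustrative, and it assumes the common reading of the zero-initialization scheme: each layer's input is a learnable weighted sum of all previous layers' outputs, with every weight starting at 0 except the one on the immediately preceding layer, so training begins like a vanilla transformer.

```python
import numpy as np

class ZeroInitLayerWeighting:
    """Illustrative sketch: combine the outputs of all previous transformer
    layers with learnable scalar weights. In the zero-initialization variant,
    every weight starts at 0 except the weight on the immediately preceding
    layer, which starts at 1."""

    def __init__(self, num_prev_layers: int):
        self.weights = np.zeros(num_prev_layers)
        self.weights[-1] = 1.0  # immediately preceding layer

    def __call__(self, prev_outputs: list) -> np.ndarray:
        # prev_outputs: list of (seq_len, hidden) arrays, one per prior layer.
        stacked = np.stack(prev_outputs)            # (layers, seq, hidden)
        # Contract the weight vector against the layer axis.
        return np.tensordot(self.weights, stacked, axes=1)

# At initialization the combined input is exactly the last layer's output,
# so the network starts out behaving like a standard transformer.
outs = [np.random.rand(4, 8) for _ in range(3)]
combine = ZeroInitLayerWeighting(num_prev_layers=3)
assert np.allclose(combine(outs), outs[-1])
```

During training the weights become learnable parameters, letting later layers draw directly on earlier representations when that helps.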
# Training
Pretrained using the pipeline defined in this [repository](https://github.com/bakirgrbic/bblm).

## Hyperparameters
- Epochs: 9
- Batch size: 8
- Learning rate: 1e-4
- Optimizer: AdamW

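For reference, AdamW differs from plain Adam by decoupling weight decay from the gradient-based update. A minimal single-parameter NumPy sketch of one step (function name and loop are illustrative; the hyperparameter defaults below mirror common practice, not this model's training config except for the learning-rate scale):

```python
import numpy as np

def adamw_step(param, grad, m, v, t, lr=1e-4, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update: Adam moment estimates plus decoupled weight decay,
    i.e. the decay term is applied to the parameter directly rather than
    being folded into the gradient."""
    m = betas[0] * m + (1 - betas[0]) * grad          # first moment
    v = betas[1] * v + (1 - betas[1]) * grad ** 2     # second moment
    m_hat = m / (1 - betas[0] ** t)                   # bias correction
    v_hat = v / (1 - betas[1] ** t)
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps)
                          + weight_decay * param)
    return param, m, v

# Drive a scalar toward the minimum of f(x) = x^2 (gradient 2x).
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adamw_step(x, 2 * x, m, v, t, lr=1e-2)
```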
## Resources Used
- Compute: AWS SageMaker ml.g4dn.xlarge
- Time: about 63 hours

# Evaluation

## BLiMP
Evaluated using the BLiMP tasks from the [2024 BabyLM evaluation pipeline repository](https://github.com/babylm/evaluation-pipeline-2024).

### Results
- blimp_supplement accuracy: 47.54%
- blimp_filtered accuracy: 51.79%
- See [blimp_results](./blimp_results) for a detailed breakdown by subtask.

### Hyperparameters
- Epochs: 1
- Evaluation script modified for masked LMs

### Resources Used
- Compute: arm64 macOS
- Time: about 30 minutes