---
license: apache-2.0
language:
- en
base_model:
- bsu-slim/electra-tiny
- lgcharpe/ELC_BERT_small_baby_10M
pipeline_tag: text-classification
library_name: transformers
---

# This model is currently experimental and broken!

A pretrained [ELECTRA-Tiny](https://huggingface.co/bsu-slim/electra-tiny/tree/main) model modified to implement zero-initialized transformer layer weighting as described in [Not all layers are equally as important: Every Layer Counts BERT](https://aclanthology.org/2023.conll-babylm.20.pdf).

# Training

Used the pretraining pipeline defined in this [repository](https://github.com/bakirgrbic/bblm).

## Hyperparameters

- Epochs: 9
- Batch size: 8
- Learning rate: 1e-4
- Optimizer: AdamW

## Resources Used

- Compute: AWS SageMaker ml.g4dn.xlarge
- Time: about 63 hours

# Evaluation

## BLiMP

Used the BLiMP evaluation from the [2024 BabyLM evaluation pipeline repository](https://github.com/babylm/evaluation-pipeline-2024).

### Results

- blimp_supplement accuracy: 47.54%
- blimp_filtered accuracy: 51.79%
- See [blimp_results](./blimp_results) for a detailed breakdown by subtask.

### Hyperparameters

- Epochs: 1
- Script modified for masked LMs

### Resources Used

- Compute: arm64 macOS
- Time: about 30 minutes
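# Layer Weighting Sketch

The zero-initialized layer weighting from the ELC-BERT paper can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not this model's actual implementation: the function names (`init_layer_weights`, `combine_layers`) are hypothetical, and the sketch assumes each transformer layer receives a learned weighted sum of all earlier outputs, with the weights initialized to zero except for a weight of 1.0 on the immediately preceding output.

```python
import numpy as np

def init_layer_weights(layer_index: int) -> np.ndarray:
    # Zero initialization (ELC-BERT style, as understood from the paper):
    # one learnable scalar per earlier output (embeddings + previous
    # layers), all starting at zero except the weight on the immediately
    # preceding output, which starts at 1.0. The network therefore begins
    # as a standard transformer and learns nonzero weights for other
    # layers during training.
    w = np.zeros(layer_index + 1)
    w[-1] = 1.0
    return w

def combine_layers(outputs: list[np.ndarray], weights: np.ndarray) -> np.ndarray:
    # Weighted sum of all earlier outputs; this combined tensor would be
    # fed as the input to the next transformer layer.
    return sum(w * h for w, h in zip(weights, outputs))

# At initialization, the input to layer 3 is exactly layer 2's output.
outputs = [np.random.randn(4, 8) for _ in range(3)]  # embeddings + 2 layers
w = init_layer_weights(2)
combined = combine_layers(outputs, w)
assert np.allclose(combined, outputs[-1])
```

Because the initial state reproduces a vanilla transformer, this scheme changes nothing at step zero; any deviation from standard behavior is learned, which is the property the paper's zero-initialization variant is designed to provide.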