---
license: apache-2.0
language:
  - en
base_model:
  - bsu-slim/electra-tiny
  - lgcharpe/ELC_BERT_small_baby_10M
pipeline_tag: text-classification
library_name: transformers
---

# electra-tiny-elc

**This model is currently experimental and broken!**

A pretrained ELECTRA-Tiny model modified to implement zero-initialized transformer layer weighting (ELC-BERT) as described in *Not all layers are equally as important: Every Layer Counts BERT*.
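In the ELC-BERT scheme, each transformer layer consumes a learned weighted sum of all preceding layers' outputs rather than only the previous one, and zero initialization starts every weight at zero except the immediately preceding layer's, so training begins from the vanilla stack. Below is a minimal PyTorch sketch of that idea; the class and variable names are illustrative and not taken from this model's actual training code.

```python
import torch
from torch import nn

class ZeroInitLayerWeighting(nn.Module):
    """Learned weighted sum over all previous layer outputs (ELC-BERT style)."""

    def __init__(self, num_previous: int):
        super().__init__()
        # One scalar weight per preceding output (embeddings count as output 0).
        weights = torch.zeros(num_previous)
        weights[-1] = 1.0  # zero init: only the immediately previous layer contributes
        self.weights = nn.Parameter(weights)

    def forward(self, previous_outputs: list[torch.Tensor]) -> torch.Tensor:
        # Each tensor has shape (batch, seq_len, hidden).
        stacked = torch.stack(previous_outputs)  # (n, batch, seq_len, hidden)
        return (self.weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)

# Layer i would then run on the combined input, e.g.:
# combined = weighting_i(all_outputs_so_far); hidden = layer_i(combined)
```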

## Training

Pretraining used the pipeline defined in this repository.

### Hyperparameters

- Epochs: 9
- Batch size: 8
- Learning rate: 1e-4
- Optimizer: AdamW
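For concreteness, these settings map onto a standard PyTorch training loop roughly as follows. This is a runnable toy sketch with a stand-in model and random data, not the actual pipeline, which lives in the repository linked above.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins so the loop runs end to end; the real pipeline replaces these.
model = nn.Linear(16, 2)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
train_loader = DataLoader(data, batch_size=8)   # Batch size: 8

optimizer = AdamW(model.parameters(), lr=1e-4)  # Optimizer: AdamW, LR: 1e-4
loss_fn = nn.CrossEntropyLoss()

for epoch in range(9):                          # Epochs: 9
    for inputs, labels in train_loader:
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```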

### Resources Used

- Compute: AWS SageMaker ml.g4dn.xlarge
- Time: About 63 hours

## Evaluation

### BLiMP

Evaluated on BLiMP using the 2024 BabyLM evaluation pipeline repository.
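BLiMP scores a model on whether it assigns higher probability to the grammatical sentence of each minimal pair. A hedged sketch of that scoring for a masked LM via pseudo-log-likelihood follows (in the spirit of the "modified for masked LMs" note below); it is not the evaluation pipeline's code, and the checkpoint name is a placeholder.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder checkpoint for illustration only.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum each token's log-prob when it alone is masked (PLL)."""
    ids = tok(sentence, return_tensors="pt").input_ids[0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, -1)[ids[i]].item()
    return total

# A minimal pair counts as correct if the grammatical sentence scores higher.
good, bad = "The dogs bark.", "The dogs barks."
print(pseudo_log_likelihood(good) > pseudo_log_likelihood(bad))
```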

### Results

- blimp_supplement accuracy: 47.54%
- blimp_filtered accuracy: 51.79%
- See blimp_results for a detailed breakdown by subtask.

### Hyperparameters

- Epochs: 1
- Script modified for masked LMs

### Resources Used

- Compute: arm64 macOS
- Time: About 30 minutes