This model is currently experimental and broken!

A pretrained ELECTRA-Tiny model modified to implement the zero-initialized transformer layer weighting described in Not all layers are equally as important: Every Layer Counts BERT.
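A minimal sketch of the layer-weighting idea, assuming scalar weights over previous layer outputs as in ELC-BERT; the class name and shapes below are illustrative, not the actual implementation:

```python
import torch
import torch.nn as nn

class ZeroInitLayerWeighting(nn.Module):
    """Builds a layer's input as a learned weighted sum of all previous
    layer outputs (the ELC-BERT idea). With zero initialization, every
    weight starts at 0 except the immediately preceding layer's, so the
    model behaves like a vanilla transformer at the start of training.
    """

    def __init__(self, num_prev_layers: int):
        super().__init__()
        init = torch.zeros(num_prev_layers)
        init[-1] = 1.0  # only the previous layer contributes at init
        self.weights = nn.Parameter(init)

    def forward(self, prev_outputs: list[torch.Tensor]) -> torch.Tensor:
        # prev_outputs: embeddings plus each earlier layer's output,
        # each of shape (batch, seq_len, hidden)
        stacked = torch.stack(prev_outputs, dim=0)  # (layers, batch, seq, hidden)
        return torch.einsum("l,lbsh->bsh", self.weights, stacked)
```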

Training

Pretrained with the pipeline defined in this repository.

Hyperparameters

  • Epochs: 9
  • Batch size: 8
  • Learning rate: 1e-4
  • Optimizer: AdamW (see the sketch below)
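
For concreteness, here is a hedged sketch of how these hyperparameters could be wired into a pretraining loop. The config values and random data are stand-ins; the actual pipeline is the one in the repository linked above:

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import ElectraConfig, ElectraForMaskedLM

EPOCHS = 9
BATCH_SIZE = 8
LEARNING_RATE = 1e-4

# Illustrative stand-in for the modified ELECTRA-Tiny architecture.
config = ElectraConfig(embedding_size=128, hidden_size=128,
                       num_hidden_layers=4, num_attention_heads=4,
                       intermediate_size=512)
model = ElectraForMaskedLM(config)
optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)

# Random token ids stand in for the tokenized pretraining corpus.
input_ids = torch.randint(0, config.vocab_size, (32, 64))
loader = DataLoader(TensorDataset(input_ids), batch_size=BATCH_SIZE)

model.train()
for epoch in range(EPOCHS):
    for (batch,) in loader:
        optimizer.zero_grad()
        # Real pretraining masks tokens before computing the MLM loss;
        # here the unmasked ids double as labels just so the sketch runs.
        loss = model(input_ids=batch, labels=batch).loss
        loss.backward()
        optimizer.step()
```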

Resources Used

  • Compute: AWS SageMaker ml.g4dn.xlarge
  • Time: About 63 hours

Evaluation

BLiMP

Evaluated on BLiMP using the 2024 BabyLM evaluation pipeline repository.
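
Each BLiMP item is a minimal pair, and a model is credited when it scores the grammatical sentence higher than the ungrammatical one. The field names below follow the public BLiMP dataset; the pair itself is an illustrative example:

```python
# One BLiMP-style minimal pair (illustrative, not from a specific subtask).
pair = {
    "sentence_good": "The cats annoy Tim.",  # grammatical
    "sentence_bad": "The cats annoys Tim.",  # agreement violation
}
```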

Results

  • blimp_supplement accuracy: 47.54%
  • blimp_filtered accuracy: 51.79%
  • See blimp_results for a detailed per-subtask breakdown.

Hyperparameters

  • Epochs: 1
  • Evaluation script modified for masked LMs (see the sketch below)
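
One common way to score such pairs with a masked LM is pseudo-log-likelihood: mask each token in turn and sum the log-probability of the true token. The sketch below assumes the checkpoint loads with a masked-LM head; it is not necessarily the exact modification made to the evaluation script:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bakirgrbic/electra-tiny-elc")
model = AutoModelForMaskedLM.from_pretrained("bakirgrbic/electra-tiny-elc")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it alone is masked."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids[0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# A pair counts as correct when the grammatical sentence scores higher.
print(pseudo_log_likelihood("The cats annoy Tim.")
      > pseudo_log_likelihood("The cats annoys Tim."))
```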

Resources Used

  • Compute: arm64 macOS machine
  • Time: About 30 minutes