An ELECTRA-Tiny model (5.75M parameters) pretrained on data from the 2024 BabyLM Challenge. It was used for text classification on the Web of Science dataset (WOS-46985), though this checkpoint itself is not finetuned for that task. It was also evaluated on BLiMP using the 2024 BabyLM evaluation pipeline.
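
A minimal usage sketch with Hugging Face Transformers, assuming the tokenizer is bundled with the checkpoint (AutoModel returns the bare encoder; pick a task head as needed):

```python
from transformers import AutoModel, AutoTokenizer

# Load the pretrained ELECTRA-Tiny encoder and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("bakirgrbic/electra-tiny")
model = AutoModel.from_pretrained("bakirgrbic/electra-tiny")

inputs = tokenizer("The child read the book.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```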

Training

The pretraining pipeline is defined in this repository; a minimal sketch of the setup appears after the hyperparameters below.

Hyperparameters

  • Epochs: 10
  • Batch size: 8
  • Learning rate: 1e-4
  • Optimizer: AdamW
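
The sketch below shows how these hyperparameters could be wired up with ElectraForPreTraining. It is an illustration only: the real pipeline (including the generator that proposes replacement tokens) lives in the linked repository, and the model dimensions and tokenizer used here are assumptions.

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, ElectraConfig, ElectraForPreTraining

# Assumed "tiny" dimensions; the exact architecture is defined in the repository.
config = ElectraConfig(
    hidden_size=128,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=512,
)
model = ElectraForPreTraining(config)
tokenizer = AutoTokenizer.from_pretrained("google/electra-small-discriminator")

optimizer = AdamW(model.parameters(), lr=1e-4)  # optimizer and lr from the card

# One illustrative step (the real run used batch size 8 for 10 epochs).
# ELECTRA's discriminator is trained to flag replaced tokens
# (labels: 1 = replaced, 0 = original); a real pipeline samples the
# replacements from a small generator, which is omitted here.
batch = tokenizer(["The child read the book."] * 8, return_tensors="pt", padding=True)
labels = torch.zeros_like(batch["input_ids"])
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```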

Resources Used

  • Compute: AWS SageMaker ml.g4dn.xlarge
  • Time: About 70 hours (roughly 3 days)

Evaluation

Web of Science (WOS)

The WOS finetuning pipeline is defined in this repository; a hedged sketch of the setup appears after the hyperparameters below.

Results

  • 76% accuracy on the test set after the final epoch.

Hyperparameters

  • Epochs: 3
  • Batch size: 64
  • Learning rate: 2e-5
  • Optimizer: AdamW
  • Max Length: 128
  • Parameter Freezing: None
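
A hedged sketch of the finetuning setup under these hyperparameters. NUM_LABELS is an assumption (WOS-46985 is commonly used with either 7 parent categories or 134 subcategories); the actual pipeline is in the linked repository.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_LABELS = 134  # assumption; depends on which WOS label set the pipeline uses

tokenizer = AutoTokenizer.from_pretrained("bakirgrbic/electra-tiny")
# The classification head is freshly initialized on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bakirgrbic/electra-tiny", num_labels=NUM_LABELS
)
optimizer = AdamW(model.parameters(), lr=2e-5)  # values from the card

# One illustrative step; no parameters are frozen, so the whole model updates.
batch = tokenizer(
    ["Example abstract text."],
    truncation=True,
    padding="max_length",
    max_length=128,  # Max Length from the card
    return_tensors="pt",
)
loss = model(**batch, labels=torch.tensor([0])).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```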

Resources Used

  • Compute: AWS SageMaker ml.g4dn.xlarge
  • Time: About 5 minutes

BLiMP

Results

  • blimp_supplement accuracy: 49.79%
  • blimp_filtered accuracy: 50.65%
  • See blimp_results for a detailed breakdown by subtask.

Hyperparameters

  • Epochs: 1
  • Evaluation script modified for masked language models (see the sketch below)
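
For reference, masked LMs are usually scored on BLiMP with pseudo-log-likelihood (PLL): each token is masked in turn, its log-probability is summed, and a minimal pair counts as correct when the grammatical sentence scores higher. The sketch below illustrates that idea; it is not necessarily the exact logic of the modified script, and it assumes the checkpoint can be loaded with an MLM head.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bakirgrbic/electra-tiny")
model = AutoModelForMaskedLM.from_pretrained("bakirgrbic/electra-tiny")
model.eval()

def pll(sentence: str) -> float:
    """Pseudo-log-likelihood: mask each token in turn and sum its log-prob."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# A BLiMP-style minimal pair: the grammatical sentence should score higher.
print(pll("The cats meow.") > pll("The cats meows."))
```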

Resources Used

  • Compute: arm64 macOS machine
  • Time: About 1 hour