---
license: apache-2.0
language:
- en
base_model:
- bsu-slim/electra-tiny
pipeline_tag: text-classification
library_name: transformers
---
A pretrained [ELECTRA-Tiny](https://huggingface.co/bsu-slim/electra-tiny/tree/main) model, pretrained on [data](https://osf.io/5mk3x)
from the [2024 BabyLM Challenge](https://babylm.github.io/index.html). It is intended for text classification
on the [Web of Science Dataset WOS-46985](https://data.mendeley.com/datasets/9rw3vkcfy4/6), although this checkpoint is not yet fine-tuned
for that task. It was also evaluated on BLiMP using the [2024 BabyLM evaluation pipeline](https://github.com/babylm/evaluation-pipeline-2024).
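As a rough illustration, the sketch below wires an ELECTRA encoder into a sequence-classification head with `transformers`. The config dimensions and `num_labels=7` (the WOS parent-category count) are illustrative assumptions, not values stated on this card; in practice you would load the pretrained weights with `AutoModelForSequenceClassification.from_pretrained` instead of building a randomly initialized model from a config.

```python
import torch
from transformers import ElectraConfig, ElectraForSequenceClassification

# Illustrative config only -- NOT the actual ELECTRA-Tiny dimensions.
config = ElectraConfig(
    vocab_size=30522,
    embedding_size=64,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=256,
    num_labels=7,  # assumption: WOS-46985 parent categories
)
model = ElectraForSequenceClassification(config)  # random weights

# Forward a dummy batch of token ids to check the output shape.
input_ids = torch.randint(0, config.vocab_size, (1, 16))
logits = model(input_ids=input_ids).logits
print(logits.shape)  # torch.Size([1, 7])
```

Loading the real checkpoint would replace the config/model construction with a single `from_pretrained` call pointing at this repository.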
# Training
Pretrained with the pipeline defined in this [repository](https://github.com/bakirgrbic/bblm).
## Hyperparameters
- Epochs: 10
- Batch size: 8
- Learning rate: 1e-4
- Optimizer: AdamW
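The optimizer setup from the list above can be sketched as follows; the model here is a stand-in module, and the data loader and the actual pretraining objective are omitted.

```python
import torch

EPOCHS = 10
BATCH_SIZE = 8

model = torch.nn.Linear(128, 128)  # stand-in for the ELECTRA-Tiny network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for epoch in range(EPOCHS):
    batch = torch.randn(BATCH_SIZE, 128)  # placeholder for a real batch
    loss = model(batch).pow(2).mean()     # placeholder loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```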
## Resources Used
- Compute: AWS Sagemaker ml.g4dn.xlarge
- Time: About 70 hours (roughly 3 days)
# Evaluation
## Web of Science (WOS)
Fine-tuned and evaluated with the WOS pipeline defined in this [repository](https://github.com/bakirgrbic/bblm).
### Results
- 76% test-set accuracy after the final epoch.
### Hyperparameters
- Epochs: 3
- Batch size: 64
- Learning rate: 2e-5
- Optimizer: AdamW
- Max Length: 128
- Parameter Freezing: None
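A minimal sketch of the fine-tuning settings listed above: "Parameter Freezing: None" means every weight stays trainable. The stand-in encoder/head and `num_labels=7` (assumed WOS parent-category count) are illustrative, not the card's actual architecture.

```python
import torch

EPOCHS, BATCH_SIZE = 3, 64
MAX_LENGTH = 128  # inputs truncated/padded to this token length

encoder = torch.nn.Linear(128, 64)  # stand-in for the ELECTRA-Tiny encoder
head = torch.nn.Linear(64, 7)       # num_labels=7 is an assumption
model = torch.nn.Sequential(encoder, head)

# No parameter freezing: leave every weight trainable.
for p in model.parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```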
### Resources Used
- Compute: AWS Sagemaker ml.g4dn.xlarge
- Time: About 5 minutes
## BLiMP
### Results
- blimp_supplement accuracy: 49.79%
- blimp_filtered accuracy: 50.65%
- See [blimp_results](./blimp_results) for a detailed breakdown on subtasks.
### Hyperparameters
- Epochs: 1
- Evaluation script modified for masked language models
### Resources Used
- Compute: arm64 macOS
- Time: About 1 hour