TinyBERT paper results for SST2

#1
by Dipti - opened

Hello Sayantan Dasgupta,

I'm a student trying to reproduce the task-specific distillation results of TinyBERT on SST-2. I start from the general-distilled model on HuggingFace (about 58% accuracy on SST-2 according to the GLUE evaluation script), but after I run the intermediate-layer and prediction-layer distillation, the accuracy drops to 50%.
I use the fine-tuned BERT from https://huggingface.co/JeremiahZ/bert-base-uncased-sst2 as the teacher (~92% accuracy) and the training commands from the TinyBERT repository. Are there any differences in the training pipeline of your models, such as a learning rate scheduler or a stronger BERT teacher to distill from?
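
For context, this is my understanding of the two objectives being optimized, based on the paper. It is only a simplified sketch of my own; the layer mapping, the projection layer, and the temperature below are placeholders I chose, not the exact settings from your scripts.

```python
# Sketch (PyTorch) of the two TinyBERT task-specific distillation losses as I
# understand them; layer_map, proj, and temperature are my own placeholders.
import torch
import torch.nn.functional as F

def intermediate_loss(student_hidden, teacher_hidden,
                      student_attn, teacher_attn,
                      layer_map, proj):
    """Stage 1: MSE between projected student hidden states and teacher hidden
    states, plus MSE between attention matrices, for the mapped layers."""
    loss = 0.0
    for s_idx, t_idx in enumerate(layer_map):
        # proj maps the student hidden size (312) up to the teacher's (768)
        loss = loss + F.mse_loss(proj(student_hidden[s_idx]), teacher_hidden[t_idx])
        loss = loss + F.mse_loss(student_attn[s_idx], teacher_attn[t_idx])
    return loss

def prediction_loss(student_logits, teacher_logits, temperature=1.0):
    """Stage 2: soft cross-entropy between teacher and student logits."""
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_prob = F.log_softmax(student_logits / temperature, dim=-1)
    return -(t_prob * s_log_prob).sum(dim=-1).mean()

# e.g. for the 4-layer student and 12-layer teacher I use something like:
# layer_map = [2, 5, 8, 11]; proj = torch.nn.Linear(312, 768)
```

My understanding is that stage 1 trains with only the intermediate loss and stage 2 with only the prediction loss, so please correct me if that is not how your models were trained.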

Any advice would be very helpful.

Best,
Dipti
