TinyBERT paper results for SST2

#1
by Dipti - opened

Hello Sayantan Dasgupta,

I'm a student trying to reproduce the task-specific distillation results of TinyBERT on SST-2. I start from the general-distilled model on HuggingFace (about 58% accuracy on SST-2 according to the GLUE evaluation script), but after I run the intermediate-layer and prediction-layer distillation, the accuracy drops to 50%.
I use the fine-tuned BERT from https://huggingface.co/JeremiahZ/bert-base-uncased-sst2 as the teacher (~92% accuracy) and the training commands from the TinyBERT repository. Are there any differences in the training pipeline of your models, such as a learning rate scheduler or a stronger BERT teacher to distill from?
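
For context, this is my understanding of the two objectives being optimized, based on the paper. It is only a simplified sketch of my own; the layer mapping, the projection layer, and the temperature below are placeholders I chose, not the exact settings from your scripts.

```python
# Sketch (PyTorch) of the two TinyBERT task-specific distillation losses as I
# understand them; layer_map, proj, and temperature are my own placeholders.
import torch
import torch.nn.functional as F

def intermediate_loss(student_hidden, teacher_hidden,
                      student_attn, teacher_attn,
                      layer_map, proj):
    """Stage 1: MSE between projected student hidden states and teacher hidden
    states, plus MSE between attention matrices, for the mapped layers."""
    loss = 0.0
    for s_idx, t_idx in enumerate(layer_map):
        # proj maps the student hidden size (312) up to the teacher's (768)
        loss = loss + F.mse_loss(proj(student_hidden[s_idx]), teacher_hidden[t_idx])
        loss = loss + F.mse_loss(student_attn[s_idx], teacher_attn[t_idx])
    return loss

def prediction_loss(student_logits, teacher_logits, temperature=1.0):
    """Stage 2: soft cross-entropy between teacher and student logits."""
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_prob = F.log_softmax(student_logits / temperature, dim=-1)
    return -(t_prob * s_log_prob).sum(dim=-1).mean()

# e.g. for the 4-layer student and 12-layer teacher I use something like:
# layer_map = [2, 5, 8, 11]; proj = torch.nn.Linear(312, 768)
```

My understanding is that stage 1 trains with only the intermediate loss and stage 2 with only the prediction loss, so please correct me if that is not how your models were trained.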

Any advice would be very helpful.

Best,
Dipti
