TinyBERT paper results for SST2
#1 opened by Dipti
Hello Sayantan Dasgupta,
I'm a student trying to reproduce the results of the task-specific distillation of TinyBERT for SST-2. I use the general distilled model from HuggingFace (58% accuracy on SST-2 per the GLUE evaluation script). But when I perform the intermediate-layer and prediction-layer distillation stages, the accuracy drops to 50%.
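For context, the prediction-layer stage trains the student against the teacher's soft labels with a soft cross-entropy. A minimal sketch of that loss in plain Python (the temperature `T` and the per-example reduction here are my assumptions, not taken from your run):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(student_logits, teacher_logits, T=1.0):
    """Prediction-layer distillation loss for one example:
    cross-entropy of the student distribution against the
    teacher's soft labels."""
    p = softmax(teacher_logits, T)  # teacher soft targets
    q = softmax(student_logits, T)  # student distribution
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

When student and teacher logits agree, the loss equals the entropy of the teacher distribution; any mismatch only increases it, so a student stuck at 50% accuracy on a binary task suggests the loss is not being minimized effectively (e.g., learning rate or data issues) rather than a problem with the loss itself.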
I use the fine-tuned BERT teacher from https://huggingface.co/JeremiahZ/bert-base-uncased-sst2 (accuracy ~92%) and the training commands as given in the TinyBERT repository. Are there any differences in the training pipeline of your models, such as the use of a learning rate scheduler or a stronger BERT teacher to distill from?
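In case it helps pin down the scheduler question: the schedule commonly used in BERT-style training (and, as far as I can tell, in the TinyBERT repo's optimizer) is linear warmup followed by linear decay to zero. A minimal sketch of what I mean, with the 10% warmup fraction as an assumption:

```python
def warmup_linear_lr(step, total_steps, base_lr, warmup_frac=0.1):
    """Linear warmup from 0 to base_lr over the first warmup_frac
    of training, then linear decay back to 0 at total_steps."""
    progress = step / total_steps
    if progress < warmup_frac:
        return base_lr * progress / warmup_frac  # warmup phase
    return base_lr * (1.0 - progress) / (1.0 - warmup_frac)  # decay phase
```

If your runs used a constant learning rate instead, that alone could explain a large accuracy gap.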
Any advice would be very helpful.
Best,
Dipti