
# SiFT (Scale Invariant Fine-Tuning)

## Usage

For example, to try SiFT with DeBERTa, run `experiments/glue/mnli.sh base-sift` or `experiments/glue/mnli.sh xxlarge-v2-sift`.

Here is an example of using SiFT in your existing code:

```python
# Create the DeBERTa model first, then hook SiFT layers onto it.
# The import below assumes the package layout of microsoft/DeBERTa;
# adjust it to match your installation.
from DeBERTa.sift import hook_sift_layer, AdversarialLearner

adv_modules = hook_sift_layer(model, hidden_size=768)
adv = AdversarialLearner(model, adv_modules)

# SiFT needs a function that extracts the logits from the model outputs.
def logits_fn(model, *args, **kwargs):
  logits, _ = model(*args, **kwargs)
  return logits

logits, loss = model(**data)

# Add the SiFT adversarial loss to the task loss.
loss = loss + adv.loss(logits, logits_fn, **data)
# The remaining steps are the same as in regular training.
```
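
The last comment glosses over the surrounding loop. Below is a minimal sketch of how the SiFT loss slots into a standard PyTorch training loop; `train_epoch`, `train_loader`, `optimizer`, and the `(logits, loss)` return signature of `model` are illustrative assumptions, not part of the SiFT API.

```python
# A minimal sketch, assuming a standard PyTorch setup. `train_loader` and
# `optimizer` are placeholders; `adv` and `logits_fn` are the objects
# created in the snippet above.
def train_epoch(model, adv, logits_fn, train_loader, optimizer, device="cuda"):
  model.train()
  for batch in train_loader:
    data = {k: v.to(device) for k, v in batch.items()}
    optimizer.zero_grad()
    logits, loss = model(**data)                       # ordinary task loss
    loss = loss + adv.loss(logits, logits_fn, **data)  # SiFT regularization term
    loss.backward()
    optimizer.step()
```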

## Ablation study results

| Model | MNLI-m/mm (Acc) | SST-2 (Acc) | QNLI (Acc) | CoLA (MCC) | RTE (Acc) | MRPC (Acc/F1) | QQP (Acc/F1) | STS-B (P/S) |
|-------|-----------------|-------------|------------|------------|-----------|---------------|--------------|-------------|
| DeBERTa-V2-XXLarge | 91.7/91.9 | 97.2 | 96.0 | 72.0 | 93.5 | 93.1/94.9 | 92.7/90.3 | 93.2/93.1 |
| DeBERTa-V2-XXLarge + SiFT | 92.0/92.1 | 97.5 | 96.5 | 73.5 | 96.5 | - | 93.0/90.7 | - |

## Citation

```bibtex
@misc{he2020deberta,
    title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
    author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
    year={2020},
    eprint={2006.03654},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@inproceedings{Jiang_2020,
    title={SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization},
    author={Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Zhao, Tuo},
    booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
    publisher={Association for Computational Linguistics},
    year={2020},
    url={http://dx.doi.org/10.18653/v1/2020.acl-main.197},
    doi={10.18653/v1/2020.acl-main.197}
}
```