Update README.md

The model has 6 layers, 768 dimensions, and 12 heads, totaling 82M parameters (…).
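As a quick sanity check of those numbers, the sketch below rebuilds an equivalent configuration with the `transformers` library and counts parameters; the vocabulary size is an assumption (it is not stated here), so the total is only approximate.

```python
# Minimal sketch: rebuild the described architecture and count parameters.
# The vocabulary size below is an assumption (~50k BPE tokens), not a value
# taken from this README, so the resulting count is approximate.
from transformers import RobertaConfig, RobertaModel

config = RobertaConfig(
    vocab_size=50262,        # assumption: ~50k BPE vocabulary
    num_hidden_layers=6,     # 6 layers, as stated above
    hidden_size=768,         # 768-dimensional hidden states
    num_attention_heads=12,  # 12 attention heads
    intermediate_size=3072,  # standard 4x feed-forward width
)
model = RobertaModel(config)
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e6:.0f}M parameters")  # ~82M with this vocabulary size
```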
We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card for more details about the training and evaluation data.

**About Knowledge Distillation**

Knowledge distillation is a technique used to shrink networks to a reasonable size while minimizing the loss in performance.

The main idea is to distill a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student). In this “teacher-student learning” setup, the small student model is trained to mimic the behavior of the larger teacher model.

As an example, the distilled version of BERT has 40% fewer parameters and runs 60% faster while preserving 97% of BERT's performance on the GLUE benchmark. This translates into lower inference time and the ability to run on commodity hardware.
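To make the teacher-student idea concrete, here is a minimal, hypothetical sketch of a soft-target distillation loss in PyTorch; the function name, loss weighting, and temperature are illustrative assumptions, not the exact recipe used to train this model.

```python
# Hypothetical sketch of a distillation objective: the student is trained on a
# mix of the usual hard-label loss and a KL term that pushes its predictions
# towards the teacher's temperature-softened output distribution.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    vocab_size = student_logits.size(-1)
    # Hard-label loss (standard masked-language-modeling cross-entropy).
    hard = F.cross_entropy(student_logits.view(-1, vocab_size),
                           labels.view(-1), ignore_index=-100)
    # Soft-label loss: match the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```

In such a setup only the student's parameters are updated; the teacher is run in inference mode to provide the target logits.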
## Intended uses and limitations
This model is ready-to-use only for masked language modeling (MLM) to perform the Fill-Mask task. However, it is intended to be fine-tuned on non-generative downstream tasks such as Question Answering, Text Classification, or Named Entity Recognition.
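For a minimal usage sketch of the Fill-Mask task with the `transformers` pipeline, see below; for illustration it loads the teacher checkpoint referenced above, so substitute this model's own Hub id to use the distilled version (the example sentence is likewise just an illustration).

```python
# Minimal Fill-Mask sketch. The model id below is the teacher referenced above,
# used here only as a stand-in; replace it with this model's own Hub id.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="projecte-aina/roberta-base-ca-v2")
for prediction in fill_mask("La capital de Catalunya és <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```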