mapama247 committed
Commit 7220a2f · 1 Parent(s): e5ae76b

Update README.md

Files changed (1)
  1. README.md +7 -0
README.md CHANGED
@@ -56,6 +56,13 @@ The model has 6 layers, 768 dimensions and 12 heads, totaling 82M parameters (c
 
 We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card to learn more details about the training and evaluation data.
 
+ **About Knowledge Distillation**
+ It is a technique used to shrink networks to a reasonable size while minimizing the loss in performance.
+
+ The main idea is to distill a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student). In this “teacher-student learning” setup, a small student model is trained to mimic the behavior of the larger teacher model.
+
+ As an example, the distilled version of BERT has 40% fewer parameters and runs 60% faster while preserving 97% of BERT's performance on the GLUE benchmark. This translates into lower inference times and the ability to run on commodity hardware.
+
 ## Intended uses and limitations
 
 This model is ready-to-use only for masked language modeling (MLM) to perform the Fill-Mask task. However, it is intended to be fine-tuned on non-generative downstream tasks such as Question Answering, Text Classification, or Named Entity Recognition.
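
For readers unfamiliar with the teacher-student setup described in the added section, the sketch below shows a generic distillation loss in PyTorch: the student is trained on a weighted sum of a softened-logits KL term (mimicking the teacher) and the usual hard-label cross-entropy. The temperature, weighting, and exact loss terms here are illustrative assumptions, not the actual recipe used to train this model.

```python
# Minimal, illustrative teacher-student distillation loss (assumptions noted above;
# the real training recipe for this model may combine different loss terms).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of soft-target (teacher) and hard-target (label) losses."""
    # Soft targets: student matches the teacher's temperature-softened distribution.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Hard targets: standard (masked) language-modeling cross-entropy.
    ce_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```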
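
As a usage note for the Fill-Mask task mentioned under "Intended uses and limitations", a minimal sketch with the `transformers` pipeline might look like the following. The model id is a placeholder to be replaced with this repository's id, and the `<mask>` token is assumed from the RoBERTa-style tokenizer.

```python
# Illustrative fill-mask usage; "<model-id>" is a placeholder, not a real checkpoint name.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="<model-id>")
# RoBERTa-style checkpoints typically use "<mask>" as the mask token.
print(unmasker("Barcelona és la capital de <mask>."))
```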