The model has 6 layers, a hidden dimension of 768, and 12 heads, totaling 82M parameters (…).
We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card to learn more about the training and evaluation data.

**About Knowledge Distillation**

Knowledge distillation is a technique used to shrink a network to a reasonable size while minimizing the loss in performance.

The main idea is to distill a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).

In this "teacher-student learning" setup, a small student model is trained to mimic the behavior of the larger teacher model.

As an example, DistilBERT, the distilled version of BERT, has 40% fewer parameters and runs 60% faster while preserving 97% of BERT's performance on the GLUE benchmark. This translates into lower inference time and the ability to run on commodity hardware.
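The teacher-student setup described above is usually trained with a blended loss: a soft-target term that pushes the student's output distribution toward the teacher's (computed with a softmax temperature, following Hinton et al.'s formulation), plus the ordinary cross-entropy on the true labels. This is not the exact training code of this model; it is a minimal NumPy sketch, and the function names, `temperature`, and `alpha` values are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative confidences across wrong classes ("dark knowledge").
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend of a soft-target KL term (teacher) and hard-label cross-entropy.

    Illustrative sketch only; `temperature` and `alpha` are hyperparameters
    assumed here, not values taken from this model's training recipe.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    soft_loss = (temperature ** 2) * kl.mean()
    # Standard cross-entropy of the student against the gold labels.
    p_hard = softmax(student_logits)
    hard_loss = -np.mean(np.log(p_hard[np.arange(len(labels)), labels]))
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

A student whose logits already match the teacher's contributes zero to the soft term, so the loss smoothly interpolates between "copy the teacher" and "fit the labels" as `alpha` moves between 1 and 0.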