Update README.md
README.md CHANGED
@@ -84,9 +84,7 @@ This model has been trained using a technique known as Knowledge Distillation, w
It basically consists of distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
- So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model.
-
- As a result, the student has lower inference time and the ability to run in commodity hardware.
+ So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run on commodity hardware.
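For readers new to the technique, the teacher-student setup described above is commonly implemented as a weighted sum of two losses: a soft loss that pushes the student's output distribution toward the teacher's, and a hard loss against the ground-truth labels. Below is a minimal, hypothetical PyTorch sketch; the `temperature` and `alpha` values are illustrative assumptions, not this model's actual training configuration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hypothetical Hinton-style distillation loss (illustrative defaults)."""
    # Soften both distributions with the temperature so the student also
    # learns from the teacher's relative probabilities on non-top classes.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperature settings.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random tensors standing in for a real batch:
student_logits = torch.randn(8, 100, requires_grad=True)  # from the student
teacher_logits = torch.randn(8, 100)                      # from the frozen teacher
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow only through the student's logits
```

The teacher is only run for inference while the student trains; at deployment time the student alone is served, which is where the lower inference cost comes from.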
### Training data