mapama247 committed on
Commit
6284fca
·
1 Parent(s): 90f4cae

Update README.md

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -84,9 +84,7 @@ This model has been trained using a technique known as Knowledge Distillation, w
 
 It basically consists in distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
 
-So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model.
-
-As a result, the student has lower inference time and the ability to run in commodity hardware.
+So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run in commodity hardware.
 
 ### Training data
 
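For readers unfamiliar with the teacher-student setup mentioned in the updated text, here is a minimal sketch of a soft-target distillation loss in PyTorch. It is illustrative only: the function and parameter names (`distillation_loss`, `temperature`, `alpha`) are hypothetical and this is not the training code used for this model.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (student mimics the teacher's
    temperature-softened distribution) with ordinary cross-entropy
    on the gold labels. Hypothetical helper for illustration."""
    # Soften both output distributions with the same temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradients keep a magnitude
    # comparable to the cross-entropy term.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```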