Update README.md
README.md CHANGED
@@ -84,9 +84,7 @@ This model has been trained using a technique known as Knowledge Distillation, w
It basically consists of distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
- So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model.
-
- As a result, the student has lower inference time and the ability to run in commodity hardware.
+ So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run on commodity hardware.
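For readers new to the technique, the teacher-student setup described above is commonly implemented as a weighted sum of two losses: a soft loss that pushes the student's output distribution toward the teacher's, and a hard loss against the ground-truth labels. Below is a minimal, hypothetical PyTorch sketch; the `temperature` and `alpha` values are illustrative assumptions, not this model's actual training configuration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hypothetical Hinton-style distillation loss (illustrative defaults)."""
    # Soften both distributions with the temperature so the student also
    # learns from the teacher's relative probabilities on non-top classes.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperature settings.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random tensors standing in for a real batch:
student_logits = torch.randn(8, 100, requires_grad=True)  # from the student
teacher_logits = torch.randn(8, 100)                      # from the frozen teacher
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow only through the student's logits
```

The teacher is only run for inference while the student trains; at deployment time the student alone is served, which is where the lower inference cost comes from.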
### Training data