Oleg Lavrovsky committed: README links

README.md CHANGED

@@ -1,5 +1,20 @@
# Knowledge Distillation

+Source of this doc: https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_distill/README.md
+Additional links:
+
+- https://arxiv.org/abs/2601.14051
+- https://arxiv.org/abs/2402.12030
+- https://huggingface.co/docs/transformers/v4.56.2/en/model_doc/apertus
+- https://medium.com/@gsaidheeraj/swiss-ais-apertus-70b-and-8b-a-complete-deep-dive-into-switzerland-s-revolutionary-open-language-90a88b904f6b
+- https://huggingface.co/unsloth/Apertus-8B-Instruct-2509-GGUF
+- https://huggingface.co/daslab-testing/Apertus-1.7B-it360000-SFT/blob/main/README.md
+- https://www.emergentmind.com/papers/2509.14233
+- https://huggingface.co/mistralai/Mistral-Nemo-Base-2407
+- https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Mistral/Supervised_fine_tuning_(SFT)_of_an_LLM_using_Hugging_Face_tooling.ipynb/
+
+---
+
Knowledge Distillation is a machine learning technique where a compact "student" model learns to replicate the behavior of a larger, more complex "teacher" model to achieve comparable performance with improved efficiency.
Model Optimizer's Distillation is a set of wrappers and utilities to easily perform Knowledge Distillation among teacher and student models. Given a pretrained teacher model, Distillation has the potential to train a smaller student model faster and/or with higher accuracy than the student model could achieve on its own.
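
As background for the README changes above: knowledge distillation is usually implemented by adding a temperature-scaled soft-target term to the student's ordinary training loss. The following is a minimal, generic PyTorch sketch of that idea, included here for illustration only; it does not use Model Optimizer's wrappers, and the function name, temperature, and weighting factor are illustrative assumptions rather than values taken from the linked README.

```python
# Minimal knowledge-distillation loss in plain PyTorch (illustrative sketch;
# not the Model Optimizer API). The student mimics the teacher's softened
# output distribution while also fitting the ground-truth labels.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of soft-target KL divergence and hard-label cross-entropy."""
    # Soften both output distributions with the temperature, then match them
    # with KL divergence (scaled by T^2 to keep gradient magnitudes comparable).
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy against the true labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


# Toy usage with random tensors: teacher logits are produced without gradients.
student_logits = torch.randn(4, 10, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In the Model Optimizer workflow described above, this loss bookkeeping is handled by the library's distillation wrappers; refer to the linked example README for the actual API.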