Oleg Lavrovsky committed
Commit 8bb2e6f · unverified · 1 Parent(s): 7b45378

README links

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -1,5 +1,20 @@
# Knowledge Distillation

+ Source of this doc: https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_distill/README.md
+ Additional links:
+
+ - https://arxiv.org/abs/2601.14051
+ - https://arxiv.org/abs/2402.12030
+ - https://huggingface.co/docs/transformers/v4.56.2/en/model_doc/apertus
+ - https://medium.com/@gsaidheeraj/swiss-ais-apertus-70b-and-8b-a-complete-deep-dive-into-switzerland-s-revolutionary-open-language-90a88b904f6b
+ - https://huggingface.co/unsloth/Apertus-8B-Instruct-2509-GGUF
+ - https://huggingface.co/daslab-testing/Apertus-1.7B-it360000-SFT/blob/main/README.md
+ - https://www.emergentmind.com/papers/2509.14233
+ - https://huggingface.co/mistralai/Mistral-Nemo-Base-2407
+ - https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Mistral/Supervised_fine_tuning_(SFT)_of_an_LLM_using_Hugging_Face_tooling.ipynb/
+
+ ---
+
Knowledge Distillation is a machine learning technique where a compact "student" model learns to replicate the behavior of a larger, more complex "teacher" model to achieve comparable performance with improved efficiency.

Model Optimizer's Distillation is a set of wrappers and utilities to easily perform Knowledge Distillation among teacher and student models. Given a pretrained teacher model, Distillation has the potential to train a smaller student model faster and/or with higher accuracy than the student model could achieve on its own.
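
The closing paragraphs describe distillation only in prose, so here is a minimal sketch of the underlying training step. It assumes a Hugging Face causal-LM teacher and student that share a tokenizer; the teacher name is borrowed from the link list above, the student path is a placeholder, and the loss is the standard softened-softmax KL formulation rather than Model Optimizer's own wrappers.

```python
# Minimal knowledge-distillation step: the student is optimized on a mix of its own
# cross-entropy loss and a KL-divergence term pulling its token distribution toward
# the frozen teacher's. Checkpoint names are illustrative, not from the source README.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER_ID = "mistralai/Mistral-Nemo-Base-2407"  # larger teacher (from the link list)
STUDENT_ID = "path/to/smaller-student-model"     # placeholder: smaller model to be trained

tokenizer = AutoTokenizer.from_pretrained(TEACHER_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

teacher = AutoModelForCausalLM.from_pretrained(TEACHER_ID).eval()
student = AutoModelForCausalLM.from_pretrained(STUDENT_ID)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

TEMPERATURE = 2.0  # softens both distributions so small logit differences still carry signal
ALPHA = 0.5        # mixing weight between hard-label loss and distillation loss

def distill_step(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)

    with torch.no_grad():                          # teacher is frozen; only the student updates
        teacher_logits = teacher(**batch).logits
    student_out = student(**batch, labels=labels)  # forward pass + cross-entropy on labels

    # Hinton-style distillation: KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so its gradients stay comparable to the cross-entropy term.
    # (Padding positions are included here for brevity.)
    kd_loss = F.kl_div(
        F.log_softmax(student_out.logits / TEMPERATURE, dim=-1),
        F.softmax(teacher_logits / TEMPERATURE, dim=-1),
        reduction="batchmean",
    ) * TEMPERATURE**2

    loss = ALPHA * student_out.loss + (1 - ALPHA) * kd_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Model Optimizer's wrappers package roughly this pattern (frozen teacher forward, a logits-matching criterion, and loss balancing) behind its own conversion API; the linked llm_distill README documents the actual entry points.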