Cross-link: DistilQwen collection spotlight — 2026-03-29
README.md
@@ -173,7 +173,7 @@ If you use TRL in your work, please cite the library:
 
 ## From the Convergent Intelligence Portfolio
 
-**[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)** — Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads.
+**[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)** — Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
 
 Top model: [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) — 508 downloads
 
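For anyone following the cross-link, here is a minimal usage sketch for the top model. Only the repo id and the BF16 detail come from the text above; that the checkpoint loads through the standard `transformers` causal-LM API is an assumption, not something the commit confirms.

```python
# Minimal sketch (untested): load the spotlighted checkpoint in BF16.
# Assumes a standard causal-LM layout; only the repo id and the BF16
# dtype are taken from the README text above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the collection is distributed in BF16
    device_map="auto",           # needs the accelerate package installed
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```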