## Model Description

This model is a surgically optimized and distilled version of **Qwen3.5-0.8B-Base**, created with the techniques covered in **Chapter 6** of the book **"Rearchitecting LLMs"**.

* **Book:** [Rearchitecting LLMs](https://hubs.la/Q040tvtp0)
* **Technique:** Depth Pruning + Knowledge Distillation (Labels-Only with Skew KL Divergence)
* **Chapter:** Chapter 6 - Knowledge Recovery

[](https://hubs.la/Q040tvsK0)
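The distillation objective named above, labels-only KD with a skew KL divergence, can be sketched per token position as follows. This is a minimal numeric sketch: the skew coefficient `alpha = 0.1` and the exact formulation are illustrative assumptions, not the book's implementation.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def skew_kl(teacher_logits, student_logits, alpha=0.1):
    """Skew KL divergence KL(p || alpha*p + (1-alpha)*q) for one position.

    Mixing a fraction of the teacher distribution p into the student
    distribution q keeps the divergence finite even when q assigns
    near-zero probability to tokens the teacher favors.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / (alpha * pi + (1 - alpha) * qi))
               for pi, qi in zip(p, q) if pi > 0)
```

Skewing the reference toward the teacher bounds the loss by `log(1/alpha)`, which stabilizes early training when the pruned student's distribution is still far from the teacher's.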
---

## Performance & Retention Metrics

The goal of this optimization was twofold: to maximize parameter efficiency through structural pruning, and to perform a stylistic domain adaptation to the Cosmopedia dataset while retaining the Teacher's core reasoning capabilities.

### Retention Summary (vs Teacher Baseline)

| Metric | Value | Description |
**Recovery** = How much of the pruning degradation was recovered through distillation.

| Benchmark | Teacher | Pruned (No KD) | Student (After KD) |
|:---|:---:|:---:|:---:|
| **Arc Easy** | 67.5% | 56.3% | 60.7% |
| **Winogrande** | 59.4% | 55.5% | 55.9% |
| **Hellaswag** | 54.9% | 44.0% | 47.2% |
| **Lambada Openai** | 50.9% | 8.4% | 39.9% |
| **Piqa** | 71.5% | 63.6% | 67.7% |
| **Average** | 60.8% | 45.5% | 54.3% |
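Given the Average row of this table, the recovery fraction can be reproduced with simple arithmetic (the helper name `recovery` is illustrative):

```python
def recovery(teacher, pruned, distilled):
    """Fraction of the pruning-induced score drop that distillation won back."""
    return (distilled - pruned) / (teacher - pruned)

# Average scores from the table: teacher 60.8, pruned 45.5, after KD 54.3.
avg = recovery(60.8, 45.5, 54.3)  # ~0.575, i.e. roughly 57.5% of the drop recovered
```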


### Linguistic Quality

* **Teacher Baseline PPL:** 7.34
* **Pruned (No KD) PPL:** 24.29

> **Note on Perplexity:** The Student achieves a lower (better) PPL than the Teacher. This highlights the **Domain Adaptation** effect of the distillation process. The Student successfully specialized in the tone and structure of the Cosmopedia training corpus, refining its style while recovering structural knowledge.
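The PPL figures above are exponentiated average negative log-likelihoods per token. A minimal sketch of the computation, on hypothetical per-token log-probabilities:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns probability 1/2 to every token has PPL of 2.
ppl = perplexity([math.log(0.5)] * 4)  # ~2.0
```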


---
## Architecture Details

* **Teacher Model:** `Qwen3.5-0.8B-Base` (752,393,024 parameters)
* **Student Model:** Pruned to 666,171,584 parameters
* **Layers Removed:** 4 Transformer blocks (indices: [21, 20, 9, 22])
* **Parameter Reduction:** 11.46%

---
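The depth pruning itself amounts to deleting whole transformer blocks from the model's layer stack. A minimal sketch; only the removed indices [21, 20, 9, 22] and the parameter counts come from this card, while the 28-block original depth is a hypothetical value for illustration:

```python
def prune_blocks(blocks, remove):
    """Depth pruning: delete whole transformer blocks by index.

    Deleting in descending index order keeps the positions of the
    remaining blocks stable while we remove.
    """
    blocks = list(blocks)
    for idx in sorted(remove, reverse=True):
        del blocks[idx]
    return blocks

# Indices removed for this model, per the card above.
REMOVED = [21, 20, 9, 22]
# A hypothetical 28-block stack, represented here by its indices:
kept = prune_blocks(range(28), REMOVED)

# The parameter counts above imply the stated 11.46% reduction:
reduction = 1 - 666_171_584 / 752_393_024  # ~0.1146
```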