Text Generation
MLX
Safetensors
Rust
qwen2
7b
agentic-coding
android
apple-silicon
attested
bash
c
chain-of-custody
chinese
code
code-completion
code-generation
code-infill
compacted
compensation-lora
consumer-gpu
cpp
cryptographically-verified
css
distillation
edge-inference
efficient
embedded
english
forge-alloy
function-calling
general
general-purpose
go
head-pruning
html
iphone
java
javascript
knowledge-distillation
kotlin
llama-cpp
lm-studio
local-inference
lora
macbook
mobile
multilingual
ollama
on-device
optimized
php
pruned
python
qwen
qwen-coder
qwen2.5
qwen2.5-coder
raspberry-pi
reproducible
ruby
sql
swift
teacher-student
typescript
validation-artifact
versatile
conversational
card: footer v3 — OLMoE shipped + §4.1.3.4.1 discipline gate
README.md CHANGED

```diff
@@ -128,10 +128,11 @@ The Factory configurator lets you design and forge custom models visually — co
 |---|---|---|---|---|
 | [**qwen3-coder-30b-a3b-compacted-19b-256k**](https://huggingface.co/continuum-ai/qwen3-coder-30b-a3b-compacted-19b-256k) | Qwen3-Coder-30B-A3B-Instruct | **88.4** (base 92.1, Δ −3.7) | **12 GB Q4_K_M** | First 30B-class coder that fits a 12 GB consumer GPU. Calibration-aware MoE expert pruning (§4.1.3.4). 256K context. |
 | [**qwen2.5-coder-7b-compacted**](https://huggingface.co/continuum-ai/qwen2.5-coder-7b-compacted) | Qwen2.5-Coder-7B | 61.0 (base 62.2, Δ −1.2) | 16 GB fp16 | Methodology validation artifact for §4.1.3.3 — compensation LoRA closes the dense-head pruning gap to within ±3pt of base. |
+| [**olmoe-1b-7b-compacted-5b**](https://huggingface.co/continuum-ai/olmoe-1b-7b-compacted-5b) | OLMoE-1B-7B-0924-Instruct (Allen AI, fully open) | **36.0** (base 40.9, Δ −4.9) | **4 GB Q5_K_M / phone tier** | Cross-architecture validation of §4.1.3.4 — the same forge scripts ported from `Qwen3MoeForCausalLM` to `OlmoeForCausalLM` without modification. The +8.0 within-model swing between broad-corpus and code-corpus calibration is the second empirical anchor for the discipline gate. |
 
 ### Forge methodology in one paragraph
 
-A prunable unit's importance MUST be derived from **task-conditioned activation profiling on a held-out corpus** that reflects the artifact's intended workload. Architectural-only metrics (router gate norms, weight norms, magnitudes) are first-pass shortcuts that systematically underperform task-specific activation metrics — empirically validated at two structurally distinct units (dense heads in §4.1.3.1, MoE experts in §4.1.3.4) with a +9.7 HumanEval swing on the same prune budget. **Get the metric right; the artifact follows.** Full methodology in [PLASTICITY-COMPACTION.md](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION.md).
+A prunable unit's importance MUST be derived from **task-conditioned activation profiling on a held-out corpus** that reflects the artifact's intended workload. Architectural-only metrics (router gate norms, weight norms, magnitudes) are first-pass shortcuts that systematically underperform task-specific activation metrics — empirically validated at two structurally distinct units (dense heads in §4.1.3.1, MoE experts in §4.1.3.4) with a +9.7 HumanEval swing on the same prune budget. **Get the metric right AND the calibration corpus right; the artifact follows.** Two discipline gates are now derived from empirical failures rather than asserted from first principles: the **§4.1.4.1 anchor-reproduction gate** (the base anchor must reproduce within ±3pt on the publishing pipeline before any calibrated delta is reported) and the **§4.1.3.4.1 calibration-corpus discipline gate** (the calibration corpus used for importance profiling must be hash-pinned in the alloy AND must be a representative sample of the eval workload distribution — wrong-corpus and wrong-metric errors saturate at the same ~13-point HumanEval damage ceiling, demonstrated empirically across two architectures). Full methodology in [PLASTICITY-COMPACTION.md](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION.md).
 
 ### The empty-quadrant frontier
 
@@ -141,7 +142,7 @@ A live HuggingFace audit (April 2026) confirmed that **the entire structurally-p
 
 | # | Target | Arch | License | Total/Active | Tier post-prune | Status |
 |---|---|---|---|---|---|---|
-| 1 |
+| 1 | OLMoE-1B-7B-0924-Instruct | `OlmoeForCausalLM` | Apache-2.0 | 7B/1.3B → 5B/1.0B | **Phone / 4 GB Q5** | ✅ **SHIPPED** as `olmoe-1b-7b-compacted-5b`. Second cross-arch validation of §4.1.3.4. |
 | 2 | [ibm-granite/granite-3.1-3b-a800m-instruct](https://huggingface.co/ibm-granite/granite-3.1-3b-a800m-instruct) | `GraniteMoeForCausalLM` | Apache-2.0 | 3.3B/800M (40e/top-8) | Edge tier | **Downloading now.** IBM enterprise brand, ultra-rare tiny-MoE niche, zero pruned variants. |
 | 3 | [deepseek-ai/DeepSeek-V2-Lite-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat) | `DeepseekV2ForCausalLM` | DeepSeek (commercial OK) | 15.7B/2.4B | Single GPU | **Downloading now.** The forgotten DeepSeek sibling — DeepSeek brand without 670 GB of VRAM. |
 | 4 | [microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) | `PhiMoEForCausalLM` | **MIT** | 42B/6.6B (16e/top-2) | Single 5090 Q4 | Queued. MIT-licensed Microsoft MoE that nobody runs because 42B is the awkward middle tier — until you prune to 12 experts. |
```