EnricoFermi committed
Commit 49cffac · verified · 1 parent: f8e8bd4

card: footer v3 — OLMoE shipped + §4.1.3.4.1 discipline gate

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
```diff
@@ -128,10 +128,11 @@ The Factory configurator lets you design and forge custom models visually — co
 |---|---|---|---|---|
 | [**qwen3-coder-30b-a3b-compacted-19b-256k**](https://huggingface.co/continuum-ai/qwen3-coder-30b-a3b-compacted-19b-256k) | Qwen3-Coder-30B-A3B-Instruct | **88.4** (base 92.1, Δ −3.7) | **12 GB Q4_K_M** | First 30B-class coder that fits a 12 GB consumer GPU. Calibration-aware MoE expert pruning (§4.1.3.4). 256K context. |
 | [**qwen2.5-coder-7b-compacted**](https://huggingface.co/continuum-ai/qwen2.5-coder-7b-compacted) | Qwen2.5-Coder-7B | 61.0 (base 62.2, Δ −1.2) | 16 GB fp16 | Methodology validation artifact for §4.1.3.3 — compensation LoRA closes the dense-head pruning gap to within ±3pt of base. |
+| [**olmoe-1b-7b-compacted-5b**](https://huggingface.co/continuum-ai/olmoe-1b-7b-compacted-5b) | OLMoE-1B-7B-0924-Instruct (Allen AI, fully open) | **36.0** (base 40.9, Δ −4.9) | **4 GB Q5_K_M / phone tier** | Cross-architecture validation of §4.1.3.4 — same forge scripts ported `Qwen3MoeForCausalLM` → `OlmoeForCausalLM` without modification. The +8.0 within-model swing between broad-corpus and code-corpus calibration is the second empirical anchor for the discipline gate. |
 
 ### Forge methodology in one paragraph
 
-A prunable unit's importance MUST be derived from **task-conditioned activation profiling on a held-out corpus** that reflects the artifact's intended workload. Architectural-only metrics (router gate norms, weight norms, magnitudes) are first-pass shortcuts that systematically underperform task-specific activation metrics — empirically validated at two structurally distinct units (dense heads in §4.1.3.1, MoE experts in §4.1.3.4) with a +9.7 HumanEval swing on the same prune budget. **Get the metric right; the artifact follows.** Full methodology in [PLASTICITY-COMPACTION.md](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION.md).
+A prunable unit's importance MUST be derived from **task-conditioned activation profiling on a held-out corpus** that reflects the artifact's intended workload. Architectural-only metrics (router gate norms, weight norms, magnitudes) are first-pass shortcuts that systematically underperform task-specific activation metrics — empirically validated at two structurally distinct units (dense heads in §4.1.3.1, MoE experts in §4.1.3.4) with a +9.7 HumanEval swing on the same prune budget. **Get the metric right AND the calibration corpus right; the artifact follows.** Two discipline gates now derived from empirical failures, not asserted from first principles: **§4.1.4.1 anchor-reproduction gate** (the base anchor must reproduce within ±3pt on the publishing pipeline before any calibrated delta is reported), and **§4.1.3.4.1 calibration-corpus discipline gate** (the calibration corpus used for importance profiling must be hash-pinned in the alloy AND must be a representative sample of the eval workload distribution — wrong-corpus and wrong-metric saturate at the same ~13 HumanEval damage ceiling, demonstrated empirically across two architectures). Full methodology in [PLASTICITY-COMPACTION.md](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION.md).
 
 ### The empty-quadrant frontier
 
@@ -141,7 +142,7 @@ A live HuggingFace audit (April 2026) confirmed that **the entire structurally-p
 
 | # | Target | Arch | License | Total/Active | Tier post-prune | Status |
 |---|---|---|---|---|---|---|
-| 1 | [allenai/OLMoE-1B-7B-0924-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct) | `OlmoeForCausalLM` | Apache-2.0 | 7B/1.3B (64e/top-8) | Phone / 4 GB | **Downloading now.** Smallest serious MoE on HF, fully-open (data + checkpoints), zero pruned variants. |
+| 1 | OLMoE-1B-7B-0924-Instruct | `OlmoeForCausalLM` | Apache-2.0 | 7B/1.3B → 5B/1.0B | **Phone / 4 GB Q5** | **SHIPPED** as `olmoe-1b-7b-compacted-5b`. Second cross-arch validation of §4.1.3.4. |
 | 2 | [ibm-granite/granite-3.1-3b-a800m-instruct](https://huggingface.co/ibm-granite/granite-3.1-3b-a800m-instruct) | `GraniteMoeForCausalLM` | Apache-2.0 | 3.3B/800M (40e/top-8) | Edge tier | **Downloading now.** IBM enterprise brand, ultra-rare tiny-MoE niche, zero pruned variants. |
 | 3 | [deepseek-ai/DeepSeek-V2-Lite-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat) | `DeepseekV2ForCausalLM` | DeepSeek (commercial OK) | 15.7B/2.4B | Single GPU | **Downloading now.** The forgotten DeepSeek sibling — DeepSeek brand without 670 GB of VRAM. |
 | 4 | [microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) | `PhiMoEForCausalLM` | **MIT** | 42B/6.6B (16e/top-2) | Single 5090 Q4 | Queued. MIT-licensed Microsoft MoE that nobody runs because 42B is the awkward middle tier — until you prune to 12 experts. |
```
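The discipline the new methodology paragraph describes — rank experts by routed activation mass on a calibration corpus, hash-pin that corpus, and gate publication on anchor reproduction — can be sketched in a few lines. This is a minimal illustration, not the forge pipeline: `pin_corpus`, `expert_importance`, `prune_plan`, and `anchor_gate` are hypothetical names, and the router probabilities are simulated rather than captured from a real model.

```python
import hashlib
import numpy as np

def pin_corpus(texts):
    """Hash-pin a calibration corpus: SHA-256 over the newline-joined
    texts, recorded so the artifact metadata states exactly which
    corpus produced the importance profile (hypothetical helper)."""
    h = hashlib.sha256()
    for t in texts:
        h.update(t.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()

def expert_importance(router_probs, top_k):
    """Task-conditioned activation metric: mean probability mass each
    expert receives across calibration tokens, counting only the
    top-k experts actually routed to per token.
    router_probs: (n_tokens, n_experts) softmaxed router outputs."""
    n_tokens, n_experts = router_probs.shape
    mass = np.zeros(n_experts)
    for probs in router_probs:
        top = np.argsort(probs)[-top_k:]   # experts activated for this token
        mass[top] += probs[top]            # accumulate routed probability mass
    return mass / n_tokens

def prune_plan(router_probs, top_k, keep):
    """Indices of the `keep` most important experts under the metric."""
    imp = expert_importance(router_probs, top_k)
    return sorted(np.argsort(imp)[-keep:].tolist())

def anchor_gate(measured, published, tol=3.0):
    """Anchor-reproduction gate: only report calibrated deltas if the
    base score reproduces within ±tol points on this pipeline."""
    return abs(measured - published) <= tol

# Toy demo: simulate a router where three experts dominate on the
# calibration workload, then recover them from activation mass.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 8))
logits[:, [1, 4, 6]] += 3.0                # corpus-conditioned specialists
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
plan = prune_plan(probs, top_k=2, keep=3)
print(plan)                                 # → [1, 4, 6]
print(pin_corpus(["def f(x):", "    return x + 1"])[:12])
```

Note the contrast the paragraph draws: a weight-norm metric would score all eight simulated experts identically here, while the activation metric recovers the three experts the calibration workload actually routes to — and the corpus hash makes the profile auditable.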