EnricoFermi committed · Commit 6ab6114 (verified) · 1 Parent(s): a2e77c7

Regenerate model card from corrected alloy (alloyHash 4fe422e9b01fa8f0)

Files changed (1)
  1. README.md +67 -72
README.md CHANGED
@@ -1,15 +1,71 @@
1
  ---
2
  tags:
3
- - text-generation
4
- - general
5
- - qwen2.5
6
  - 7b
7
- - pruned
8
- - lora
9
  - compensation-lora
10
  - distillation
11
  - forge-alloy
12
- - cryptographically-verified
13
  base_model: Qwen/Qwen2.5-Coder-7B
14
  pipeline_tag: text-generation
15
  license: apache-2.0
@@ -17,20 +73,20 @@ license: apache-2.0
17
 
18
  # 12% Pruned, 61.0 HUMANEVAL (base 62.2)
19
 
20
- **Qwen2.5-Coder-7B** forged through Experiential Plasticity and recovered to within calibration tolerance of the unmodified base via KL-distillation compensation LoRA.
21
 
22
  - **HUMANEVAL**: 61.0 (base 62.2, Δ -1.2)
23
  - **HUMANEVAL+PLUS**: 53.0 (base 53.7, Δ -0.7)
24
 
25
 
26
  <p align="center">
27
- <a href="https://cambriantech.github.io/forge-alloy/verify/#c92083286a04544b">
28
  <img src="alloy-qr.png" alt="Verify Chain of Custody" width="160"/>
29
  </a>
30
  </p>
31
 
32
  <p align="center">
33
- <a href="https://cambriantech.github.io/forge-alloy/verify/#c92083286a04544b"><b>Every claim on this card is verified</b></a><br>
34
  <b>Trust: self-attested</b> · 2 benchmarks · 1 device tested<br>
35
  <a href="https://github.com/CambrianTech/forge-alloy">ForgeAlloy</a> chain of custody · <a href="v2-7b-coder-compensated.alloy.json">Download alloy</a> · Merkle-chained
36
  </p>
@@ -95,10 +151,11 @@ Produced via head pruning, LoRA fine-tuning, KL-distillation compensation agains
95
 
96
  ## Chain of Custody
97
 
98
- Scan the QR or [verify online](https://cambriantech.github.io/forge-alloy/verify/#c92083286a04544b). Download the [alloy file](v2-7b-coder-compensated.alloy.json) to verify independently.
99
 
100
  | What | Proof |
101
  |------|-------|
102
  | Forged on | NVIDIA GeForce RTX 5090, ? |
103
  | Published | [huggingface](https://huggingface.co/continuum-ai/v2-7b-coder-compensated) — 2026-04-08T05:02:57.072577+00:00 |
104
  | Trust level | [`self-attested`](https://github.com/CambrianTech/forge-alloy/blob/main/docs/ATTESTATION.md) |
@@ -116,68 +173,6 @@ The Factory configurator lets you design and forge custom models visually — co
116
 
117
  [GitHub](https://github.com/CambrianTech/continuum) · [All Models](https://huggingface.co/continuum-ai) · [Forge-Alloy](https://github.com/CambrianTech/forge-alloy)
118
 
119
- ---
120
-
121
- ## More from continuum-ai
122
-
123
- `continuum-ai` ships **structurally compacted models for hardware tiers nobody else targets**. Every artifact is calibration-aware, hardware-anchored, and shipped with [ForgeAlloy](https://github.com/CambrianTech/forge-alloy) cryptographic provenance — the per-problem benchmark JSONLs are uploaded with sha256 hashes recorded in the alloy so anyone can re-score against the same anchor without trusting the producer's claim.
124
-
125
- ### Currently shipped
126
-
127
- | Model | Base | HumanEval (vs base) | Tier | What's new |
128
- |---|---|---|---|---|
129
- | [**qwen3-coder-30b-a3b-compacted-19b-256k**](https://huggingface.co/continuum-ai/qwen3-coder-30b-a3b-compacted-19b-256k) | Qwen3-Coder-30B-A3B-Instruct | **88.4** (base 92.1, Δ −3.7) | **12 GB Q4_K_M** | First 30B-class coder that fits a 12 GB consumer GPU. Calibration-aware MoE expert pruning (§4.1.3.4). 256K context. |
130
- | [**qwen2.5-coder-7b-compacted**](https://huggingface.co/continuum-ai/qwen2.5-coder-7b-compacted) | Qwen2.5-Coder-7B | 61.0 (base 62.2, Δ −1.2) | 16 GB fp16 | Methodology validation artifact for §4.1.3.3 — compensation LoRA closes the dense-head pruning gap to within ±3pt of base. |
131
- | [**olmoe-1b-7b-compacted-5b**](https://huggingface.co/continuum-ai/olmoe-1b-7b-compacted-5b) | OLMoE-1B-7B-0924-Instruct (Allen AI, fully open) | **36.0** (base 40.9, Δ −4.9) | **4 GB Q5_K_M / phone tier** | Cross-architecture validation of §4.1.3.4 — same forge scripts ported `Qwen3MoeForCausalLM` → `OlmoeForCausalLM` without modification. The +8.0 within-model swing between broad-corpus and code-corpus calibration is the second empirical anchor for the discipline gate. |
132
-
133
- ### Forge methodology in one paragraph
134
-
135
- A prunable unit's importance MUST be derived from **task-conditioned activation profiling on a held-out corpus** that reflects the artifact's intended workload. Architectural-only metrics (router gate norms, weight norms, magnitudes) are first-pass shortcuts that systematically underperform task-specific activation metrics — empirically validated at two structurally distinct units (dense heads in §4.1.3.1, MoE experts in §4.1.3.4) with a +9.7 HumanEval swing on the same prune budget. **Get the metric right AND the calibration corpus right; the artifact follows.** Two discipline gates now derived from empirical failures, not asserted from first principles: **§4.1.4.1 anchor-reproduction gate** (the base anchor must reproduce within ±3pt on the publishing pipeline before any calibrated delta is reported), and **§4.1.3.4.1 calibration-corpus discipline gate** (the calibration corpus used for importance profiling must be hash-pinned in the alloy AND must be a representative sample of the eval workload distribution — wrong-corpus and wrong-metric saturate at the same ~13 HumanEval damage ceiling, demonstrated empirically across two architectures). Full methodology in [PLASTICITY-COMPACTION.md](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION.md).
136
-
137
- ### The empty-quadrant frontier
138
-
139
- A live HuggingFace audit (April 2026) confirmed that **the entire structurally-pruned-MoE quadrant is empty for every frontier model except Llama 3.3 70B**. Quantization is everywhere; structural pruning is nowhere. The forge methodology validated on `qwen3-coder-30b-a3b` ports directly to every other MoE family. The forge queue below is the comprehensive map of empty quadrants we are claiming, one architecture at a time.
140
-
141
- ### Forge queue — comprehensive new-architecture coverage
142
-
143
- | # | Target | Arch | License | Total/Active | Tier post-prune | Status |
144
- |---|---|---|---|---|---|---|
145
- | 1 | OLMoE-1B-7B (`OlmoeForCausalLM`) | `OlmoeForCausalLM` | Apache-2.0 | 7B/1.3B → 5B/1.0B | **Phone / 4 GB Q5** | ✅ **SHIPPED** as `olmoe-1b-7b-compacted-5b`. Second cross-arch validation of §4.1.3.4. |
146
- | 2 | [ibm-granite/granite-3.1-3b-a800m-instruct](https://huggingface.co/ibm-granite/granite-3.1-3b-a800m-instruct) | `GraniteMoeForCausalLM` | Apache-2.0 | 3.3B/800M (40e/top-8) | Edge tier | **Downloading now.** IBM enterprise brand, ultra-rare tiny-MoE niche, zero pruned variants. |
147
- | 3 | [deepseek-ai/DeepSeek-V2-Lite-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat) | `DeepseekV2ForCausalLM` | DeepSeek (commercial OK) | 15.7B/2.4B | Single GPU | **Downloading now.** The forgotten DeepSeek sibling — DeepSeek brand without 670 GB of VRAM. |
148
- | 4 | [microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) | `PhiMoEForCausalLM` | **MIT** | 42B/6.6B (16e/top-2) | Single 5090 Q4 | Queued. MIT-licensed Microsoft MoE that nobody runs because 42B is the awkward middle tier — until you prune to 12 experts. |
149
- | 5 | [mistralai/Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) | `MixtralForCausalLM` | Apache-2.0 | 141B/39B (8e/top-2) | Single 5090 Q4 | Queued. Two-year overdue Pareto win — the textbook MoE that nobody has ever calibration-pruned. |
150
- | 6 | [Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507) | `Qwen3MoeForCausalLM` | Apache-2.0 | 235B/22B (128e/top-8) | Single 5090 Q4 | Queued. Same family as our shipped 30B-A3B → methodology ports trivially. |
151
- | 7 | [Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct) | `Qwen3MoeForCausalLM` | Apache-2.0 | 480B/35B (160e/top-8) | **Grid moonshot** (4×24GB) | Queued. First consumer-accessible 480B coder. |
152
- | 8 | [deepseek-ai/DeepSeek-Coder-V2-Instruct](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct) | `DeepseekV2ForCausalLM` | DeepSeek | 236B/21B | Grid | Queued. Direct methodology replay at higher tier. |
153
- | 9 | [Snowflake/snowflake-arctic-instruct](https://huggingface.co/Snowflake/snowflake-arctic-instruct) | `ArcticForCausalLM` | Apache-2.0 | 480B/17B (128e/top-2) | Grid | Queued. The forgotten Apache frontier MoE — dense+sparse hybrid arch is a novel research contribution by itself. |
154
- | 10 | [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) | `DeepseekV3ForCausalLM` | **MIT** | 671B/37B | **Grid moonshot** | Queued. The viral king. First non-distill R1 compaction. |
155
-
156
- **8 distinct architecture classes** covered across **5 hardware tiers** (edge → phone → single GPU → 5090 → grid). When the queue completes, the calibration-aware-importance metric has been validated on `Qwen3MoeForCausalLM`, `OlmoeForCausalLM`, `GraniteMoeForCausalLM`, `DeepseekV2ForCausalLM`, `PhiMoEForCausalLM`, `MixtralForCausalLM`, `ArcticForCausalLM`, and `DeepseekV3ForCausalLM` — the cross-family invariance claim becomes empirical, not theoretical.
157
-
158
- ### Hard prerequisites being built in parallel
159
-
160
- - **LiveCodeBench v6 anchor extension** for `eval_with_calibration.py` — HumanEval is no longer reported on frontier model cards (Qwen3-Coder, DeepSeek-V3.1, Mixtral 8x22B all use SWE-bench / LiveCodeBench / Aider-Polyglot). Without LCB v6 wired up, frontier targets are blocked at the §4.1.4.1 calibration discipline gate. ~1-2 days of mechanical pipeline work.
161
- - **Offline teacher-logit precomputation** for `compensation_lora.py` — at 30B+ class, transformers' `caching_allocator_warmup` pre-allocates an fp16 buffer equal to full model size before bnb 4-bit takes effect, exceeding total VRAM on a single 32 GB GPU. The architecturally correct fix is phase-1-load-teacher / phase-2-unload / phase-3-load-student-and-train-against-on-disk-logits. Prerequisite for compensation v2 of every artifact ≥30B.
162
- - **Grid expert sharding** for the 480B+ moonshots — `cpu_expert_prune_v2.py`'s streaming pruner already handles shards bigger than any single GPU, but distributed inference + cross-machine activation profiling for the calibration-aware metric needs the grid layer. This is the §4.1.3.5 distributed forge methodology paper section.
163
-
164
- ### Sensory bridge stack (separate from the LLM forge queue)
165
-
166
- For Continuum's own sensory architecture (vision/audio/embedding bridges), the right targets are not forge candidates — they're curated bridge components used as-is:
167
-
168
- | Component | Model | Use |
169
- |---|---|---|
170
- | Vision encoder | [`google/siglip-so400m-patch14-384`](https://huggingface.co/google/siglip-so400m-patch14-384) | Image embeddings for the vision bridge |
171
- | Vision describer | [`microsoft/Phi-3.5-vision-instruct`](https://huggingface.co/microsoft/Phi-3.5-vision-instruct) | Small VLM that generates text descriptions consumed by text-only LLMs |
172
- | STT | [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3) | Speech transcription for audio bridge |
173
- | Multilingual embedding | [`BAAI/bge-m3`](https://huggingface.co/BAAI/bge-m3) | Sensory cache embeddings |
174
- | Avatar diffusion | [`black-forest-labs/FLUX.1-schnell`](https://huggingface.co/black-forest-labs/FLUX.1-schnell) | Apache-licensed avatar generation for Continuum universes |
175
-
176
- ### What we DON'T target
177
-
178
- The Llama 3.3 70B slot is saturated (six publishers, every quant level). We're not shipping a third compacted MoE in the middle tier. The lab's brand pitch is **models that no individual hardware tier can run, made runnable by structural compaction + grid distribution** — empty-quadrant headlines, not catalog filler. That's the intersection only continuum has, and the forge queue above is the map.
179
-
180
-
181
  ## License
182
 
183
  apache-2.0
 
1
  ---
2
  tags:
3
  - 7b
4
+ - agentic-coding
5
+ - android
6
+ - apple-silicon
7
+ - attested
8
+ - bash
9
+ - c
10
+ - chain-of-custody
11
+ - chinese
12
+ - code
13
+ - code-completion
14
+ - code-generation
15
+ - code-infill
16
+ - compacted
17
  - compensation-lora
18
+ - consumer-gpu
19
+ - cpp
20
+ - cryptographically-verified
21
+ - css
22
  - distillation
23
+ - edge-inference
24
+ - efficient
25
+ - embedded
26
+ - english
27
  - forge-alloy
28
+ - function-calling
29
+ - general
30
+ - general-purpose
31
+ - go
32
+ - head-pruning
33
+ - html
34
+ - iphone
35
+ - java
36
+ - javascript
37
+ - knowledge-distillation
38
+ - kotlin
39
+ - llama-cpp
40
+ - lm-studio
41
+ - local-inference
42
+ - lora
43
+ - macbook
44
+ - mlx
45
+ - mobile
46
+ - multilingual
47
+ - ollama
48
+ - on-device
49
+ - optimized
50
+ - php
51
+ - pruned
52
+ - python
53
+ - qwen
54
+ - qwen-coder
55
+ - qwen2
56
+ - qwen2.5
57
+ - qwen2.5-coder
58
+ - raspberry-pi
59
+ - reproducible
60
+ - ruby
61
+ - rust
62
+ - sql
63
+ - swift
64
+ - teacher-student
65
+ - text-generation
66
+ - typescript
67
+ - validation-artifact
68
+ - versatile
69
  base_model: Qwen/Qwen2.5-Coder-7B
70
  pipeline_tag: text-generation
71
  license: apache-2.0
 
73
 
74
  # 12% Pruned, 61.0 HUMANEVAL (base 62.2)
75
 
76
+ **Qwen2.5-Coder-7B** recovered to within calibration tolerance of the unmodified base via KL-distillation compensation LoRA.
77
 
78
  - **HUMANEVAL**: 61.0 (base 62.2, Δ -1.2)
79
  - **HUMANEVAL+PLUS**: 53.0 (base 53.7, Δ -0.7)
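
The deltas listed above can be recomputed by hand. The sketch below does so and checks each one against a ±3pt tolerance. That tolerance is an assumption taken from the earlier card text ("within ±3pt of base"), not a value read from the alloy file, and `delta`, `within_tolerance`, and `TOLERANCE_PT` are illustrative names rather than part of ForgeAlloy.

```python
# Scores copied from this card as (pruned, base) pairs.
SCORES = {
    "HUMANEVAL": (61.0, 62.2),
    "HUMANEVAL+PLUS": (53.0, 53.7),
}

# Assumed tolerance in benchmark points (from the earlier card text).
TOLERANCE_PT = 3.0


def delta(pruned: float, base: float) -> float:
    """Return pruned minus base, rounded to one decimal like the card."""
    return round(pruned - base, 1)


def within_tolerance(pruned: float, base: float, tol: float = TOLERANCE_PT) -> bool:
    """True when the pruned score is within +/- tol points of the base score."""
    return abs(pruned - base) <= tol


if __name__ == "__main__":
    for name, (pruned, base) in SCORES.items():
        ok = within_tolerance(pruned, base)
        print(f"{name}: delta {delta(pruned, base):+.1f}, within tolerance: {ok}")
```

Rounding to one decimal keeps the result aligned with how the card reports deltas and avoids floating-point noise in the subtraction.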
80
 
81
 
82
  <p align="center">
83
+ <a href="https://cambriantech.github.io/forge-alloy/verify/#4fe422e9b01fa8f0">
84
  <img src="alloy-qr.png" alt="Verify Chain of Custody" width="160"/>
85
  </a>
86
  </p>
87
 
88
  <p align="center">
89
+ <a href="https://cambriantech.github.io/forge-alloy/verify/#4fe422e9b01fa8f0"><b>Every claim on this card is verified</b></a><br>
90
  <b>Trust: self-attested</b> · 2 benchmarks · 1 device tested<br>
91
  <a href="https://github.com/CambrianTech/forge-alloy">ForgeAlloy</a> chain of custody · <a href="v2-7b-coder-compensated.alloy.json">Download alloy</a> · Merkle-chained
92
  </p>
 
151
 
152
  ## Chain of Custody
153
 
154
+ Scan the QR or [verify online](https://cambriantech.github.io/forge-alloy/verify/#4fe422e9b01fa8f0). Download the [alloy file](v2-7b-coder-compensated.alloy.json) to verify independently.
155
 
156
  | What | Proof |
157
  |------|-------|
158
+ | Model weights | `sha256:156247b9f9b25d302651e2540f1dad58d...` |
159
  | Forged on | NVIDIA GeForce RTX 5090, ? |
160
  | Published | [huggingface](https://huggingface.co/continuum-ai/v2-7b-coder-compensated) — 2026-04-08T05:02:57.072577+00:00 |
161
  | Trust level | [`self-attested`](https://github.com/CambrianTech/forge-alloy/blob/main/docs/ATTESTATION.md) |
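
To check the `Model weights` row independently, one option is to hash a local copy of the weights and compare it with the published digest. The sketch below assumes the digest covers a single file; the card does not say which file or files it covers, so the `model.safetensors` path is hypothetical. The digest shown in the table is truncated, so this only performs a prefix comparison. A full check would use the complete digest from the alloy file.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published):
    """Compare a local digest with a published value that may end in '...'."""
    expected = published.removeprefix("sha256:").rstrip(".").lower()
    return sha256_file(path).startswith(expected)


if __name__ == "__main__":
    weights = Path("model.safetensors")  # hypothetical local file name
    published = "sha256:156247b9f9b25d302651e2540f1dad58d..."  # truncated, as on the card
    if weights.exists():
        print("match" if matches_published(weights, published) else "MISMATCH")
    else:
        print(f"{weights} not found; download the weights first")
```

A prefix match is weaker evidence than a full-digest match, so treat a positive result here as a quick sanity check rather than verification. `str.removeprefix` requires Python 3.9 or later.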
 
173
 
174
  [GitHub](https://github.com/CambrianTech/continuum) · [All Models](https://huggingface.co/continuum-ai) · [Forge-Alloy](https://github.com/CambrianTech/forge-alloy)
175
 
176
  ## License
177
 
178
  apache-2.0