---
tags:
  - 7b
  - agentic-coding
  - android
  - apple-silicon
  - attested
  - bash
  - c
  - chain-of-custody
  - chinese
  - code
  - code-completion
  - code-generation
  - code-infill
  - compacted
  - compensation-lora
  - consumer-gpu
  - cpp
  - cryptographically-verified
  - css
  - distillation
  - edge-inference
  - efficient
  - embedded
  - english
  - forge-alloy
  - function-calling
  - general
  - general-purpose
  - go
  - head-pruning
  - html
  - iphone
  - java
  - javascript
  - knowledge-distillation
  - kotlin
  - llama-cpp
  - lm-studio
  - local-inference
  - lora
  - macbook
  - mlx
  - mobile
  - multilingual
  - ollama
  - on-device
  - optimized
  - php
  - pruned
  - python
  - qwen
  - qwen-coder
  - qwen2
  - qwen2.5
  - qwen2.5-coder
  - raspberry-pi
  - reproducible
  - ruby
  - rust
  - sql
  - swift
  - teacher-student
  - text-generation
  - typescript
  - validation-artifact
  - versatile
base_model: Qwen/Qwen2.5-Coder-7B
pipeline_tag: text-generation
license: apache-2.0
---
# 12% Pruned, 61.0 HumanEval (base 62.2)

**Qwen2.5-Coder-7B** recovered to within calibration tolerance of the unmodified base via a KL-distillation compensation LoRA.

- **HumanEval**: 61.0 (base 62.2, Δ -1.2)
- **HumanEval+**: 53.0 (base 53.7, Δ -0.7)
<p align="center">
  <a href="https://cambriantech.github.io/forge-alloy/verify/#4fe422e9b01fa8f0">
    <img src="alloy-qr.png" alt="Verify Chain of Custody" width="160"/>
  </a>
</p>

<p align="center">
  <a href="https://cambriantech.github.io/forge-alloy/verify/#4fe422e9b01fa8f0"><b>Every claim on this card is verified</b></a><br>
  <b>Trust: self-attested</b> · 2 benchmarks · 1 device tested<br>
  <a href="https://github.com/CambrianTech/forge-alloy">ForgeAlloy</a> chain of custody · <a href="v2-7b-coder-compensated.alloy.json">Download alloy</a> · Merkle-chained
</p>

---
**Qwen2.5-Coder-7B** with cryptographic provenance via the [ForgeAlloy](https://github.com/CambrianTech/forge-alloy) chain of custody. Scores **61.0 HumanEval** against the unmodified base's **62.2**, recovered to within calibration tolerance after head pruning + distillation. Ships with the per-problem evaluation outputs, so the score is independently verifiable.
## Benchmarks

| Benchmark | Score | Base | Δ | Verified |
|---|---|---|---|---|
| **HumanEval** | **61.0** | 62.2 | -1.2 | ✅ Result hash |
| **HumanEval+** | **53.0** | 53.7 | -0.7 | ✅ Result hash |
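The "✅ Result hash" entries mean the per-problem evaluation outputs ship alongside the model, so the headline numbers can be recomputed rather than taken on faith. A minimal sketch of that recomputation, assuming the outputs are a JSONL file with a boolean `passed` field per problem (the exact schema of the shipped artifact may differ):

```python
import json

def pass_rate(path: str) -> float:
    """Recompute a pass@1 percentage from per-problem result records.

    Assumes each JSONL line looks like {"task_id": ..., "passed": bool};
    check the shipped evaluation outputs for the real field names.
    """
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    return 100.0 * sum(r["passed"] for r in records) / len(records)
```

Comparing the recomputed rate against the table (and the file's hash against the alloy record) is what makes the score independently checkable.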
## What Changed (Base → Forged)

| | Base | Forged | Delta |
|---|---|---|---|
| **Pruning** | None | 12% heads (activation-magnitude) | **-12%** params ✅ |
| **Compensation LoRA** | None | rank=16 on q_proj, k_proj, v_proj, o_proj... | — |
| **Pipeline** | — | prune → lora → lora → eval | 1 cycle |
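Activation-magnitude head pruning, as named above, ranks attention heads by the magnitude of their outputs on a calibration batch and drops the weakest fraction. A minimal NumPy sketch of the selection step (the mean-absolute-value score and the 12% ratio mirror the table, but this is an illustration, not the forge's exact code):

```python
import numpy as np

def select_heads_to_prune(head_activations: np.ndarray, ratio: float = 0.12) -> np.ndarray:
    """Return indices of the weakest `ratio` fraction of attention heads.

    head_activations: (num_heads, num_tokens, head_dim) array of head
    outputs collected on a calibration set.
    """
    # One scalar score per head: mean absolute activation magnitude.
    scores = np.abs(head_activations).mean(axis=(1, 2))
    n_prune = int(round(len(scores) * ratio))
    # Ascending argsort puts the lowest-magnitude heads first.
    return np.sort(np.argsort(scores)[:n_prune])
```

The selected heads are then removed from the attention projections, and the compensation LoRA is trained to recover the lost capacity.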
## Runs On

| Device | Format | Size | Status |
|--------|--------|------|--------|
| **NVIDIA GeForce RTX 5090** | fp16 | — | Verified |
| MacBook Pro 32GB | fp16 | 8.0GB | Expected |
| MacBook Air 16GB | Q8_0 | ~4.0GB | Expected |
| MacBook Air 8GB | Q4_K_M | ~2.5GB | Expected |
| iPhone / Android | Q4_K_M | ~2.5GB | Expected |
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "continuum-ai/v2-7b-coder-compensated",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("continuum-ai/v2-7b-coder-compensated")

inputs = tokenizer("def merge_sort(arr):", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
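The card's `code-infill` tag refers to fill-in-the-middle (FIM) completion, where the model generates the code that belongs between an existing prefix and suffix. Qwen2.5-Coder models use dedicated FIM control tokens; the token names below follow the upstream Qwen2.5-Coder convention and are assumed to carry over to this fork:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt in the Qwen2.5-Coder format.

    The model is expected to generate the middle span after the
    <|fim_middle|> marker.
    """
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt(
    prefix="def fib(n):\n    if n < 2:\n        return n\n    ",
    suffix="\n    return a\n",
)
# Feed `prompt` through the same tokenizer/generate call as in Quick Start.
```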
## Methodology

Produced via head pruning followed by LoRA fine-tuning with KL-distillation compensation against the unmodified teacher. Full methodology, ablations, and per-stage rationale are in [the methodology paper](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION.md) and the companion [`MODEL_METHODOLOGY.md`](MODEL_METHODOLOGY.md) in this repository. The pipeline ran as `prune → lora → lora → eval` over 1 cycle on an NVIDIA GeForce RTX 5090.
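KL-distillation compensation trains the LoRA so that the pruned student's next-token distribution matches the unmodified teacher's. The per-token objective is KL(teacher ‖ student) over the vocabulary; a NumPy sketch of that loss from raw logits (the temperature and mean reduction are illustrative choices, not the forge's exact hyperparameters):

```python
import numpy as np

def kl_distill_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) across tokens, computed from raw logits."""
    t = np.asarray(teacher_logits, dtype=float) / temperature
    s = np.asarray(student_logits, dtype=float) / temperature
    # Log-softmax with max subtraction for numerical stability.
    t_log = t - t.max(-1, keepdims=True)
    t_log -= np.log(np.exp(t_log).sum(-1, keepdims=True))
    s_log = s - s.max(-1, keepdims=True)
    s_log -= np.log(np.exp(s_log).sum(-1, keepdims=True))
    p_teacher = np.exp(t_log)
    # KL per token, then averaged over the batch of tokens.
    return float((p_teacher * (t_log - s_log)).sum(-1).mean())
```

The loss is zero exactly when the student reproduces the teacher's distribution, which is why "recovered to within calibration tolerance" is the natural success criterion for this stage.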
## Limitations

- This model is currently a methodology demonstration rather than a Pareto-optimal artifact at any specific hardware tier. For production code workloads on smaller hardware, the unmodified Qwen2.5-Coder-7B at standard quantization (Q4_K_M / Q5_K_M / Q8_0) may be a better fit until the larger Qwen3.5+ forges exercise the pruning dimension where this methodology actually wins.
- Validated on HumanEval / HumanEval+ for English-language Python code completion. Performance on other programming languages, code paradigms (functional, embedded, kernel), or code-adjacent domains (SQL, regex, shell) has not been measured.
- Ships as fp16 only. GGUF quantization tiers (Q5_K_S / Q3_K_M / Q2_K) are not yet published for this artifact. The per-tier comparison in the development log showed base+quant dominating v2+quant at every VRAM tier on the same 7B base, so this methodology validation uses fp16; production GGUF publishes are reserved for the Qwen3.5+ forges where that comparison flips.
- Vision modality is not yet wired in. The Continuum sensory architecture treats vision as first-class for personas, but this 7B coder artifact is text-only.
## Chain of Custody

Scan the QR or [verify online](https://cambriantech.github.io/forge-alloy/verify/#4fe422e9b01fa8f0). Download the [alloy file](v2-7b-coder-compensated.alloy.json) to verify independently.

| What | Proof |
|------|-------|
| Model weights | `sha256:156247b9f9b25d302651e2540f1dad58d...` |
| Forged on | NVIDIA GeForce RTX 5090, ? |
| Published | [huggingface](https://huggingface.co/continuum-ai/v2-7b-coder-compensated) — 2026-04-08T05:02:57.072577+00:00 |
| Trust level | [`self-attested`](https://github.com/CambrianTech/forge-alloy/blob/main/docs/ATTESTATION.md) |
| Spec | [ForgeAlloy](https://github.com/CambrianTech/forge-alloy) — Rust/Python/TypeScript |
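The weight hash can be checked locally before trusting the artifact. A minimal sketch, assuming the published digest covers the raw bytes of a weight file (the file name below is a placeholder; the authoritative file list and full digests live in the alloy file):

```python
import hashlib

def sha256_file(path: str) -> str:
    """Stream a file through SHA-256 and return a 'sha256:<hex>' digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large weight shards don't load into RAM.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()

# Compare against the (truncated) digest from the table above, e.g.:
#   sha256_file("model.safetensors").startswith("sha256:156247b9f9b25d3")
```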
## Make Your Own

Forged with [Continuum](https://github.com/CambrianTech/continuum) — a distributed AI world that runs on your hardware.

<p align="center">
  <a href="https://github.com/CambrianTech/continuum"><img src="https://raw.githubusercontent.com/CambrianTech/continuum/main/docs/images/factory.png" alt="Continuum Model Factory" width="400"/></a>
</p>

The Factory configurator lets you design and forge custom models visually — context extension, pruning, LoRA, quantization, vision/audio modalities. Pick your target devices and the system figures out what fits.

[GitHub](https://github.com/CambrianTech/continuum) · [All Models](https://huggingface.co/continuum-ai) · [Forge-Alloy](https://github.com/CambrianTech/forge-alloy)
## License

apache-2.0