🔥 +22.7% better at code
Qwen3.5-4B forged for code generation through Experiential Plasticity
3.04 → 2.35 perplexity · 3 cycles · RTX 5090 · 45 minutes
Every claim on this card is verified
ForgeAlloy chain of custody · Download alloy · Merkle-chained · Self-attested
Runs On
| Device | Format | Size | Status |
|---|---|---|---|
| iPhone / Android | Q4_K_M | 2.6GB | GGUF available |
| MacBook Air 8GB | Q4_K_M | 2.6GB | GGUF available |
| MacBook Air 16GB | Q8_0 | 4.2GB | GGUF available |
| MacBook Pro 32GB | fp16 | 8.0GB | Native |
| RTX 3090/4090 | fp16 | 8.0GB | Native |
| RTX 5090 | fp16 | 8.0GB | Forged here |
Benchmarks
HumanEval evaluation in progress. Prior forge: 74.4% (63/85) on partial run. Full results will be added with proof via ForgeAlloy.
| Metric | Baseline | Forged | Change |
|---|---|---|---|
| Perplexity (code) | 3.04 | 2.35 | +22.7% |
| Parameters | 4.1B | 4.1B | — |
| Domain | general | code | specialized |
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("continuum-ai/qwen3.5-4b-code-forged",
torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("continuum-ai/qwen3.5-4b-code-forged")
inputs = tokenizer("def merge_sort(arr):", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Forge Your Own
git clone https://github.com/CambrianTech/sentinel-ai && cd sentinel-ai && ./setup.sh
source .venv/bin/activate
python scripts/forge_model.py Qwen/Qwen3.5-4B --domain code
Or use the ForgeAlloy recipe — portable, typed, verifiable:
python scripts/alloy_executor.py qwen3.5-4b-code-forged.alloy.json
Chain of Custody
Every claim above is backed by the alloy file. Scan the QR or click to verify.
| What | Proof |
|---|---|
| Model weights unchanged | sha256:f6b777... model hash in alloy |
| Code that ran | sha256:464680... → alloy_executor.py |
| Forged on | RTX 5090, fp16, 2026-03-31 |
| Published to | This repo, receipted |
| Trust level | self-attested → what this means |
| Spec | ForgeAlloy — Rust/Python/TypeScript SDK |
The Science
Experiential Plasticity — not compression, architectural optimization:
- Train on code data (LoRA + AMP)
- Measure each attention head's contribution (entropy)
- Prune heads that don't contribute
- Retrain — surviving heads specialize
- Repeat — each cycle improves
Scaling law: improvement increases with model size. Domain-specific training (code) amplifies the effect.
| Model | Domain | Improvement |
|---|---|---|
| Qwen2.5-0.5B | general | -3.2% |
| Qwen2.5-7B | general | +11.8% |
| Qwen3.5-4B | code | +22.7% |
| Qwen3.5-27B | code | +3.5% |
Transfer function: 1.45 × exp(-0.18 × cycle) - 0.03
Papers
- Experiential Plasticity — scaling law, transfer function, self-directed controller
- Neural Plasticity in Transformers — foundation paper
- Plasticity Compaction — MoE expert pruning
sentinel-ai · continuum · forge-alloy · all models
Forged with ForgeAlloy — every claim verified by cryptographic chain of custody
- Downloads last month
- 2,559