Darwin-4B-Genesis

Gen1 Gen2 Gen3

9B 9B Space 31B 31B Space

35B 35B Space Q8 GGUF bartowski GGUF

FINAL Bench ALL Bench

World's first Transformer ร— Mamba evolutionary cross-architecture FFN breeding | CLIcK 92% | MuSR 70% | A 4B model outperforming 27B | CMA-ES 42-dimensional genome search | Hybrid Vigor demonstrated | Apache 2.0


What Is This?

Darwin-4B-Genesis is the 3rd generation Darwin model and the world's first model to successfully crossbreed FFN layers across different architectures โ€” Transformer (Gemma4) and Mamba (Qwen3.5 GatedDeltaNet) โ€” using evolutionary optimization.

The father's Attention layers (Gemma4 Transformer) are preserved at 100%, while the mother's FFN knowledge (Qwen3.5 Mamba) is transplanted at layer-specific optimal ratios discovered automatically by CMA-ES across 42 dimensions.

The result: the child outperforms both parents on every benchmark โ€” a phenomenon known as Hybrid Vigor.


Darwin-4B-Genesis

Why This Matters

1. World First

Existing hybrid models (Jamba, Nemotron-H, Granite 4.0) are all designed and trained from scratch. Darwin-4B-Genesis takes two already-trained models from different architecture families and breeds them evolutionarily โ€” with zero additional training.

2. Hybrid Vigor Demonstrated

Benchmark David (Father) Qwen3.5-4B (Mother) Genesis (Child)
CLIcK 90% ~50% (est.) 92% โœ…
MuSR 65% ~55% (est.) 70% โœ…

The child surpasses both parents. This is the first demonstration of Hybrid Vigor in AI model breeding.

3. Manual vs Evolution

Method CLIcK MuSR
Manual 50% blend ~23% โ€”
Manual 30% selective blend 62% 45%
CMA-ES 42D automatic search 92% 70%

Human-chosen ratios fail. Evolutionary search succeeds.


Benchmarks

Benchmark Genesis David (Gen2) K-AI #1 (27B)
CLIcK (Korean culture) 92% 90% 0.794
MuSR (multi-step reasoning) 70% 65% 0.604
GPQA (deep reasoning) ~60% ~60% โ€”

A 4B model dominates the K-AI leaderboard's #1 model (27B) on both CLIcK and MuSR.


How It Works

Cross-Architecture FFN Breeding

Father: Darwin-4B-David (Gemma4 Transformer, hidden=2560, 42 layers)
Mother: Qwen/Qwen3.5-4B (GatedDeltaNet/Mamba, hidden=2560, 32 layers)

Key insight: hidden_size matches (2560) โ†’ direct FFN replacement possible
Method: Attention 100% from Father, FFN blended at per-layer optimal ratios
Optimizer: CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
Genome: 42 dimensions (one ratio per layer)
Fitness: CLIcK 60% + MuSR 40% composite score
Frozen layers: L15, L16, L22, L23, L24, L25 (Korean language preservation)

Optimal Genome Discovered by CMA-ES

L00: 0.206  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘  21% Qwen
L07: 0.000  โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  Auto-protected by CMA-ES
L15: 0.000  โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  Frozen (Korean)
L22: 0.000  โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  Frozen (Korean)
L29: 0.291  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘  29% Qwen (maximum)
L31: 0.244  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘  24% Qwen
L32: 0.273  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘  27% Qwen

Key finding: CMA-ES applied the most aggressive Qwen blending to the final layers (L29-32), which govern output quality. The algorithm determined that "Qwen's generation quality exceeds Darwin's" for those specific layers โ€” while simultaneously protecting critical layers (L7, L18, L28) by driving their ratios to zero.

Training Cost

This Model Typical Hybrid
GPU H100 ร— 1 Hundreds to thousands
Time 155 minutes Weeks to months
Training data 0 tokens Trillions of tokens
Training compute Fitness evaluation only Full pre-training

Genealogy

google/gemma-4-E4B-it ร— TeichAI/Claude-Opus-Distill-E4B
    โ†’ Darwin-4B-Opus (Gen 1, DARE-TIES merge)

Darwin-4B-Opus ร— DavidAU/DECKARD-Expresso-Universe
    โ†’ Darwin-4B-David (Gen 2, MRI-guided merge, CLIcK 90%)

Darwin-4B-David ร— Qwen/Qwen3.5-4B
    โ†’ Darwin-4B-Genesis (Gen 3, Cross-Arch FFN Breeding, CLIcK 92%) โ˜…

DNA Composition

Gemma4 Transformer (skeleton, Attention)  ~50%
Claude Opus Distill (reasoning patterns)  ~20%
DECKARD Universe (Korean, creativity)     ~15%
Qwen3.5 GatedDeltaNet (Mamba FFN)         ~15%

What Is FFN Breeding?

AI models have two main components:

  • Attention = the brain (decides what to focus on, reasoning chains)
  • FFN = the muscles (stores knowledge, processes patterns)

Darwin-4B-Genesis keeps the brain from the father (Transformer) and blends in muscles from the mother (Mamba) at optimal ratios. As long as the FFN input/output dimensions match (hidden_size=2560), the swap works โ€” like a USB-C port that accepts any compatible charger.


Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-4B-Genesis",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-4B-Genesis",
    dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain how hybrid vigor works in genetics."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True))

Hardware Requirements

Setup VRAM Status
NVIDIA RTX 4090 (24GB) 24 GB BF16 fits
NVIDIA RTX 3090 (24GB) 24 GB BF16 fits
NVIDIA H100 (93GB) 93 GB Comfortable
Mac M3 Max (36GB) 36 GB Comfortable

Dense 4B model โ€” runs on a single consumer GPU.


Model Specifications

Architecture Gemma4 Dense (Transformer Attention + Mamba FFN hybrid)
Effective Parameters 4B (8B total with PLE)
Hidden Size 2560
Intermediate Size 10240
Layers 42
Context Length 32,768
License Apache 2.0

How This Differs from Prior Work

Existing Hybrids Darwin-4B-Genesis
Examples Jamba, Nemotron-H, Granite 4.0 This model
Method Design โ†’ train from scratch Breed trained models โ†’ zero training
Cost Thousands of GPUยทhours H100 ร— 1, 2.6 hours
Data Trillions of tokens 0 tokens (fitness eval only)
Ratio selection Manual architecture design CMA-ES 42D automatic search
Hybrid Vigor Not tested Benchmarked and confirmed

Future Work

  • Cross-breeding with RWKV-7, xLSTM, and other architectures
  • Scaling to 31B/35B models with the same technique
  • Paper: "Cross-Architecture FFN Breeding with Evolutionary Optimization"
  • Patents: Methods for selective FFN transplantation across architectures

Acknowledgements

  • Korean Government โ€” GPU Support Program research grant
  • Google โ€” Gemma4 E4B architecture
  • Alibaba Qwen Team โ€” Qwen3.5-4B GatedDeltaNet
  • TeichAI โ€” Claude Opus Distill model
  • DavidAU โ€” DECKARD-Expresso-Universe model
  • Jackrong โ€” Claude 4.6 Opus Reasoning Distilled

Citation

@misc{vidraft_darwin_4b_genesis,
  title        = {Darwin-4B-Genesis: World's First Cross-Architecture FFN Breeding},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis}}
}
Downloads last month
41
Safetensors
Model size
8B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for FINAL-Bench/Darwin-4B-Genesis

Collection including FINAL-Bench/Darwin-4B-Genesis

Evaluation results