---
license: apache-2.0
base_model:
- FINAL-Bench/Darwin-4B-Opus
- DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking
tags:
- darwin-v6
- generation-2
- evolutionary-merge
- mri-guided
- dare-ties
- gemma4
- reasoning
- thinking
- proto-agi
- vidraft
language:
- en
- ko
- ja
- zh
- multilingual
pipeline_tag: text-generation
library_name: transformers
---
# Darwin-4B-David: The First Second-Generation Darwin Model
<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Opus"><img src="https://img.shields.io/badge/π§¬_Gen1-Darwin--4B--Opus-blue?style=for-the-badge" alt="Gen1"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-David"><img src="https://img.shields.io/badge/π§¬_Gen2-Darwin--4B--David-blue?style=for-the-badge" alt="Gen2"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis"><img src="https://img.shields.io/badge/β_Gen3-Darwin--4B--Genesis-gold?style=for-the-badge" alt="Gen3"></a>
</p>
<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/π§¬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/π_Space-9B_Demo-purple?style=for-the-badge" alt="9B Space"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/π§¬_Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="31B"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/π_Space-31B_Demo-purple?style=for-the-badge" alt="31B Space"></a>
</p>
<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/π§¬_Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/π_Space-35B_Demo-purple?style=for-the-badge" alt="35B Space"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus-Q8-GGUF"><img src="https://img.shields.io/badge/π¦_GGUF-Q8--Official-yellow?style=for-the-badge" alt="Q8 GGUF"></a>
<a href="https://huggingface.co/bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF"><img src="https://img.shields.io/badge/π¦_GGUF-bartowski-yellow?style=for-the-badge" alt="bartowski GGUF"></a>
</p>
<p align="center">
<a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/π_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/π_ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
</p>
> Gemma 4 E4B Dense | 4.5B Params | Thinking Mode | 128K Context | 140+ Languages | BF16 | Apache 2.0
> **The first-ever second-generation Darwin model: "Evolution of Evolution"**
---
## Overview
Darwin-4B-David is the first second-generation (Generation 2) model in Darwin history: **a model evolved from an already-evolved model.**
The first-generation Darwin-4B-Opus (Father) was evolved from the original gemma-4-E4B-it using the Darwin V6 engine. Darwin-4B-David was born by crossbreeding this first-generation evolved model with DavidAU's DECKARD-Expresso-Universe (Mother). This is the first realization of Darwin's core concept: **"Merge = Evolve"** applied recursively.
The name **"David"** pays tribute to the Mother model's creator DavidAU, while evoking the biblical David who defeated Goliath β symbolizing how a **4.5B small model challenges models many times its size.**
---
## Family Tree
<p align="center">
<img src="family.png" alt="Darwin-4B-David" width="100%">
</p>
### Generation Comparison
| | Gen 0 (Original) | Gen 1 (Opus) | Gen 2 (David) |
|---|---|---|---|
| Model | gemma-4-E4B-it | Darwin-4B-Opus | **Darwin-4B-David** |
| Parents | Google training | Original + Claude distill | **Evolved model + DECKARD** |
| GPQA Diamond | 58.6% | – | **85.0% (+26.4%p)** |
| Recursive evolution | None | 1× | **2× (evolution of evolution)** |
| Core genes | General-purpose | Claude reasoning | **Reasoning + Creativity + Thinking** |
---
## Parent Models
| Role | Model | Characteristics |
|---|---|---|
| Father (Gen-1 Evolved) | [FINAL-Bench/Darwin-4B-Opus](https://huggingface.co/FINAL-Bench/Darwin-4B-Opus) | Darwin V6 Gen-1, ARC-C 82.92%, Claude Opus reasoning distillation |
| Mother | [DavidAU/DECKARD-Expresso-Universe](https://huggingface.co/DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking) | BF16, Unsloth deep tuning (5 in-house datasets), Universe logic/insight enhancement, Thinking mode default |
### Model Diagnostic Scan (MDS)
<p align="center">
<img src="s1.png" alt="Father (Darwin-4B-Opus) MDS Scan" width="48%">
<img src="s2.png" alt="Mother (DECKARD-Expresso-Universe) MDS Scan" width="48%">
</p>
**Left: Father (Darwin-4B-Opus)** – REASONING concentration in later layers (dist 0.4), MATH activation throughout. Already optimized through Gen-1 evolution.
**Right: Mother (DECKARD-Expresso-Universe)** – Strong KOREAN hotspot (dist 1.5), the signature of Unsloth deep tuning. The remaining regions show a uniform distribution.
---
## Benchmarks
### Key Results
| Benchmark | gemma-4-E4B-it (Original) | Darwin-4B-David (Gen-2) | Improvement | Conditions |
|---|---|---|---|---|
| **GPQA Diamond** | 58.6% | **85.0%** | **+26.4%p** | Generative, maj@8, 50Q sampling |
| ARC-Challenge | 64.93% | 64.93% | ±0 | 25-shot, chat template, BF16, loglikelihood |
| KMMLU | 48.47% | 48.46% | ±0 | 5-shot, 225Q, loglikelihood |
### GPQA Diamond Evaluation Details
GPQA Diamond (graduate-level scientific reasoning) was evaluated using **generative (thinking mode) evaluation**.
| Setting | Value |
|---|---|
| Dataset | Idavidrein/gpqa, gpqa_diamond split |
| Questions | **50** (sampled from 198 total) |
| Evaluation method | **maj@8** (8 independent generations per question, majority vote determines final answer) |
| Prompt format | Epoch AI standard (`ANSWER: LETTER`) |
| Thinking mode | Enabled (chat_template, enable_thinking) |
| max_new_tokens | 4,096 |
| temperature | 1.0 |
| top_p / top_k | 0.95 / 64 |
| Precision | BF16 |
| Choice shuffling | Fixed seed per question (MD5 hash) |
**Why maj@8:**
- A single sampled generation (pass@1 with do_sample) is vulnerable to run-to-run stochastic variation
- 8 independent generations with majority voting reflect the model's **stable reasoning capability**
- maj@k is standard practice in frontier-model benchmarks (AIME, MATH, etc.); a minimal scoring sketch is shown below
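For concreteness, here is a minimal sketch of this scoring procedure, not the actual harness. `generate_answer` is a hypothetical callable wrapping `model.generate` with the sampling settings from the table above, and the MD5-seeded shuffling mirrors the "Choice shuffling" row:

```python
# Minimal maj@8 sketch. Assumes a `generate_answer` callable (hypothetical)
# that returns the model's full completion for a prompt.
import hashlib
import random
import re
from collections import Counter

def shuffle_choices(question_id: str, choices: list[str]) -> list[str]:
    """Shuffle answer choices with a fixed per-question seed derived from an MD5 hash."""
    seed = int(hashlib.md5(question_id.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    shuffled = list(choices)
    rng.shuffle(shuffled)
    return shuffled

def extract_letter(completion: str) -> str | None:
    """Pull the final 'ANSWER: X' letter (Epoch AI prompt format) from a completion."""
    matches = re.findall(r"ANSWER:\s*([A-D])", completion)
    return matches[-1] if matches else None

def maj_at_k(prompt: str, generate_answer, k: int = 8) -> str | None:
    """Run k independent generations and return the majority-vote letter."""
    votes = [extract_letter(generate_answer(prompt)) for _ in range(k)]
    votes = [v for v in votes if v is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None
```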
**Note on 50-question sampling:**
- GPQA Diamond contains 198 questions in total; the 50 sampled questions cover 25.3% of the full set
- 50 questions × 8 samples = 400 total generations, which reduces per-question variance, though the subset score still carries sampling uncertainty relative to the full set
- A full 198-question evaluation is planned
### Note on lm-eval Loglikelihood Results
ARC-Challenge and KMMLU show scores identical to the original model. This is expected when a DARE-TIES merge is evaluated with loglikelihood scoring: the method compares token probabilities across the fixed answer choices and does not capture differences in **generation quality, reasoning chains, or creativity**. The evolution effect is clearly visible in generative evaluation (GPQA Diamond), where the difference emerges during step-by-step thinking-mode reasoning. The sketch below illustrates why.
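A simplified view of loglikelihood scoring, assuming a loaded `model` and `tokenizer`; `choice_loglikelihood` is an illustrative name, not the lm-eval API. Note that no text is ever generated, so thinking-mode reasoning never runs:

```python
import torch

@torch.no_grad()
def choice_loglikelihood(model, tokenizer, context: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens given the context
    (sketch; ignores tokenizer boundary effects at the context/choice seam)."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids.to(model.device)
    logits = model(full_ids).logits
    choice_len = full_ids.shape[1] - ctx_len
    # Logits at position i predict token i+1, so shift by one.
    log_probs = torch.log_softmax(logits[0, -choice_len - 1:-1], dim=-1)
    choice_tokens = full_ids[0, -choice_len:]
    return log_probs.gather(1, choice_tokens.unsqueeze(1)).sum().item()

# The benchmark answer is simply the highest-scoring choice; nothing is generated:
# pred = max(choices, key=lambda c: choice_loglikelihood(model, tokenizer, prompt, c))
```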
---
## MRI-Guided Evolution Recipe
Darwin V6's Model MRI scanned weight divergence across all 42 layers and automatically assigned an independent weight ratio to each layer; a simplified merge sketch follows the table below.
| Layer Range | Weight | Strategy |
|---|---|---|
| Layer 0-3 | 0.81 | Absorb Mother's embedding-adjacent layers |
| Layer 15-16 | 0.91 | Maximum Mother creativity/character layer reinforcement |
| Layer 22-25 | **0.95** | **Maximum absorption of Mother's KOREAN hotspot** |
| Layer 26-27 | 0.40 | Father priority preservation zone |
| Layer 30-40 | 0.48 | Father REASONING/MATH preservation |
| Layer 40-42 | 0.62 | Output layer balance |
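A minimal sketch of this layer-wise DARE-style interpolation, under stated assumptions: Gemma-style parameter names containing `layers.{i}.`, the ratio table above, and the TIES sign-consensus step omitted for brevity. `dare_merge` and `ratio_for` are illustrative names, not the Darwin V6 API. Multimodal encoder tensors are copied from the Father untouched, matching the frozen-encoder note later in this card:

```python
# Illustrative layer-wise DARE merge (simplified; TIES sign consensus omitted).
import re
import torch

# Per-layer Mother ratios, mirroring the recipe table above.
LAYER_RATIOS = {
    range(0, 4): 0.81, range(15, 17): 0.91, range(22, 26): 0.95,
    range(26, 28): 0.40, range(30, 40): 0.48, range(40, 42): 0.62,
}

def ratio_for(name: str, default: float = 0.5) -> float:
    """Look up the Mother ratio for a parameter from its layer index."""
    m = re.search(r"layers\.(\d+)\.", name)
    if m is None:
        return default
    idx = int(m.group(1))
    return next((r for rng, r in LAYER_RATIOS.items() if idx in rng), default)

def dare_merge(father: dict, mother: dict, density: float = 0.8) -> dict:
    """DARE: randomly drop (1 - density) of the Mother-minus-Father delta,
    rescale the survivors, then interpolate with the per-layer ratio."""
    child = {}
    for name, f in father.items():
        if "vision" in name or "audio" in name:
            child[name] = f.clone()  # multimodal encoders stay frozen (Father's weights)
            continue
        delta = mother[name].float() - f.float()
        keep = (torch.rand_like(delta) < density).float()
        delta = delta * keep / density       # unbiased rescaling of kept entries
        child[name] = (f.float() + ratio_for(name) * delta).to(f.dtype)
    return child
```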
### Parent Comparison
<p align="center">
<img src="parent_comparison.png" alt="Father vs Mother layer-wise importance comparison" width="100%">
</p>
### Evolution Parameters
| Setting | Value |
|---|---|
| Merge method | DARE-TIES (direct PyTorch, no mergekit dependency) |
| Density | 0.800–0.850 |
| Normalization | normalize: true |
| Evolution method | Darwin V6 evolutionary search (MRI-guided) |
| Population size | 20 |
| Phase 1 (proxy search) | 200 steps |
| Phase 2 (real merge) | 10 steps, top 5 elite |
| Fitness function | kmmlu_lite (Korean knowledge) |
| Best fitness | **0.8412 (84.12%)** |
| Total time | 45.3 minutes (H100 ×1) |
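The table above describes a two-phase search. The sketch below is a simplified elitist loop standing in for the CMA-ES search (see the comparison table in the next section); `proxy_fitness` and `real_fitness` are hypothetical callables, with the real Phase 2 fitness being the kmmlu_lite score of an actual merge:

```python
# Simplified elitist stand-in for Darwin V6's two-phase CMA-ES search.
import random

def mutate(genome: list[float], sigma: float = 0.05) -> list[float]:
    """Gaussian perturbation of merge ratios, clipped to [0, 1]."""
    return [min(1.0, max(0.0, g + random.gauss(0.0, sigma))) for g in genome]

def evolve(proxy_fitness, real_fitness, pop_size=20, proxy_steps=200,
           real_steps=10, elite_k=5, genome_len=6):
    # Phase 1: cheap proxy search over merge-ratio genomes (seconds per step).
    population = [[random.random() for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(proxy_steps):
        population.sort(key=proxy_fitness, reverse=True)
        elites = population[:elite_k]
        population = elites + [mutate(random.choice(elites))
                               for _ in range(pop_size - elite_k)]
    # Phase 2: only the top-k genomes are actually merged and benchmarked.
    finalists = sorted(population, key=proxy_fitness, reverse=True)[:elite_k]
    for _ in range(real_steps):
        finalists.sort(key=real_fitness, reverse=True)
        survivors = finalists[:2]
        finalists = survivors + [mutate(random.choice(survivors))
                                 for _ in range(elite_k - len(survivors))]
    return max(finalists, key=real_fitness)
```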
---
## Darwin V6 vs Conventional Merging
| Capability | mergekit (DARE-TIES) | Darwin V6 |
|---|---|---|
| Implementation | Library call (mergekit CLI) | Direct PyTorch tensor operations, no external dependency |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from MDS diagnostic (independent ratios per tensor) |
| Pre-merge analysis | None | Static tensor profiling (entropy, std, norm) + probe-based functional importance (5 probes) |
| Transplant | Not supported | ratio < 0.15 → Father 100%, ratio > 0.85 → Mother 100% (zero interpolation noise) |
| Post-merge validation | Benchmark score only | Layer-by-layer Health Check: child vs both parents, interference and function loss detection |
| Search method | Manual tuning | CMA-ES evolution with adaptive genome |
| Reproducibility | Config file | genome_hash seed guarantees identical output for identical genome |
| GPU efficiency | Single merge per run | Phase 1 proxy (200 steps, seconds) → Phase 2 real merge (top-k only evaluated) |
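Two of these behaviors are easy to make concrete. Below is a hedged sketch of the transplant rule and genome-hash seeding; `blend` and `genome_seed` are illustrative names, and the hash scheme shown (SHA-256 of rounded ratios) is an assumption, not the engine's actual genome_hash:

```python
# Sketch of the transplant rule and deterministic genome seeding (illustrative).
import hashlib
import torch

def genome_seed(genome: list[float]) -> int:
    """Derive a deterministic seed from the genome so identical genomes
    reproduce identical merges (hash scheme here is an assumption)."""
    digest = hashlib.sha256(repr([round(g, 6) for g in genome]).encode()).hexdigest()
    return int(digest, 16) % (2**31)

def blend(father: torch.Tensor, mother: torch.Tensor, ratio: float) -> torch.Tensor:
    """Transplant rule: extreme ratios copy one parent outright,
    avoiding interpolation noise near 0 or 1."""
    if ratio < 0.15:
        return father.clone()      # Father 100%
    if ratio > 0.85:
        return mother.clone()      # Mother 100%
    return (1.0 - ratio) * father + ratio * mother
```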
---
## Significance of Second-Generation Evolution
1. **Proof of "Evolution of Evolution"**: The first systematic case of recursive evolution (2+ generations) in the open-source model merging community. Darwin V6 + MRI automates the entire process.
2. **85% GPQA Diamond at 4.5B parameters**: +26.4%p over the original 58.6%. This **surpasses the 31B-class gemma-4-31B (84.3%) with only 4.5B parameters**, an exceptional result in parameter efficiency.
3. **Apache 2.0 + Edge deployment**: Preserves the Gemma 4 E4B architecture, enabling deployment on Jetson Orin NX 16GB and consumer GPUs with no commercial restrictions.
4. **Multimodal preservation**: Father's vision encoder (~150M) and audio encoder (~300M) are frozen during evolution, maintaining image/video/audio input capabilities.
5. **Community synergy**: Mother model creator DavidAU is an active contributor on Hugging Face. Darwin-4B-David symbolizes collaborative evolution within the open-source ecosystem.
---
## Model Specifications
| | |
|---|---|
| Architecture | Gemma 4 E4B Dense |
| Effective Parameters | 4.5B (8B total with embeddings) |
| Layers | 42 |
| Sliding Window | 512 tokens |
| Precision | BF16 |
| Context | 128K |
| Vocabulary | 262K |
| Languages | 140+ |
| Thinking | enable_thinking=True chain-of-thought |
| Vision Encoder | ~150M (image, video) |
| Audio Encoder | ~300M (speech recognition) |
| License | Apache 2.0 |
---
## Usage
### Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-4B-David", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
"FINAL-Bench/Darwin-4B-David",
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
# enable_thinking=True activates chain-of-thought reasoning via the chat template
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
### Disable Thinking Mode
```python
text = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```
---
## VRAM Requirements
| Setup | VRAM | Status |
|---|---|---|
| BF16 Full Precision | ~16 GB | Baseline requirement |
| NVIDIA RTX 4090 24GB | 24 GB | Single GPU, very comfortable |
| NVIDIA RTX 3090 24GB | 24 GB | Single GPU, comfortable |
| NVIDIA RTX 4080 16GB | 16 GB | Single GPU |
| NVIDIA T4 16GB | 16 GB | Cloud/Colab friendly |
| Jetson Orin NX 16GB | 16 GB | Edge deployment ready |
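As a rough sanity check on the ~16 GB figure, assuming the 8B total parameter count from the specifications table:

```python
# Back-of-envelope: BF16 stores 2 bytes per parameter.
total_params = 8e9                       # total parameters incl. embeddings
weight_gib = total_params * 2 / 1024**3  # weights only
print(f"{weight_gib:.1f} GiB")           # ≈ 14.9 GiB; KV cache and activations add more
```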
---
## Darwin Opus Family
| Model | Gen | Architecture | Parameters | Context | Base | GPQA Diamond |
|---|---|---|---|---|---|---|
| **Darwin-4B-David** | **🔥 Gen 2** | **Dense (E4B)** | **4.5B** | **128K** | **Darwin-4B-Opus × DECKARD** | **85.0%** |
| Darwin-4B-Opus | Gen 1 | Dense (E4B) | 4.5B | 128K | gemma-4-E4B-it | – |
| Darwin-9B-Opus | Gen 1 | Dense | 9B | 131K | Qwen3.5-9B | – |
| Darwin-31B-Opus | Gen 1 | Dense | 31B | 256K | gemma-4-31B-it | – |
| Darwin-35B-A3B-Opus | Gen 1 | MoE | 35B (3B active) | 256K | Qwen3.5-35B-A3B | 90.0% |
---
## Roadmap
- Full 198-question GPQA Diamond evaluation (maj@8)
- MTI (Minimal Test-Time Intervention) serving: expected additional +9–11% reasoning accuracy
- GRPO + TinyLoRA reinforcement learning
- SSD self-distillation
- Cross-architecture breeding research (Transformer × Mamba FFN transplantation)
---
## References
- DARE: Yu et al., 2023 (https://arxiv.org/abs/2311.03099); TIES: Yadav et al., 2023 (https://arxiv.org/abs/2306.01708). Both re-implemented, not library-dependent
- Darwin V6 Engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
- FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard
- DavidAU DECKARD Series: https://huggingface.co/DavidAU
- MTI: Minimal Test-Time Intervention (arXiv:2510.13940)
---
## Built By
| | |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V6 (Diagnostic-Guided Evolutionary Merge) |
| Generation | **Generation 2** (first in Darwin history) |
| Architecture | Gemma-4-E4B Dense |
| License | Apache 2.0 |
---
## Citation
```bibtex
@misc{vidraft_darwin_4b_david_2026,
  title = {Darwin-4B-David: First Second-Generation Evolutionary Merge Model},
  author = {VIDRAFT},
  year = {2026},
  publisher = {Hugging Face},
  note = {Recursive Evolution Achieves 85\% GPQA Diamond with 4.5B Parameters},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-David}}
}
``` |