---
base_model:
- DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking
tags:
- darwin-v6
- generation-2
- evolutionary-merge
- mri-guided
- dare-ties
- gemma4
- reasoning
- thinking
- proto-agi
- vidraft
language:
- en
- ko
- ja
- zh
- multilingual
pipeline_tag: text-generation
library_name: transformers
---

# Darwin-4B-David: The First Second-Generation Darwin Model

<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-David"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--David-blue?style=for-the-badge" alt="Model"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Opus"><img src="https://img.shields.io/badge/🧬_Father-Darwin--4B--Opus_(Gen1)-teal?style=for-the-badge" alt="Father"></a>
<a href="https://huggingface.co/DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking"><img src="https://img.shields.io/badge/🧬_Mother-DECKARD--Expresso--Universe-purple?style=for-the-badge" alt="Mother"></a>
</p>

<p align="center">
<a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
</p>

<p align="center">
<img src="info.png" alt="Darwin-4B-David" width="100%">
</p>

> Gemma 4 E4B Dense | 4.5B Params | Thinking Mode | 128K Context | 140+ Languages | BF16 | Apache 2.0
>
> **The first-ever second-generation Darwin model: "Evolution of Evolution"**

---

## Overview

Darwin-4B-David is the first second-generation (Generation 2) model in Darwin history: **a model evolved from an already-evolved model.**

The first-generation Darwin-4B-Opus (Father) was evolved from the original gemma-4-E4B-it using the Darwin V6 engine. Darwin-4B-David was born by crossbreeding this first-generation evolved model with DavidAU's DECKARD-Expresso-Universe (Mother). This is the first realization of Darwin's core concept, **"Merge = Evolve,"** applied recursively.

The name **"David"** pays tribute to the Mother model's creator, DavidAU, while evoking the biblical David who defeated Goliath, symbolizing how a **4.5B small model challenges models many times its size.**

---

## Family Tree

<p align="center">
<img src="family_tree.png" alt="Darwin-4B-David Family Tree" width="100%">
</p>

```
             google/gemma-4-E4B-it
               (Original, Gen 0)
                       |
            Darwin V6 Gen-1 Evolution
                       |
                       v
       +----------------------+
       |    Darwin-4B-Opus    |   +------------------------------+
       |   (Gen-1 Evolved)    |   |   DavidAU/DECKARD-Expresso   |
       |    ARC-C: 82.92%     |   |  -Universe-HERETIC (Mother)  |
       | Claude Opus Distill  |   |   Unsloth Deep Tuning x5     |
       +----------+-----------+   |   Thinking Mode Default      |
                  |               +--------------+---------------+
                  |                              |
                  +---------------+--------------+
                                  |
                      Darwin V6 Gen-2 Evolution
                       (MRI-Guided DARE-TIES)
                                  |
                                  v
                   +----------------------------+
                   |      Darwin-4B-David       |
                   |          (Gen 2)           |
                   |    GPQA Diamond: 85.0%     |
                   |  First-ever Gen-2 Darwin   |
                   +----------------------------+
             gemma-4-E4B architecture preserved
```

### Generation Comparison

| | Gen 0 (Original) | Gen 1 (Opus) | Gen 2 (David) |
|---|---|---|---|
| Model | gemma-4-E4B-it | Darwin-4B-Opus | **Darwin-4B-David** |
| Parents | Google training | Original + Claude distill | **Evolved model + DECKARD** |
| GPQA Diamond | 58.6% | – | **85.0% (+26.4%p)** |
| Recursive evolution | None | 1× | **2× (evolution of evolution)** |
| Core genes | General-purpose | Claude reasoning | **Reasoning + Creativity + Thinking** |

---

## Parent Models

| Role | Model | Characteristics |
|---|---|---|
| Father (Gen-1 Evolved) | [FINAL-Bench/Darwin-4B-Opus](https://huggingface.co/FINAL-Bench/Darwin-4B-Opus) | Darwin V6 Gen-1, ARC-C 82.92%, Claude Opus reasoning distillation |
| Mother | [DavidAU/DECKARD-Expresso-Universe](https://huggingface.co/DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking) | BF16, Unsloth deep tuning (5 in-house datasets), Universe logic/insight enhancement, Thinking mode default |

### Model Diagnostic Scan (MDS)

<p align="center">
<img src="s1.png" alt="Father (Darwin-4B-Opus) MDS Scan" width="48%">
<img src="s2.png" alt="Mother (DECKARD-Expresso-Universe) MDS Scan" width="48%">
</p>

**Left: Father (Darwin-4B-Opus)** shows REASONING concentration in later layers (dist 0.4) and MATH activation throughout, already optimized through Gen-1 evolution.
**Right: Mother (DECKARD-Expresso-Universe)** shows a strong KOREAN hotspot (dist 1.5), the signature of Unsloth deep tuning; the remaining regions are uniformly distributed.

---

## Benchmarks

### Key Results

| Benchmark | gemma-4-E4B-it (Original) | Darwin-4B-David (Gen-2) | Improvement | Conditions |
|---|---|---|---|---|
| **GPQA Diamond** | 58.6% | **85.0%** | **+26.4%p** | Generative, maj@8, 50Q sampling |
| ARC-Challenge | 64.93% | 64.93% | ±0 | 25-shot, chat template, BF16, loglikelihood |
| KMMLU | 48.47% | 48.46% | ±0 | 5-shot, 225Q, loglikelihood |

### GPQA Diamond Evaluation Details

GPQA Diamond (graduate-level scientific reasoning) was evaluated using **generative (thinking mode) evaluation**.

| Setting | Value |
|---|---|
| Dataset | Idavidrein/gpqa, gpqa_diamond split |
| Questions | **50** (sampled from 198 total) |
| Evaluation method | **maj@8** (8 independent generations per question; majority vote determines the final answer) |
| Prompt format | Epoch AI standard (`ANSWER: LETTER`) |
| Thinking mode | Enabled (chat_template, enable_thinking) |
| max_new_tokens | 4,096 |
| temperature | 1.0 |
| top_p / top_k | 0.95 / 64 |
| Precision | BF16 |
| Choice shuffling | Fixed seed per question (MD5 hash) |

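The fixed-seed choice shuffling can be sketched like this. The helper name and signature are illustrative, not the harness's actual API; the key idea is that the seed is derived from the question itself, so every run shuffles a given question identically:

```python
import hashlib
import random

def shuffle_choices(question: str, choices: list[str]) -> list[str]:
    # Derive a deterministic 32-bit seed from the question text via MD5,
    # so a given question is shuffled the same way in every run.
    seed = int(hashlib.md5(question.encode("utf-8")).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    shuffled = choices[:]          # never mutate the caller's list
    rng.shuffle(shuffled)
    return shuffled
```

This removes position bias without sacrificing reproducibility across the 8 maj@8 samples.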
**Why maj@8:**
- A single sample (greedy/pass@1) is vulnerable to stochastic variation when do_sample is used
- 8 independent generations with majority voting reflect the model's **stable reasoning capability**
- maj@k is standard practice in frontier-model benchmarks (AIME, MATH, etc.)

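The maj@k rule itself is just a majority vote over the answer letters extracted from the k generations; a minimal sketch:

```python
from collections import Counter

def maj_at_k(sampled_answers: list[str]) -> str:
    # Majority vote over k independent generations. Unparseable
    # generations (None) are dropped; ties break toward the answer
    # seen first, per Counter.most_common ordering.
    counts = Counter(a for a in sampled_answers if a is not None)
    return counts.most_common(1)[0][0]
```

For example, `maj_at_k(["B", "B", "C", "B", "A", "B", "B", "C"])` returns `"B"`.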
**Note on 50-question sampling:**
- GPQA Diamond contains 198 questions in total; 50 questions represent 25.3% of the full set
- 50 questions × 8 samples = 400 total generations, providing reasonable statistical confidence
- A full 198-question evaluation is planned

### Note on lm-eval Loglikelihood Results

ARC-Challenge and KMMLU show scores essentially identical to the original model. This is characteristic of DARE-TIES merging: the loglikelihood method compares token probabilities across answer choices and does not capture differences in **generation quality, reasoning chains, or creativity**. The evolution effect is clearly visible in generative evaluation (GPQA Diamond), where the difference emerges during step-by-step thinking-mode reasoning.

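The reason generation-time gains are invisible here is that the loglikelihood protocol reduces each question to an argmax over pre-scored answer strings; no text is generated, so thinking-mode reasoning never runs. A schematic sketch (not lm-eval's actual API):

```python
def pick_by_loglikelihood(choice_logprobs: dict[str, float]) -> str:
    # lm-eval style multiple choice: each answer string is scored by the
    # sum of its token log-probabilities under the model, then argmax.
    # Two models with near-identical choice probabilities score the same
    # even if their generated reasoning chains differ substantially.
    return max(choice_logprobs, key=choice_logprobs.get)
```

A merged model whose gains live in its generated chain-of-thought can therefore tie its parent on this metric while diverging sharply under generative evaluation.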
---

## MRI-Guided Evolution Recipe

### Key Gene Map

<p align="center">
<img src="prescription_ratios.png" alt="Per-layer merge ratios" width="100%">
</p>

Darwin V6's Model MRI scanned weight divergence across all 42 layers and automatically assigned an independent weight ratio to each layer.

| Layer Range | Weight | Strategy |
|---|---|---|
| Layer 0-3 | 0.81 | Absorb Mother's embedding-adjacent layers |
| Layer 15-16 | 0.91 | Maximum reinforcement of Mother's creativity/character layers |
| Layer 22-25 | **0.95** | **Maximum absorption of Mother's KOREAN hotspot** |
| Layer 26-27 | 0.40 | Father priority-preservation zone |
| Layer 30-40 | 0.48 | Father REASONING/MATH preservation |
| Layer 40-42 | 0.62 | Output-layer balance |

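The gene map above reads as a lookup from layer index to Mother weight. A sketch using only the published values; the 0.5 default for unlisted layers is an assumption, and where ranges overlap the first match wins (the real prescription is emitted automatically by the MRI scan, not hand-written):

```python
# Mother-weight ratio per layer range, following the published gene map.
PRESCRIPTION = [
    ((0, 3), 0.81),    # absorb Mother's embedding-adjacent layers
    ((15, 16), 0.91),  # Mother creativity/character reinforcement
    ((22, 25), 0.95),  # maximum absorption of Mother's KOREAN hotspot
    ((26, 27), 0.40),  # Father priority-preservation zone
    ((30, 40), 0.48),  # Father REASONING/MATH preservation
    ((40, 42), 0.62),  # output-layer balance
]

def mother_ratio(layer: int) -> float:
    for (lo, hi), weight in PRESCRIPTION:
        if lo <= layer <= hi:
            return weight
    return 0.5  # assumed neutral default for layers not in the table
```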
### Parent Comparison

<p align="center">
<img src="parent_comparison.png" alt="Father vs Mother layer-wise importance comparison" width="100%">
</p>

### Evolution Parameters

| Setting | Value |
|---|---|
| Merge method | DARE-TIES (direct PyTorch, no mergekit dependency) |
| Density | 0.800-0.850 |
| Normalization | normalize: true |
| Evolution method | Darwin V6 (MRI-guided) |
| Population size | 20 |
| Phase 1 (proxy search) | 200 steps |
| Phase 2 (real merge) | 10 steps, top 5 elite |
| Fitness function | kmmlu_lite (Korean knowledge) |
| Best fitness | **0.8412 (84.12%)** |
| Total time | 45.3 minutes (H100 ×1) |

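The two-phase search can be sketched as follows. Darwin V6 uses CMA-ES; this sketch substitutes a plain Gaussian-mutation hill climb (`evolve`, `proxy_fitness`, and `real_fitness` are illustrative names) purely to show the structure: a cheap proxy search over many genomes, then real merges only for the elites:

```python
import random

def evolve(proxy_fitness, real_fitness, population=20, phase1_steps=200, top_k=5):
    # A genome here is a vector of merge ratios in [0, 1].
    genomes = [[random.random() for _ in range(7)] for _ in range(population)]
    # Phase 1: proxy search -- mutate and score with a fast proxy
    # (seconds per step), keeping only the best `population` genomes.
    for _ in range(phase1_steps):
        parent = max(genomes, key=proxy_fitness)
        child = [min(1.0, max(0.0, g + random.gauss(0, 0.05))) for g in parent]
        genomes.append(child)
        genomes = sorted(genomes, key=proxy_fitness, reverse=True)[:population]
    # Phase 2: only the top-k elites pay for a real merge + benchmark run.
    elites = sorted(genomes, key=proxy_fitness, reverse=True)[:top_k]
    return max(elites, key=real_fitness)
```

The design point is GPU economy: hundreds of proxy evaluations cost seconds, while the expensive real-merge evaluation is spent on only the top candidates.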
---

## Darwin V6 vs Conventional Merging

| Capability | mergekit (DARE-TIES) | Darwin V6 |
|---|---|---|
| Implementation | Library call (mergekit CLI) | Direct PyTorch tensor operations, no external dependency |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from the MDS diagnostic (independent ratio per tensor) |
| Pre-merge analysis | None | Static tensor profiling (entropy, std, norm) + probe-based functional importance (5 probes) |
| Transplant | Not supported | ratio < 0.15 → Father 100%, ratio > 0.85 → Mother 100% (zero interpolation noise) |
| Post-merge validation | Benchmark score only | Layer-by-layer health check: child vs. both parents, interference and function-loss detection |
| Search method | Manual tuning | CMA-ES evolution with adaptive genome |
| Reproducibility | Config file | genome_hash seed guarantees identical output for identical genome |
| GPU efficiency | Single merge per run | Phase 1 proxy (200 steps, seconds) → Phase 2 real merge (only top-k evaluated) |

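A minimal sketch of the transplant-plus-DARE rule from the table. This is illustrative only: plain Python lists stand in for tensors, and the TIES sign-election step is omitted for brevity:

```python
import random

def merge_tensor(father, mother, ratio, density=0.8, rng=None):
    """DARE merge of one tensor, shown on flat lists for clarity."""
    rng = rng or random.Random(0)
    # Transplant shortcut: extreme ratios copy one parent verbatim,
    # avoiding interpolation noise entirely.
    if ratio < 0.15:
        return list(father)
    if ratio > 0.85:
        return list(mother)
    merged = []
    for f, m in zip(father, mother):
        delta = m - f  # task-vector element (Mother minus Father)
        # DARE: drop each element with probability (1 - density) and
        # rescale survivors by 1/density (an unbiased estimator).
        delta = delta / density if rng.random() < density else 0.0
        merged.append(f + ratio * delta)
    return merged
```

In the real engine this rule runs per tensor on torch tensors, with the `ratio` supplied by the MRI prescription for that layer.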
---

## Significance of Second-Generation Evolution

1. **Proof of "Evolution of Evolution"**: the first systematic case of recursive evolution (2+ generations) in the open-source model-merging community. Darwin V6 + MRI automates the entire process.

2. **85% GPQA Diamond at 4.5B parameters**: +26.4%p over the original's 58.6%. This **surpasses the 31B-class gemma-4-31B (84.3%) with only 4.5B parameters**, an exceptional result in parameter efficiency.

3. **Apache 2.0 + edge deployment**: preserves the Gemma 4 E4B architecture, enabling deployment on Jetson Orin NX 16GB and consumer GPUs with no commercial restrictions.

4. **Multimodal preservation**: the Father's vision encoder (~150M) and audio encoder (~300M) are frozen during evolution, maintaining image/video/audio input capabilities.

5. **Community synergy**: Mother-model creator DavidAU is an active contributor on Hugging Face. Darwin-4B-David symbolizes collaborative evolution within the open-source ecosystem.

## Model Specifications

| | |
|---|---|
| Architecture | Gemma 4 E4B Dense |
| Effective Parameters | 4.5B (8B total with embeddings) |
| Layers | 42 |
| Sliding Window | 512 tokens |
| Precision | BF16 |
| Context | 128K |
| Vocabulary | 262K |
| Languages | 140+ |
| Thinking | enable_thinking=True chain-of-thought |
| Vision Encoder | ~150M (image, video) |
| Audio Encoder | ~300M (speech recognition) |
| License | Apache 2.0 |

---

## Usage

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-4B-David", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-4B-David",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### Disable Thinking Mode

```python
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```

---

## VRAM Requirements

| Setup | VRAM | Status |
|---|---|---|
| BF16 Full Precision | ~16 GB | Baseline requirement |
| NVIDIA RTX 4090 24GB | 24 GB | Single GPU, ample headroom |
| NVIDIA RTX 3090 24GB | 24 GB | Single GPU, comfortable |
| NVIDIA RTX 4080 16GB | 16 GB | Single GPU |
| NVIDIA T4 16GB | 16 GB | Cloud/Colab friendly |
| Jetson Orin NX 16GB | 16 GB | Edge deployment ready |

---

## Darwin Opus Family

| Model | Gen | Architecture | Parameters | Context | Base | GPQA Diamond |
|---|---|---|---|---|---|---|
| **Darwin-4B-David** | **🥇 Gen 2** | **Dense (E4B)** | **4.5B** | **128K** | **Darwin-4B-Opus × DECKARD** | **85.0%** |
| Darwin-4B-Opus | Gen 1 | Dense (E4B) | 4.5B | 128K | gemma-4-E4B-it | – |
| Darwin-9B-Opus | Gen 1 | Dense | 9B | 131K | Qwen3.5-9B | – |
| Darwin-31B-Opus | Gen 1 | Dense | 31B | 256K | gemma-4-31B-it | – |
| Darwin-35B-A3B-Opus | Gen 1 | MoE | 35B (3B active) | 256K | Qwen3.5-35B-A3B | 90.0% |

---

## Roadmap

- Full 198-question GPQA Diamond evaluation (maj@8)
- MTI (Minimal Test-Time Intervention) serving: expected additional +9-11% reasoning accuracy
- GRPO + TinyLoRA reinforcement learning
- SSD self-distillation
- Cross-architecture breeding research (Transformer × Mamba FFN transplantation)

---

## References

- DARE: Yu et al., 2023 (https://arxiv.org/abs/2311.03099); TIES: Yadav et al., 2023 (https://arxiv.org/abs/2306.01708). Re-implemented directly, not library-dependent.
- Darwin V6 Engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
- FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard
- DavidAU DECKARD Series: https://huggingface.co/DavidAU
- MTI: Minimal Test-Time Intervention (arXiv:2510.13940)

---

## Built By

| | |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V6 (Diagnostic-Guided Evolutionary Merge) |
| Generation | **Generation 2**, first in Darwin history |
| Architecture | Gemma-4-E4B Dense |
| License | Apache 2.0 |

---

## Citation

```bibtex
@misc{vidraft_darwin_4b_david_2026,
  title        = {Darwin-4B-David: First Second-Generation Evolutionary Merge Model},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-David}},
  note         = {Recursive evolution achieves 85\% GPQA Diamond with 4.5B parameters}
}
```