---
license: apache-2.0
base_model:
  - google/gemma-4-E4B-it
  - arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled
tags:
  - darwin-v6
  - evolutionary-merge
  - mri-guided
  - dare-ties
  - gemma4
  - reasoning
  - thinking
  - proto-agi
  - vidraft
language:
  - en
  - ko
  - ja
  - zh
  - multilingual
pipeline_tag: text-generation
library_name: transformers
---

# Darwin-4B-Opus

<p align="center">
  <!-- Small Models -->
  <a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Opus-blue?style=for-the-badge" alt="4B Model"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-4B-Opus"><img src="https://img.shields.io/badge/πŸš€_Space-4B_Demo-purple?style=for-the-badge" alt="4B Space"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B Model"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/πŸš€_Space-9B_Demo-purple?style=for-the-badge" alt="9B Space"></a>
</p>

<p align="center">
  <!-- Large Models -->
  <a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="31B Model"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/πŸš€_Space-31B_Demo-purple?style=for-the-badge" alt="31B Space"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B Model"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/πŸš€_Space-35B_Demo-purple?style=for-the-badge" alt="35B Space"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus-Q8-GGUF"><img src="https://img.shields.io/badge/πŸ“¦_GGUF-Q8--Official-yellow?style=for-the-badge" alt="Q8 GGUF"></a>
  <a href="https://huggingface.co/bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF"><img src="https://img.shields.io/badge/πŸ“¦_GGUF-bartowski-yellow?style=for-the-badge" alt="bartowski GGUF"></a>
</p>

<p align="center">
  <!-- Benchmarks -->
  <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/πŸ†_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/πŸ“Š_ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
</p>

<p align="center">
  <img src="info.png" alt="Darwin-4B-Opus" width="100%">
</p>

> Gemma 4 Expert 4B (MoE) | Thinking Mode | 128K Context | 140+ Languages | BF16 | Apache 2.0

---

## Overview

Darwin-4B-Opus is a reasoning-enhanced model created by merging google/gemma-4-E4B-it (Father) and arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled (Mother) using the Darwin V6 engine.

Darwin V6 diagnoses both parent models at the tensor level before merging, assigning an independent optimal ratio to each tensor. This is fundamentally different from conventional merging tools that apply a single uniform ratio across all tensors.

As the smallest member of the Darwin Opus family, Darwin-4B-Opus packs reasoning behavior distilled from Claude Opus into an efficient 4B-parameter MoE architecture, making it well suited to edge deployment, rapid prototyping, and resource-constrained environments while maintaining strong benchmark performance (82.92% ARC-Challenge).

---

## Parent Models

| Role | Model | Characteristics |
|---|---|---|
| Father | google/gemma-4-E4B-it | Gemma 4 Expert 4B (MoE), multimodal, 128K context, efficient inference |
| Mother | arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled | Claude 4.6 Opus high-effort reasoning distillation, enhanced code/science/analysis |

### Model Diagnostic Scan (MDS)

<p align="center">
  <img src="s1.png" alt="Father (gemma-4-E4B-it) MDS Scan" width="48%">
  <img src="s2.png" alt="Mother (Claude-Opus-Distill) MDS Scan" width="48%">
</p>

Left: Father (gemma-4-E4B-it), a balanced generalist with low activation across most probes. Right: Mother (Claude-Opus-Distill), with strong REASONING concentration in later layers and CODE activation in late layers. The Mother shows significantly more specialized layer patterns from Claude Opus distillation.

---

## Benchmarks

| Benchmark | Darwin-4B-Opus | Condition |
|---|---|---|
| ARC-Challenge | 82.92% | loglikelihood, zero-shot |

Note: the Gemma 4 architecture (Gemma4ForConditionalGeneration) has limited compatibility with lm-eval's loglikelihood method due to its multimodal wrapper structure; only generative evaluation reliably produces valid results for Gemma 4 based models. A full extended evaluation with majority voting is planned.
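
For reference, a zero-shot ARC-Challenge run with lm-evaluation-harness (>= 0.4) might look like the sketch below; the exact invocation behind the reported score is not documented here, so treat this as illustrative.

```python
from lm_eval import simple_evaluate

# Zero-shot ARC-Challenge, matching the condition in the table above.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=FINAL-Bench/Darwin-4B-Opus,dtype=bfloat16,trust_remote_code=True",
    tasks=["arc_challenge"],
    num_fewshot=0,
)
print(results["results"]["arc_challenge"])
```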

---

## Darwin V6 vs Conventional Merging

| Capability | mergekit (DARE-TIES) | Darwin V6 |
|---|---|---|
| Implementation | Library call (mergekit CLI) | Direct PyTorch tensor operations, no external dependency |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from MDS diagnostic (independent ratios per tensor) |
| Pre-merge analysis | None | Static tensor profiling (entropy, std, norm) + probe-based functional importance (5 probes) |
| Ratio formula | Human-set or grid search | combined = static × 0.4 + probe × 0.6, then evolutionary optimization |
| Transplant | Not supported | ratio < 0.15 → Father 100%, ratio > 0.85 → Mother 100% (zero interpolation noise) |
| Post-merge validation | Benchmark score only | Layer-by-layer Health Check: child vs both parents, interference and function loss detection |
| Search method | Manual tuning | CMA-ES evolution with adaptive 14-dimensional genome |
| Reproducibility | Config file | genome_hash seed guarantees identical output for identical genome |
| GPU efficiency | Single merge per run | Phase 1 proxy (200 steps, seconds) → Phase 2 real merge (top-k only evaluated) |

---

## How Darwin V6 Works

Darwin V6 does not use mergekit or any external merge library. It re-implements DARE-TIES (DARE: Yu et al., 2023; TIES: Yadav et al., 2023) directly via PyTorch tensor operations with per-tensor diagnostic ratios.
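
A minimal sketch of that per-tensor operation, assuming the Father checkpoint serves as the shared base (so the Mother's delta is the only task vector and TIES sign election becomes trivial), with the transplant thresholds from the table above:

```python
import torch

def merge_tensor(father: torch.Tensor, mother: torch.Tensor,
                 ratio: float, density: float) -> torch.Tensor:
    """Merge one tensor pair. `ratio` is the per-tensor Mother weight from the
    MDS diagnostic; `density` is the DARE keep probability (genome density_b)."""
    if ratio < 0.15:                # transplant: keep Father's tensor verbatim
        return father.clone()
    if ratio > 0.85:                # transplant: take Mother's tensor verbatim
        return mother.clone()
    delta = mother - father         # task vector relative to the Father base
    mask = torch.rand_like(delta) < density    # DARE: randomly drop delta entries
    delta = delta * mask / density             # rescale survivors by 1/density
    return father + ratio * delta              # interpolate toward the Mother
```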

Before merging, Darwin performs a Model Diagnostic Scan (MDS) on both parents. For every tensor, it measures Shannon entropy (information density), standard deviation (activation spread), and L2 norm (energy). In addition, five diagnostic probes (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) are passed through the model, and the cosine distance of the output when each layer is skipped measures that layer's functional importance.

The final merge ratio for each tensor:

```
static_score = entropy × 0.3 + std × 0.2 + clamp(norm, 100) × 0.002
probe_score  = Σ(cosine_distance[probe_i] × weight_i)
combined     = static × 0.4 + probe × 0.6
mri_ratio    = combined_b / (combined_a + combined_b)
final_ratio  = mri_ratio × mri_trust + genome_ratio × (1 - mri_trust)
```
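
In Python, the computation above might look like the following sketch; the entropy estimator and the probe weights are assumptions, since Darwin V6's exact implementations are not published.

```python
import torch

PROBE_WEIGHTS = {"REASONING": 0.3, "CODE": 0.2, "MATH": 0.2,
                 "KNOWLEDGE": 0.15, "LANGUAGE": 0.15}  # hypothetical weights

def static_score(t: torch.Tensor) -> float:
    p = t.flatten().abs().float()
    p = p / p.sum()                                   # weight-magnitude distribution
    entropy = -(p * (p + 1e-12).log()).sum().item()   # Shannon entropy
    return (entropy * 0.3 + t.float().std().item() * 0.2
            + min(t.float().norm().item(), 100.0) * 0.002)

def final_ratio(tensor_a, tensor_b, probe_dist_a, probe_dist_b,
                genome_ratio: float, mri_trust: float) -> float:
    """probe_dist_* map each probe name to the cosine distance it measured."""
    def probe(d):
        return sum(d[k] * w for k, w in PROBE_WEIGHTS.items())
    combined_a = static_score(tensor_a) * 0.4 + probe(probe_dist_a) * 0.6
    combined_b = static_score(tensor_b) * 0.4 + probe(probe_dist_b) * 0.6
    mri_ratio = combined_b / (combined_a + combined_b)
    return mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)
```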

The mri_trust parameter itself is optimized by the CMA-ES evolutionary algorithm, allowing the system to automatically determine the optimal balance between diagnostic prescription and evolutionary search for each model pair.
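
The outer evolutionary loop can be sketched with the `cma` package; the objective below is a toy placeholder for Darwin's Phase 1 proxy evaluation.

```python
import cma  # pip install cma

def evaluate_genome(genome) -> float:
    # Placeholder: Darwin V6 would merge a candidate with these 14 values,
    # run the fast Phase 1 proxy, and return a loss. Toy stand-in:
    return sum((g - 0.5) ** 2 for g in genome)

es = cma.CMAEvolutionStrategy([0.5] * 14, 0.2, {"bounds": [0.0, 1.0]})
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [evaluate_genome(g) for g in candidates])
print(es.result.xbest)  # a 14-dimensional genome like the one reported below
```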

### Parent Comparison (MDS Result)

<p align="center">
  <img src="parent_comparison.png" alt="Parent Comparison β€” Layer-wise Importance" width="100%">
</p>

---

## Evolution Result

| | |
|---|---|
| Best Score (ARC-Challenge) | 0.8292 |
| Merge Method | DARE-TIES (direct PyTorch) |
| Health Check | Not performed |

Optimal Genome (14-dimensional adaptive):

```
global_ratio:        0.4989   (overall merge ratio, near balanced)
attn_ratio:          0.1766   (Attention layers, Father strongly dominant)
ffn_ratio:           0.9021   (FFN layers, Mother strongly dominant)
embed_ratio:         0.6122   (Embedding, slight Mother bias)
density_a:           0.9951   (Father DARE density, nearly full)
density_b:           0.9617   (Mother DARE density, high)
block_0_ratio:       0.5740   (early layers, slight Mother bias)
block_1_ratio:       0.5811   (early-mid layers, slight Mother bias)
block_2_ratio:       0.5736   (mid layers, slight Mother bias)
block_3_ratio:       0.4697   (mid-late layers, near balanced, slight Father)
block_4_ratio:       0.4930   (late layers, near balanced)
block_5_ratio:       0.8418   (final layers, reasoning core, Mother dominant)
mri_trust:           0.4907   (MDS 49% + Genome 51%, near equal trust)
merge_method_weight: 0.3623
```

Key observations from the genome: ffn_ratio=0.90 indicates the FFN layers strongly favor the Mother (Claude Opus Distill), carrying the bulk of the reasoning enhancement. block_5 (final layers)=0.84 shows the reasoning core layers also strongly favor Mother, consistent with the pattern seen across all Darwin Opus models where Claude's reasoning capability concentrates in the final layers. Meanwhile, attn_ratio=0.18 firmly preserves Father's attention structure, maintaining the original Gemma 4 multimodal and context capabilities. Notably, mri_trust=0.49 shows the system found near-equal value in both diagnostic analysis and evolutionary search, suggesting a well-balanced optimization.
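
How these genome values are routed to individual tensors is not published; purely to illustrate the idea, a routing function might look like this sketch, in which every name pattern is an assumption.

```python
def genome_ratio_for(name: str, layer_idx: int, n_layers: int, genome: dict) -> float:
    """Hypothetical mapping from a tensor name to its genome-side ratio."""
    if "embed" in name:
        return genome["embed_ratio"]
    if "attn" in name:
        return genome["attn_ratio"]
    if "mlp" in name or "ffn" in name:
        return genome["ffn_ratio"]
    # fall back to the coarse depth block (six blocks across the layer stack)
    block = min(layer_idx * 6 // max(n_layers, 1), 5)
    return genome[f"block_{block}_ratio"]
```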

---

## Model Specifications

| | |
|---|---|
| Architecture | Gemma 4 Expert 4B (Mixture of Experts) |
| Parameters | 4B |
| Precision | BF16 |
| Context | 128K |
| Languages | 140+ |
| Thinking | enable_thinking=True chain-of-thought |
| License | Apache 2.0 |

---

## Usage

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-4B-Opus", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-4B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
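
For interactive use, a standard transformers `TextStreamer` prints tokens as they are generated, reusing the `inputs` from the example above:

```python
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=4096, do_sample=False, streamer=streamer)
```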

---

## VRAM Requirements

| Setup | VRAM | Status |
|---|---|---|
| BF16 Full Precision | ~8 GB | |
| NVIDIA RTX 4090 24GB | 24 GB | Single GPU, very comfortable |
| NVIDIA RTX 3090 24GB | 24 GB | Single GPU, comfortable |
| NVIDIA RTX 4080 16GB | 16 GB | Single GPU |
| NVIDIA T4 16GB | 16 GB | Cloud/Colab friendly |

Darwin-4B-Opus is the most accessible model in the Darwin Opus family, running comfortably on a single consumer GPU.
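
For GPUs below 8 GB, a 4-bit load via bitsandbytes is one option. This is a sketch assuming bitsandbytes is installed; quantized output quality has not been validated for this model.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-4B-Opus",
    quantization_config=bnb,      # ~3-4 GB of weights instead of ~8 GB in BF16
    device_map="auto",
    trust_remote_code=True,
)
```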

---

## Darwin Opus Family

| Model | Architecture | Parameters | Context | Base |
|---|---|---|---|---|
| **Darwin-4B-Opus** | MoE (E4B) | 4B | 128K | gemma-4-E4B-it |
| Darwin-9B-Opus | — | 9B | — | gemma-4-9B-it |
| Darwin-31B-Opus | Dense | 31B | 256K | gemma-4-31B-it |
| Darwin-35B-A3B-Opus | MoE | 35B (3B active) | 256K | gemma-4-35B-A3B-it |

---

## References

- DARE: Yu et al., 2023 (https://arxiv.org/abs/2311.03099); TIES-Merging: Yadav et al., 2023 (https://arxiv.org/abs/2306.01708); both re-implemented, not library-dependent
- Darwin V6 Engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
- FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard

---

## Built By

| | |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V6 (Diagnostic-Guided Evolutionary Merge) |
| Architecture | Gemma-4-E4B (MoE) |
| License | Apache 2.0 |

---

## Citation

```bibtex
@misc{vidraft_darwin_4b_opus,
  title        = {Darwin-4B-Opus: Diagnostic-Guided Evolutionary Merge on Gemma 4 E4B},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-Opus}}
}
```