SeaWolf-AI committed on
Commit e0589b5 · verified · 1 Parent(s): a159c2c

Update README.md

Files changed (1):
  1. README.md +198 -37

README.md CHANGED
@@ -7,9 +7,24 @@ tags:
  - darwin-v6
  - evolutionary-merge
  - mri-guided
- - dare_ties
  ---

  <p align="center">
  <!-- Small Models -->
  <a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Opus-blue?style=for-the-badge" alt="4B Model"></a>
@@ -34,53 +49,199 @@ tags:
  <a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/📊_ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
  </p>

- # Darwin V6 Evolved Model

- Created by Darwin V6 diagnostic-guided evolutionary merge engine.

  ## Parent Models
- - Father: `google/gemma-4-E4B-it`
- - Mother: `arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled`

  ## Evolution Result
- - Benchmark score: 0.8292
- - Merge method: dare_ties
- - Merge hash:

- ## Merge Statistics
- - Total tensors merged: 0
- - Transplant A (Father preserved): 0
- - Transplant B (Mother preserved): 0
- - Blended: 0

- ## Optimal Genome
  ```
- global_ratio: 0.4989
- attn_ratio: 0.1766
- ffn_ratio: 0.9021
- embed_ratio: 0.6122
- density_a: 0.9951
- density_b: 0.9617
- block_0_ratio: 0.5740
- block_1_ratio: 0.5811
- block_2_ratio: 0.5736
- block_3_ratio: 0.4697
- block_4_ratio: 0.4930
- block_5_ratio: 0.8418
- mri_trust: 0.4907
- merge_method_weight: 0.3623
  ```

- ## Health Check
- Not performed

- ## Method
- Darwin V6 implements DARE-TIES merge directly via PyTorch tensor operations.
- Per-tensor ratios are determined by MRI diagnostic (static tensor analysis +
- probe-based functional importance) combined with evolutionary genome search.

- Formula: final_ratio = mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)

- DARE-TIES algorithm: Yadav et al., 2023 (re-implemented, not library-dependent)

- Built by VIDRAFT. Apache 2.0.

  - darwin-v6
  - evolutionary-merge
  - mri-guided
+ - dare-ties
+ - gemma4
+ - reasoning
+ - thinking
+ - proto-agi
+ - vidraft
+ language:
+ - en
+ - ko
+ - ja
+ - zh
+ - multilingual
+ pipeline_tag: text-generation
+ library_name: transformers
  ---

+ # Darwin-4B-Opus
+
  <p align="center">
  <!-- Small Models -->
  <a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Opus-blue?style=for-the-badge" alt="4B Model"></a>

  <a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/📊_ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
  </p>

+ > Gemma 4 Expert 4B (MoE) | Thinking Mode | 128K Context | 140+ Languages | BF16 | Apache 2.0
+
+ ---
+
+ ## Overview

+ Darwin-4B-Opus is a reasoning-enhanced model created by merging google/gemma-4-E4B-it (Father) and arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled (Mother) with the Darwin V6 engine.
+
+ Darwin V6 diagnoses both parent models at the tensor level before merging, assigning an independent optimal ratio to each tensor. This differs fundamentally from conventional merging tools, which apply a single uniform ratio across all tensors.
+
+ As the smallest member of the Darwin Opus family, Darwin-4B-Opus delivers Claude Opus-level reasoning distillation in an efficient 4B-parameter MoE architecture, making it well suited to edge deployment, rapid prototyping, and resource-constrained environments while maintaining strong benchmark performance (0.8292 on ARC-Challenge).
+
+ ---

  ## Parent Models
+
+ | Role | Model | Characteristics |
+ |---|---|---|
+ | Father | google/gemma-4-E4B-it | Gemma 4 Expert 4B (MoE), multimodal, 128K context, efficient inference |
+ | Mother | arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled | Claude 4.6 Opus high-effort reasoning distillation; enhanced code/science/analysis |
+
+ ---
+
+ ## Benchmarks
+
+ | Benchmark | Darwin-4B-Opus | Condition |
+ |---|---|---|
+ | ARC-Challenge | 82.92% | loglikelihood, zero-shot |
+
+ Note: the Gemma 4 architecture (Gemma4ForConditionalGeneration) has limited compatibility with lm-eval's loglikelihood method because of its multimodal wrapper structure; only generative evaluation produces valid results for Gemma 4-based models. A full extended evaluation with majority voting is planned.
+
+ ---
+
+ ## Darwin V6 vs Conventional Merging
+
+ | Capability | mergekit (DARE-TIES) | Darwin V6 |
+ |---|---|---|
+ | Implementation | Library call (mergekit CLI) | Direct PyTorch tensor operations; no external dependency |
+ | Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from the MDS diagnostic (independent ratio per tensor) |
+ | Pre-merge analysis | None | Static tensor profiling (entropy, std, norm) + probe-based functional importance (5 probes) |
+ | Ratio formula | Human-set or grid search | combined = static × 0.4 + probe × 0.6, then evolutionary optimization |
+ | Transplant | Not supported | ratio < 0.15 → Father 100%, ratio > 0.85 → Mother 100% (zero interpolation noise) |
+ | Post-merge validation | Benchmark score only | Layer-by-layer health check: child vs. both parents, interference and function-loss detection |
+ | Search method | Manual tuning | CMA-ES evolution with an adaptive 14-dimensional genome |
+ | Reproducibility | Config file | genome_hash seed guarantees identical output for an identical genome |
+ | GPU efficiency | Single merge per run | Phase 1 proxy (200 steps, seconds) → Phase 2 real merge (only top-k evaluated) |
+
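The transplant rule in the table above can be sketched in a few lines. This is an illustrative sketch only: the function name `merge_mode` and the returned labels are assumptions, while the 0.15 / 0.85 thresholds come from the table.

```python
def merge_mode(ratio, lo=0.15, hi=0.85):
    """Decide how one tensor is handled, given its final merge ratio.

    Illustrative stand-in for the transplant rule described above;
    thresholds lo/hi are the 0.15 / 0.85 values from the table.
    """
    if ratio < lo:
        return "transplant_father"  # keep Father's tensor verbatim (100%)
    if ratio > hi:
        return "transplant_mother"  # keep Mother's tensor verbatim (100%)
    return "blend"                  # otherwise, DARE-TIES interpolation
```

Skipping interpolation at the extremes is what the table means by "zero interpolation noise": a tensor the diagnostic assigns almost entirely to one parent is copied, not mixed.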
+ ---
+
+ ## How Darwin V6 Works
+
+ Darwin V6 does not use mergekit or any external merge library. It re-implements DARE-TIES (Yadav et al., 2023) directly via PyTorch tensor operations with per-tensor diagnostic ratios.
+
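A minimal sketch of such a DARE-TIES step for a single tensor, written in NumPy for brevity (standing in for the PyTorch tensor ops). This is not the Darwin V6 source: the function names, the treatment of deltas against a shared base, the sign-election detail, and the seeding are all illustrative assumptions.

```python
import numpy as np

def dare(delta, density, rng):
    """DARE: keep each delta entry with probability `density`, rescale survivors
    by 1/density so the expected delta is unchanged."""
    mask = rng.random(delta.shape) < density
    return np.where(mask, delta / density, 0.0)

def dare_ties_merge(base, father, mother, ratio, density_a, density_b, seed=0):
    """Merge one tensor: DARE-sparsify both parents' deltas from the base,
    elect a per-parameter sign (TIES), drop disagreeing contributions, blend."""
    rng = np.random.default_rng(seed)
    d_a = dare(father - base, density_a, rng)
    d_b = dare(mother - base, density_b, rng)
    # TIES sign election: the ratio-weighted sum of deltas picks the sign,
    # then each parent's delta is zeroed wherever it disagrees with it.
    weighted = (1.0 - ratio) * d_a + ratio * d_b
    sign = np.sign(weighted)
    d_a = np.where(np.sign(d_a) == sign, d_a, 0.0)
    d_b = np.where(np.sign(d_b) == sign, d_b, 0.0)
    return base + (1.0 - ratio) * d_a + ratio * d_b
```

At ratio 0 (full density) this reduces to the Father tensor and at ratio 1 to the Mother tensor, which is the sanity property the transplant thresholds exploit.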
+ Before merging, Darwin performs a Model Diagnostic Scan (MDS) on both parents. For every tensor, it measures Shannon entropy (information density), standard deviation (activation spread), and L2 norm (energy). Additionally, 5 diagnostic probes (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) are passed through the model, measuring cosine distance when each layer is skipped to determine functional importance.
+
+ The final merge ratio for each tensor:
+
+ ```
+ static_score = entropy × 0.3 + std × 0.2 + clamp(norm, 100) × 0.002
+ probe_score = Σ(cosine_distance[probe_i] × weight_i)
+ combined = static × 0.4 + probe × 0.6
+ mri_ratio = combined_b / (combined_a + combined_b)
+ final_ratio = mri_ratio × mri_trust + genome_ratio × (1 - mri_trust)
+ ```
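The pseudocode above translates directly into runnable Python. The helper name `final_ratio`, the tuple layout of the static stats, and the probe-weights argument are illustrative assumptions; the coefficients are the ones in the block above.

```python
def final_ratio(stats_a, stats_b, probes_a, probes_b, weights,
                genome_ratio, mri_trust):
    """Per-tensor merge ratio: diagnostic prescription blended with the
    evolutionary genome by mri_trust. stats_* = (entropy, std, l2_norm);
    probes_* = cosine distances for the 5 diagnostic probes."""
    def static_score(stats):
        entropy, std, norm = stats
        return entropy * 0.3 + std * 0.2 + min(norm, 100.0) * 0.002  # clamp(norm, 100)

    def probe_score(distances):
        return sum(d * w for d, w in zip(distances, weights))

    combined_a = static_score(stats_a) * 0.4 + probe_score(probes_a) * 0.6
    combined_b = static_score(stats_b) * 0.4 + probe_score(probes_b) * 0.6
    mri_ratio = combined_b / (combined_a + combined_b)
    return mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)
```

With mri_trust = 0 the genome alone decides; with mri_trust = 1 the diagnostic alone decides; the 0.4907 found for this model sits almost exactly between the two.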
+
+ The mri_trust parameter itself is optimized by the CMA-ES evolutionary algorithm, allowing the system to automatically determine the optimal balance between diagnostic prescription and evolutionary search for each model pair.
+
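The outer search loop can be sketched as follows. Darwin V6 uses CMA-ES; to keep this sketch dependency-free, a simple Gaussian-mutation hill climber stands in for it here. The 14 genome keys mirror the Optimal Genome section of this card; everything else (function names, bounds handling, the score function) is an assumption.

```python
import random

# The 14-dimensional genome layout from the Optimal Genome section.
GENOME_KEYS = (["global_ratio", "attn_ratio", "ffn_ratio", "embed_ratio",
                "density_a", "density_b"]
               + [f"block_{i}_ratio" for i in range(6)]
               + ["mri_trust", "merge_method_weight"])

def evolve(score_fn, generations=30, sigma=0.1, seed=0):
    """Maximize score_fn(genome) over the unit cube.

    Gaussian-mutation hill climber standing in for CMA-ES: mutate the best
    genome, keep the candidate only if its (proxy) benchmark score improves.
    """
    rng = random.Random(seed)
    best = {k: rng.random() for k in GENOME_KEYS}
    best_score = score_fn(best)
    for _ in range(generations):
        cand = {k: min(1.0, max(0.0, v + rng.gauss(0.0, sigma)))
                for k, v in best.items()}
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score
```

In the real engine, `score_fn` would be the Phase 1 proxy evaluation (cheap, 200 steps), with only the top-k genomes promoted to a Phase 2 real merge and benchmark.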
+ ---

  ## Evolution Result

+ | Item | Value |
+ |---|---|
+ | Best Score (ARC-Challenge) | 0.8292 |
+ | Merge Method | DARE-TIES (direct PyTorch) |
+ | Health Check | Not performed |
+
+ Optimal Genome (14-dimensional adaptive):

  ```
+ global_ratio: 0.4989 (overall merge ratio – near balanced)
+ attn_ratio: 0.1766 (attention layers – Father strongly dominant)
+ ffn_ratio: 0.9021 (FFN layers – Mother strongly dominant)
+ embed_ratio: 0.6122 (embedding – slight Mother bias)
+ density_a: 0.9951 (Father DARE density – nearly full)
+ density_b: 0.9617 (Mother DARE density – high)
+ block_0_ratio: 0.5740 (early layers – slight Mother bias)
+ block_1_ratio: 0.5811 (early-mid layers – slight Mother bias)
+ block_2_ratio: 0.5736 (mid layers – slight Mother bias)
+ block_3_ratio: 0.4697 (mid-late layers – near balanced, slight Father)
+ block_4_ratio: 0.4930 (late layers – near balanced)
+ block_5_ratio: 0.8418 (final layers, reasoning core – Mother dominant)
+ mri_trust: 0.4907 (MDS 49% + genome 51% – near equal trust)
+ merge_method_weight: 0.3623
  ```

+ Key observations from the genome: ffn_ratio = 0.90 indicates the FFN layers strongly favor the Mother (Claude Opus distill), carrying the bulk of the reasoning enhancement. block_5 (final layers) = 0.84 shows the reasoning-core layers also strongly favor the Mother, consistent with the pattern seen across the Darwin Opus models, where Claude's reasoning capability concentrates in the final layers. Meanwhile, attn_ratio = 0.18 firmly preserves the Father's attention structure, maintaining the original Gemma 4 multimodal and context capabilities. Notably, mri_trust = 0.49 shows the system found near-equal value in diagnostic analysis and evolutionary search, suggesting a well-balanced optimization.
+
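One way to read the genome is as a lookup from tensor name and depth to a merge ratio. The sketch below is illustrative only: the tensor-name patterns, the bucketing of layers into 6 blocks, and the averaging of block and layer-type ratios are assumptions about how such a genome could be applied, not Darwin V6 source.

```python
def genome_ratio(name, layer_idx, num_layers, genome):
    """Hypothetical mapping from a tensor's name/depth to its genome ratio.

    Assumptions: 'embed'/'attn'/'mlp' substrings identify tensor types, layers
    are bucketed into the genome's 6 blocks, and the block ratio is averaged
    with the layer-type ratio.
    """
    if "embed" in name:
        return genome["embed_ratio"]
    block = min(5, layer_idx * 6 // num_layers)  # bucket depth into 6 blocks
    r = genome[f"block_{block}_ratio"]
    if "attn" in name:
        return (r + genome["attn_ratio"]) / 2
    if "mlp" in name or "ffn" in name:
        return (r + genome["ffn_ratio"]) / 2
    return genome["global_ratio"]
```

Under this reading, a final-block FFN tensor lands near the Mother (high block_5 and ffn ratios) while any attention tensor is pulled toward the Father, matching the observations above.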
+ ---
+
+ ## Model Specifications
+
+ | Spec | Value |
+ |---|---|
+ | Architecture | Gemma 4 Expert 4B (Mixture of Experts) |
+ | Parameters | 4B |
+ | Precision | BF16 |
+ | Context | 128K |
+ | Languages | 140+ |
+ | Thinking | enable_thinking=True chain-of-thought |
+ | License | Apache 2.0 |
+
+ ---
+
+ ## Usage
+
+ ### Transformers
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-4B-Opus", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     "FINAL-Bench/Darwin-4B-Opus",
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
+ text = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
+ )
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
+ ```
+
+ ---
+
+ ## VRAM Requirements
+
+ | Setup | VRAM | Status |
+ |---|---|---|
+ | BF16 Full Precision | ~8 GB | |
+ | NVIDIA RTX 4090 24GB | 24 GB | Single GPU, very comfortable |
+ | NVIDIA RTX 3090 24GB | 24 GB | Single GPU, comfortable |
+ | NVIDIA RTX 4080 16GB | 16 GB | Single GPU |
+ | NVIDIA T4 16GB | 16 GB | Cloud/Colab friendly |
+
+ Darwin-4B-Opus is the most accessible model in the Darwin Opus family, running comfortably on a single consumer GPU.
+
+ ---
+
+ ## Darwin Opus Family

+ | Model | Architecture | Parameters | Context | Base |
+ |---|---|---|---|---|
+ | **Darwin-4B-Opus** | MoE (E4B) | 4B | 128K | gemma-4-E4B-it |
+ | Darwin-9B-Opus | – | 9B | – | gemma-4-9B-it |
+ | Darwin-31B-Opus | Dense | 31B | 256K | gemma-4-31B-it |
+ | Darwin-35B-A3B-Opus | MoE | 35B (3B active) | 256K | gemma-4-35B-A3B-it |
+
+ ---

+ ## References
+
+ - DARE-TIES: Yadav et al., 2023 (https://arxiv.org/abs/2311.03099) – re-implemented, not library-dependent
+ - Darwin V6 Engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
+ - FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard
+
+ ---
+
+ ## Built By
+
+ | Field | Value |
+ |---|---|
+ | Developer | VIDRAFT |
+ | Engine | Darwin V6 (Diagnostic-Guided Evolutionary Merge) |
+ | Architecture | Gemma-4-E4B (MoE) |
+ | License | Apache 2.0 |
+
+ ---

+ ## Citation

+ ```bibtex
+ @misc{vidraft_darwin_4b_opus,
+   title = {Darwin-4B-Opus: Diagnostic-Guided Evolutionary Merge on Gemma 4 E4B},
+   author = {VIDRAFT},
+   year = {2026},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-Opus}}
+ }
+ ```