SeaWolf-AI committed dbdd6bc (verified) · 1 parent: 6f4fe79

Update README.md

Files changed (1)
  1. README.md +179 -37
README.md CHANGED
@@ -7,56 +7,198 @@ tags:
7
  - darwin-v6
8
  - evolutionary-merge
9
  - mri-guided
10
- - dare_ties
11
  ---
12
 
13
- # Darwin V6 Evolved Model
14
 
15
- Created by Darwin V6 diagnostic-guided evolutionary merge engine.
16
 
17
  ## Parent Models
18
- - Father: `google/gemma-4-31B-it`
19
- - Mother: `TeichAI/gemma-4-31B-it-Claude-Opus-Distill`
20
 
21
  ## Evolution Result
22
- - Benchmark score: 0.8289
23
- - Merge method: dare_ties
24
- - Merge hash:
25
 
26
- ## Merge Statistics
27
- - Total tensors merged: 0
28
- - Transplant A (Father preserved): 0
29
- - Transplant B (Mother preserved): 0
30
- - Blended: 0
31
 
32
- ## Optimal Genome
33
  ```
34
- global_ratio: 0.5147
35
- attn_ratio: 0.3169
36
- ffn_ratio: 0.9316
37
- embed_ratio: 0.7748
38
- density_a: 0.8997
39
- density_b: 0.9539
40
- block_0_ratio: 0.6628
41
- block_1_ratio: 0.6431
42
- block_2_ratio: 0.5146
43
- block_3_ratio: 0.5971
44
- block_4_ratio: 0.6339
45
- block_5_ratio: 0.8583
46
- mri_trust: 0.3631
47
- merge_method_weight: 0.6897
48
  ```
49
 
50
- ## Health Check
51
- Not performed
52
 
53
- ## Method
54
- Darwin V6 implements DARE-TIES merge directly via PyTorch tensor operations.
55
- Per-tensor ratios are determined by MRI diagnostic (static tensor analysis +
56
- probe-based functional importance) combined with evolutionary genome search.
57
 
58
- Formula: final_ratio = mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)
59
 
60
- DARE-TIES algorithm: Yadav et al., 2023 (re-implemented, not library-dependent)
61
 
62
- Built by VIDRAFT. Apache 2.0.
7
  - darwin-v6
8
  - evolutionary-merge
9
  - mri-guided
10
+ - dare-ties
11
+ - gemma4
12
+ - reasoning
13
+ - thinking
14
+ - proto-agi
15
+ - vidraft
16
+ language:
17
+ - en
18
+ - ko
19
+ - ja
20
+ - zh
21
+ - multilingual
22
+ pipeline_tag: text-generation
23
+ library_name: transformers
24
  ---
25
 
26
+ # Darwin-31B-Opus
27
 
28
+ <p align="center">
29
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="Model"></a>
30
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B Model"></a>
31
+ <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
32
+ <a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
33
+ </p>
34
+
35
+ > Gemma 4 Dense 31B | Thinking Mode | 256K Context | 140+ Languages | BF16 | Apache 2.0
36
+
37
+ ---
38
+
39
+ ## Overview
40
+
41
+ Darwin-31B-Opus is a reasoning-enhanced model created by the Darwin V6 engine, using Google's Gemma-4-31B-it as Father and TeichAI's Claude Opus Distill as Mother.
42
+
43
+ Darwin V6 diagnoses both parent models at the tensor level and computes an independent optimal merge ratio for each tensor. Unlike conventional merging methods that apply a uniform ratio across all tensors, Darwin V6 assigns a unique ratio to each of the 1,188 tensors, determined by the combination of MRI diagnostic results and evolutionary algorithm optimization.
44
+
45
+ ---
46
 
47
  ## Parent Models
48
+
49
+ | Role | Model | Characteristics |
50
+ |---|---|---|
51
+ | Father | google/gemma-4-31B-it | Gemma 4 Dense 31B, multimodal, 256K context, LMArena 1452 (open model #3) |
52
+ | Mother | TeichAI/gemma-4-31B-it-Claude-Opus-Distill | Claude 4.6 Opus high-effort reasoning distillation, coding/science/analysis |
53
+
54
+ ---
55
+
56
+ ## Benchmark
57
+
58
+ | Benchmark | Darwin-31B-Opus | Father (gemma-4-31B-it) | Condition |
59
+ |---|---|---|---|
60
+ | ARC-Challenge | 82.89% | - | loglikelihood, zero-shot, 200 questions |
61
+
62
+ Note: The Gemma 4 architecture (Gemma4ForConditionalGeneration) is a multimodal wrapper with limited compatibility with lm-eval's loglikelihood method. In generative evaluation (greedy decoding, thinking mode), Darwin improved on the Father under identical conditions. A full 198-question GPQA Diamond evaluation with majority voting is planned.
63
+
64
+ ---
65
+
66
+ ## Model Specifications
67
+
68
+ | | |
69
+ |---|---|
70
+ | Architecture | Gemma 4 Dense (Hybrid Attention: Sliding Window + Global) |
71
+ | Total Parameters | 31B |
72
+ | Precision | BF16 |
73
+ | Context Length | 256K tokens |
74
+ | Languages | 140+ |
75
+ | Thinking | enable_thinking=True chain-of-thought reasoning |
76
+ | License | Apache 2.0 |
77
+
78
+ ---
79
+
80
+ ## How Darwin V6 Merges
81
+
82
+ Darwin V6 does not use an external merge library such as mergekit. It re-implements DARE-TIES, which combines DARE (Yu et al., 2023) with TIES-Merging (Yadav et al., 2023), directly in PyTorch tensor operations. The per-tensor diagnostic ratios are its key differentiator.
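
For readers unfamiliar with the two ingredients of DARE-TIES, here is a minimal pure-Python sketch. It is not Darwin V6's code: the function names, the flat-list "tensors", the hypothetical `base` weights from which deltas are taken, and the simplified weighting in the sign-election step are all assumptions made for illustration.

```python
import random

def dare_sparsify(delta, density, rng):
    """DARE step: keep each delta entry with probability `density`
    and rescale the survivors by 1/density; drop the rest to zero."""
    return [d / density if rng.random() < density else 0.0 for d in delta]

def ties_combine(delta_a, delta_b, ratio):
    """Simplified TIES-style step: weight the two deltas (`ratio` is the
    share given to parent B), elect a sign per entry from their sum, and
    keep only the contributions that agree with the elected sign."""
    merged = []
    for da, db in zip(delta_a, delta_b):
        wa, wb = (1.0 - ratio) * da, ratio * db
        sign = 1.0 if (wa + wb) >= 0 else -1.0
        merged.append(sum(w for w in (wa, wb) if w * sign > 0))
    return merged

# Toy data; every value below is illustrative, not taken from the model.
rng = random.Random(0)
base = [0.0, 0.0, 0.0, 0.0]        # hypothetical shared base weights
father = [0.2, -0.1, 0.4, 0.0]
mother = [0.3, 0.5, -0.2, 0.1]

delta_a = dare_sparsify([f - b for f, b in zip(father, base)], 0.8997, rng)
delta_b = dare_sparsify([m - b for m, b in zip(mother, base)], 0.9539, rng)
merged = [b + d for b, d in zip(base, ties_combine(delta_a, delta_b, 0.5147))]
print(merged)
```

The densities and ratio passed in the toy call reuse `density_a`, `density_b`, and `global_ratio` from the genome purely as sample inputs; in Darwin V6 the ratio varies per tensor.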
83
+
84
+ Before merging, Darwin performs an MRI diagnostic on both parent models. For every tensor, it measures Shannon entropy (information density), standard deviation (activation spread), and L2 norm (energy). In addition, five probe prompts (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) are passed through the model, and each layer's functional importance is measured as the cosine distance observed when that layer is skipped.
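
The measurements named above can be sketched in plain Python. This is an illustration under assumptions, not the engine's implementation: the README does not say how entropy is estimated, so a fixed-bin histogram is assumed here, and all function names are hypothetical.

```python
import math

def shannon_entropy(values, bins=16):
    """Entropy in bits of a histogram over the values (binning is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in values:
        # Map the value to a bin; the maximum lands in the last bin.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def std_dev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def l2_norm(values):
    """Euclidean norm of the flattened values."""
    return math.sqrt(sum(v * v for v in values))

def cosine_distance(u, v):
    """1 - cosine similarity; used here to compare outputs with and without a layer."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (l2_norm(u) * l2_norm(v))

weights = [0.1, -0.4, 0.25, 0.0, 0.9, -0.7]   # toy tensor, flattened
print(shannon_entropy(weights), std_dev(weights), l2_norm(weights))
```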
85
+
86
+ The final merge ratio for each tensor is determined by:
87
+
88
+ ```
89
+ static_score = entropy * 0.3 + std * 0.2 + clamp(norm, 100) * 0.002
90
+ probe_score = sum(cosine_distance[probe_i] * weight_i)
91
+ combined = static_score * 0.4 + probe_score * 0.6
92
+ mri_ratio = combined_b / (combined_a + combined_b)
93
+ final_ratio = mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)
94
+ ```
95
+
96
+ mri_trust itself is optimized by the CMA-ES evolutionary algorithm. When a tensor's ratio is extreme (< 0.15 or > 0.85), the tensor is transplanted whole from the dominant parent instead of being interpolated, which prevents noise injection.
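
The ratio formulas and the transplant rule can be written out as a short Python sketch. Two points are assumptions rather than statements from the README: `clamp(norm, 100)` is read as capping the norm at 100, and a ratio above 0.85 is read as taking the tensor from the Mother (parent B), since `mri_ratio` is parent B's share. Function names are hypothetical.

```python
def static_score(entropy, std, norm):
    # clamp(norm, 100) is assumed to mean "cap the norm at 100".
    return entropy * 0.3 + std * 0.2 + min(norm, 100.0) * 0.002

def combined_score(static, probe):
    return static * 0.4 + probe * 0.6

def final_ratio(combined_a, combined_b, genome_ratio, mri_trust):
    # Share of parent B suggested by the diagnostic, mixed with the genome's ratio.
    mri_ratio = combined_b / (combined_a + combined_b)
    return mri_ratio * mri_trust + genome_ratio * (1.0 - mri_trust)

def merge_action(ratio, low=0.15, high=0.85):
    # Extreme ratios copy one parent whole; the direction is an assumption.
    if ratio < low:
        return "transplant_father"
    if ratio > high:
        return "transplant_mother"
    return "blend"

# Toy inputs: equal diagnostic scores, genome ratio 0.9, mri_trust from the genome.
r = final_ratio(1.0, 1.0, genome_ratio=0.9, mri_trust=0.3631)
print(r, merge_action(r))
```

With `mri_trust` at 0.3631, the genome ratio carries roughly 64% of the weight, so a tensor can still land in the blend range even when its genome ratio alone is above 0.85.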
97
+
98
+ After merging, a Health Check compares the child model against both parents layer by layer, automatically detecting interference or function loss.
99
+
100
+ ---
101
 
102
  ## Evolution Result
 
 
 
103
 
104
+ | | |
105
+ |---|---|
106
+ | ARC-Challenge Best Score | 0.8289 |
107
+ | Merge Method | DARE-TIES (direct PyTorch implementation) |
108
+ | Tensors Merged | 1,188 |
109
+ | Health Check | healthy |
110
+ | Phase 2 Steps | 4 (early stop, patience=5) |
111
+ | Total Time | 134 min |
112
+ | Infrastructure | 4 x NVIDIA H100 NVL (100GB) |
113
+
114
+ Optimal genome (14-dimensional adaptive):
115
 
 
116
  ```
117
+ global_ratio: 0.5147 (overall merge ratio)
118
+ attn_ratio: 0.3169 (Attention layers)
119
+ ffn_ratio: 0.9316 (FFN layers — Mother dominant)
120
+ embed_ratio: 0.7748 (Embedding)
121
+ density_a: 0.8997 (Father DARE density)
122
+ density_b: 0.9539 (Mother DARE density)
123
+ block_0_ratio: 0.6628 (L0-L9)
124
+ block_1_ratio: 0.6431 (L10-L19)
125
+ block_2_ratio: 0.5146 (L20-L29)
126
+ block_3_ratio: 0.5971 (L30-L39)
127
+ block_4_ratio: 0.6339 (L40-L49)
128
+ block_5_ratio: 0.8583 (L50-L59 — reasoning core, Mother dominant)
129
+ mri_trust: 0.3631 (MRI 36% + Genome 64%)
130
+ merge_method_weight: 0.6897
131
+ ```
132
+
133
+ Notable: ffn_ratio=0.93 indicates FFN layers strongly favor the Mother (Claude Opus Distill), and block_5 (L50-L59) at 0.86 also favors the Mother. This is consistent with the MRI heatmap pattern showing that the Mother's reasoning capabilities are concentrated in the later layers.
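
As a reading aid for the block entries, the following Python sketch maps a layer index to its block ratio using the ten-layer ranges annotated in the genome. The README does not state how block ratios are combined with `global_ratio`, `attn_ratio`, or `ffn_ratio`, so the sketch only performs the lookup; the helper name is hypothetical.

```python
# Block ratios copied from the genome listing; each block covers ten layers.
BLOCK_RATIOS = {
    "block_0_ratio": 0.6628,  # L0-L9
    "block_1_ratio": 0.6431,  # L10-L19
    "block_2_ratio": 0.5146,  # L20-L29
    "block_3_ratio": 0.5971,  # L30-L39
    "block_4_ratio": 0.6339,  # L40-L49
    "block_5_ratio": 0.8583,  # L50-L59
}
LAYERS_PER_BLOCK = 10
NUM_LAYERS = LAYERS_PER_BLOCK * len(BLOCK_RATIOS)

def block_ratio(layer_index):
    """Return the genome's block ratio for a given layer index."""
    if not 0 <= layer_index < NUM_LAYERS:
        raise ValueError(f"layer index out of range: {layer_index}")
    return BLOCK_RATIOS[f"block_{layer_index // LAYERS_PER_BLOCK}_ratio"]

# The block that leans most toward the Mother.
most_mother = max(BLOCK_RATIOS, key=BLOCK_RATIOS.get)
print(block_ratio(55), most_mother)
```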
134
+
135
+ ---
136
+
137
+ ## Usage
138
+
139
+ ### Transformers
140
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ # Load the tokenizer and the BF16 weights, sharded across available GPUs.
+ tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-31B-Opus", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     "FINAL-Bench/Darwin-31B-Opus",
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ # Build the prompt with thinking mode enabled.
+ messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
+ text = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
+ )
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+ # Greedy decoding; print only the newly generated tokens.
+ outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
  ```
161
 
162
+ ---
163
+
164
+ ## VRAM Requirements
165
+
166
+ | Setup | VRAM | Status |
167
+ |---|---|---|
168
+ | BF16 Full Precision | ~62 GB | |
169
+ | NVIDIA H100 80GB | 80 GB | Single GPU |
170
+ | NVIDIA A100 80GB x 2 | 160 GB | Comfortable |
171
+ | NVIDIA RTX 4090 24GB x 4 | 96 GB | Possible (device_map=auto) |
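
The ~62 GB figure follows from the parameter count and precision: 31B parameters at 2 bytes each in BF16. A small Python sketch of that arithmetic is below; it counts weights only, and the helper name and the even split across GPUs are simplifying assumptions (KV cache and activations need additional memory).

```python
def weights_gb(params_billion, bytes_per_param=2):
    """Weight memory in decimal GB for a given parameter count and precision."""
    return params_billion * 1e9 * bytes_per_param / 1e9

total = weights_gb(31)      # BF16: 2 bytes per parameter
per_gpu = total / 4         # assumed even split across 4 x RTX 4090
print(total, per_gpu)
```

On a four-GPU 24 GB setup this leaves limited headroom per card for the KV cache, which is why the table marks that configuration as possible rather than comfortable.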
172
+
173
+ ---
174
 
175
+ ## References
 
 
 
176
 
177
+ - DARE-TIES algorithm: DARE, Yu et al., 2023 (https://arxiv.org/abs/2311.03099) combined with TIES-Merging, Yadav et al., 2023 (https://arxiv.org/abs/2306.01708); re-implemented, not library-dependent
178
+ - Darwin V6 engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
179
+ - FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard
180
+
181
+ ---
182
+
183
+ ## Built By
184
+
185
+ | | |
186
+ |---|---|
187
+ | Developer | VIDRAFT |
188
+ | Engine | Darwin V6 (Diagnostic-Guided Evolutionary Model Merge) |
189
+ | Base Architecture | Gemma-4-31B |
190
+ | License | Apache 2.0 |
191
+
192
+ ---
193
 
194
+ ## Citation
195
 
196
+ ```bibtex
197
+ @misc{vidraft_darwin_31b_opus,
198
+ title = {Darwin-31B-Opus: Diagnostic-Guided Evolutionary Merge on Gemma 4},
199
+ author = {VIDRAFT},
200
+ year = {2026},
201
+ publisher = {Hugging Face},
202
+ howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-31B-Opus}}
203
+ }
204
+ ```