SeaWolf-AI committed on
Commit d74f2b2 · verified · 1 Parent(s): 31022a8

Update README.md

Files changed (1)
  1. README.md +124 -57
README.md CHANGED
@@ -2,6 +2,7 @@
  license: apache-2.0
  base_model:
  - Qwen/Qwen3.5-9B
  tags:
  - merge
  - evolutionary-merge
@@ -57,29 +58,113 @@ model-index:
 
  # Darwin-9B-Opus
 
- *"Compact reasoning powerhouse — 9B parameters, graduate-level intelligence."*
-
  <p align="center">
- <img src="info.png" alt="Darwin-35B-A3B-Opus" width="100%">
  </p>
 
-
  <p align="center">
- <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="Model"></a>
- <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🚀_Space-9B_Live_Demo-purple?style=for-the-badge" alt="Space"></a>
- <a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B Model"></a>
- <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/🚀_Space-35B_Live_Demo-purple?style=for-the-badge" alt="35B Space"></a>
- <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
- <a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/📊_ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
  </p>
-
- > **Qwen3.5 Dense 9B** | Reasoning | Chain-of-Thought | 131K Context | 201 Languages | BF16 | Apache 2.0
 
  ---
 
  ## Overview
 
- Darwin-9B-Opus is a **9B dense parameter** reasoning model created using **Darwin V5**, an evolutionary merge engine with Model MRI integration. Built on the Qwen3.5-9B architecture, it inherits structured step-by-step reasoning capabilities through Claude 4.6 Opus distillation while maintaining the full multilingual and long-context capabilities of the base model.
 
  ---
 
@@ -87,7 +172,7 @@ Darwin-9B-Opus is a **9B dense parameter** reasoning model created using **Darwi
 
  | | |
  |---|---|
- | Architecture | Qwen3.5 Dense |
  | Total Parameters | 9B |
  | Precision | BF16 |
  | Context Length | 131,072 native |
@@ -102,10 +187,9 @@ Darwin-9B-Opus is a **9B dense parameter** reasoning model created using **Darwi
  | Setup | VRAM | Status |
  |---|---|---|
  | BF16 Full Precision | ~20 GB | |
- | NVIDIA A10G 24GB | 24 GB | Comfortable |
- | NVIDIA RTX 4090 24GB | 24 GB | Comfortable |
- | NVIDIA A100 40GB | 40 GB | Very comfortable |
- | NVIDIA T4 16GB | 16 GB | ⚠️ Requires quantization |
 
  ---
 
@@ -128,7 +212,7 @@ model = AutoModelForCausalLM.from_pretrained(
  trust_remote_code=True,
  )
 
- messages = [{"role": "user", "content": "Prove that 2 is irrational."}]
  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  inputs = tokenizer(text, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=4096)
@@ -156,28 +240,26 @@ vllm serve FINAL-Bench/Darwin-9B-Opus \
 
  ---
 
- ## What Makes Darwin Special?
 
- Darwin-9B-Opus was created using **Darwin V5**, an evolutionary merge engine with Model MRI integration.
 
- ### Darwin V5 Pipeline
 
- ```
- [Phase 0] Model MRI — Profile both parents layer by layer
- ↓ Measure: layer importance, probe cosine distance
-
- [Phase 1] MRI-Guided Evolution — Diagnostic-informed initial genome
- ↓ Not random, but "informed by profiling results"
-
- [Phase 2] mergekit real merge + benchmark fitness selection
- ↓ Faster convergence in MRI-narrowed search space
-
- [Phase 3] MRI Health Check — Profile the child model
- ↓ Detect interference, function loss
- ↓ Prescribe layer-specific ratio adjustments
-
- [Final] Darwin-9B-Opus
- ```
 
  ---
 
@@ -185,35 +267,20 @@ Darwin-9B-Opus was created using **Darwin V5**, an evolutionary merge engine wit
 
  | | |
  |---|---|
- | Developer | **VIDRAFT** |
- | Engine | Darwin V5 (Evolutionary Merge + Model MRI) |
- | Merge Backend | mergekit (DARE-TIES) |
  | Base Architecture | Qwen3.5-9B |
 
  ---
 
- ## Acknowledgements
-
- - **Korean Government** — GPU Support Program research grant
- - [Qwen Team](https://huggingface.co/Qwen) — Qwen3.5 base architecture
- - [mergekit](https://github.com/arcee-ai/mergekit) — Merge backend infrastructure
-
- ---
-
  ## Citation
 
  ```bibtex
  @misc{vidraft_darwin_9b_opus,
- title = {Darwin-9B-Opus: Compact Reasoning Model via Diagnostic-Guided Evolutionary Merge},
  author = {VIDRAFT},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}}
  }
- ```
-
- ---
-
- ## Contact
-
- 📧 **kkms1116@koreacu.ac.kr**
 
  license: apache-2.0
  base_model:
  - Qwen/Qwen3.5-9B
+ - Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled
  tags:
  - merge
  - evolutionary-merge
 
 
  # Darwin-9B-Opus
 
  <p align="center">
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="Model"></a>
+ <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/Space-9B_Live_Demo-purple?style=for-the-badge" alt="Space"></a>
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B Model"></a>
+ <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/Space-35B_Live_Demo-purple?style=for-the-badge" alt="35B Space"></a>
+ <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
+ <a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
  </p>
 
  <p align="center">
+ <img src="info.png" alt="Darwin-9B-Opus" width="100%">
  </p>
+
+ > Qwen3.5 Dense 9B | Reasoning | Chain-of-Thought | 131K Context | 201 Languages | BF16 | Apache 2.0
+
+ ---
+
+ ## Technical Definitions
+
+ | Term | Definition | Measurement |
+ |---|---|---|
+ | Model MRI | Layer-level profiling of tensor health indicators | L2 norm, Shannon entropy, std per tensor across all layers |
+ | LayerMRI.compare_layers | Per-tensor A vs B quality comparison yielding the optimal ratio_b | score = entropy * 0.5 + std * 0.3 + clamp(norm, 100) * 0.002 per model; ratio_b = score_b / (score_a + score_b) |
+ | MRI-Guided Merge | Per-tensor merge ratios derived from parent diagnostics (70% MRI + 30% genome) | final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3 |
+ | DARE-TIES | Merge algorithm: random binary mask on the delta, then weighted addition | merged = A + (B - A) * random_mask(density) * ratio |
+ | Transplant A / B | When the MRI ratio falls below 0.05 or above 0.95, one parent is used entirely | No interpolation — direct tensor copy |
+ | Evolutionary Search | CMA-ES population evolution over genome space (ratio, attn, ffn, embed, density_a, density_b) | Phase 1: 200 steps heuristic proxy; Phase 2: 10 steps real benchmark |
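The score and ratio definitions in the table above can be sketched in plain PyTorch. This is an illustrative reconstruction, not the Darwin V5 source: the function names `diagnose_tensor` and `mri_ratio` and the 64-bin histogram used for the Shannon entropy are assumptions.

```python
import torch

def diagnose_tensor(t: torch.Tensor) -> dict:
    """Per-tensor health indicators from the table: L2 norm, Shannon entropy, std.
    The 64-bin histogram for entropy is an assumption; the actual binning is unpublished."""
    flat = t.float().flatten()
    hist = torch.histc(flat, bins=64)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log() is defined
    return {
        "norm": flat.norm().item(),
        "entropy": -(p * p.log()).sum().item(),
        "std": flat.std().item(),
    }

def mri_ratio(ta: torch.Tensor, tb: torch.Tensor) -> float:
    """LayerMRI.compare_layers-style comparison: ratio_b = score_b / (score_a + score_b)."""
    def score(d: dict) -> float:
        return d["entropy"] * 0.5 + d["std"] * 0.3 + min(d["norm"], 100) * 0.002
    sa, sb = score(diagnose_tensor(ta)), score(diagnose_tensor(tb))
    return sb / (sa + sb)
```

Two tensors with identical statistics yield ratio_b = 0.5; the ratio shifts toward whichever parent scores higher.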
 
  ---
 
  ## Overview
 
+ Darwin-9B-Opus is a 9B dense parameter reasoning model created using Darwin V5. Both parent models share the identical Qwen3.5-9B architecture; the Mother is a LoRA SFT on the same base, not a different architecture.
+
+ | Role | Model | Training |
+ |---|---|---|
+ | Father | [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) | Original pre-training + RLHF |
+ | Mother | [Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled](https://huggingface.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled) | LoRA SFT with text-only Claude 4.6 Opus reasoning chains |
+
+ ---
+
+ ## How Darwin V5 Works
+
+ Darwin V5 does not use mergekit or any external merge library. It implements the DARE-TIES merge directly via PyTorch tensor operations, with MRI-guided per-layer ratios. The algorithm is inspired by the DARE-TIES method but re-implemented from scratch to support per-tensor, diagnostic-guided ratios.
+
+ ### Merge Implementation (actual code logic)
+
+ ```python
+ # For each tensor pair (A, B) across all safetensor shards:
+ ta = model_a[key]  # Father tensor
+ tb = model_b[key]  # Mother tensor
+
+ # 1. MRI diagnoses both tensors
+ diag_a = LayerMRI.diagnose_tensor(ta)  # {norm, entropy, std}
+ diag_b = LayerMRI.diagnose_tensor(tb)  # {norm, entropy, std}
+
+ # 2. Quality score comparison determines ratio_b
+ score_a = diag_a["entropy"] * 0.5 + diag_a["std"] * 0.3 + min(diag_a["norm"], 100) * 0.002
+ score_b = diag_b["entropy"] * 0.5 + diag_b["std"] * 0.3 + min(diag_b["norm"], 100) * 0.002
+ mri_ratio = score_b / (score_a + score_b)  # Higher = Mother is better
+
+ # 3. Final ratio = MRI 70% + evolutionary genome 30%
+ final_ratio = mri_ratio * 0.7 + genome_type_ratio * 0.3
+
+ # 4. DARE-TIES merge with per-tensor ratio
+ mask = torch.rand_like(tb) < density_b
+ delta = (tb - ta) * mask
+ merged = (ta + delta * final_ratio).bfloat16()
+ ```
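Wrapped as a function, the four steps above plus the transplant rule from the definitions table look roughly like this. A sketch only: `merge_tensor` is a hypothetical name, the 0.05/0.95 thresholds come from the table, and the bfloat16 cast is omitted to keep the sketch dtype-agnostic.

```python
import torch

def merge_tensor(ta: torch.Tensor, tb: torch.Tensor,
                 final_ratio: float, density_b: float = 0.9) -> torch.Tensor:
    """One DARE-TIES step with the transplant short-circuit from the table."""
    if final_ratio < 0.05:   # Transplant A: Father tensor copied outright
        return ta.clone()
    if final_ratio > 0.95:   # Transplant B: Mother tensor copied outright
        return tb.clone()
    mask = (torch.rand_like(tb) < density_b).to(tb.dtype)  # random binary mask on the delta
    delta = (tb - ta) * mask
    return ta + delta * final_ratio
```

With density_b = 1.0 the mask passes every element and the step reduces to plain linear interpolation; lower densities keep only a random subset of the Mother's delta, which is what makes DARE-style merging sparse.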
 
+ ### Pipeline
+
+ ```
+ Phase 0: Model MRI
+   For every tensor in both parents, measure:
+   - L2 norm (layer energy)
+   - Shannon entropy (weight distribution uniformity)
+   - Standard deviation (activation spread)
+   Compare A vs B quality scores -> per-tensor ratio prescription
+
+ Phase 1: Evolutionary Search (200 steps, heuristic proxy)
+   Population of 20 genomes (ratio, attn, ffn, embed, density_a, density_b)
+   Fitness: heuristic score based on genome balance + differentiation
+   Selection -> SLERP crossover -> Gaussian mutation
+
+ Phase 2: Real Merge + Benchmark (10 steps)
+   Top genomes from Phase 1 undergo an actual tensor merge
+   Each merge: MRI prescription (70%) + genome ratio (30%)
+   Fitness: real benchmark score (ARC-Challenge)
+   Best model selected and auto-uploaded
+
+ Phase 3: Health Check
+   Layer-by-layer importance comparison: child vs both parents
+   Detect interference (child >> parents) or function loss (parents >> child)
+ ```
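Phase 1's variation step can be sketched as follows. Illustrative only: plain linear interpolation stands in for the SLERP crossover named above, and the elite count and mutation sigma are assumed parameters, not published ones.

```python
import random

GENOME_KEYS = ("ratio", "attn", "ffn", "embed", "density_a", "density_b")

def crossover(g1: dict, g2: dict, t: float = 0.5) -> dict:
    """Blend two parent genomes (lerp stand-in for the SLERP crossover above)."""
    return {k: g1[k] * (1 - t) + g2[k] * t for k in GENOME_KEYS}

def mutate(g: dict, sigma: float = 0.05) -> dict:
    """Gaussian mutation, clamped to the valid [0, 1] range of each gene."""
    return {k: min(1.0, max(0.0, v + random.gauss(0.0, sigma))) for k, v in g.items()}

def evolve(population: list, fitness, keep: int = 4) -> list:
    """One generation: selection -> crossover -> mutation, population size preserved."""
    elite = sorted(population, key=fitness, reverse=True)[:keep]
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(len(population) - keep)]
    return elite + children
```

In Phase 1 the `fitness` callable would be the cheap heuristic proxy; in Phase 2 it becomes the real ARC-Challenge score of the merged model.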
 
+ ### What Makes This Different from Standard Merging
+
+ | Capability | Standard DARE-TIES | Darwin V5 |
+ |---|---|---|
+ | Implementation | mergekit library call | Direct PyTorch tensor operations |
+ | Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from MRI diagnosis |
+ | Pre-merge analysis | None | Tensor-level norm/entropy/std profiling |
+ | Ratio determination | Human-set or grid search | MRI 70% + evolutionary genome 30% |
+ | Post-merge validation | Benchmark score only | Layer-by-layer child-vs-parents comparison |
+ | Transplant support | No | ratio < 0.05 -> use A entirely; ratio > 0.95 -> use B entirely |
+ | Failure diagnosis | "Score went down" | Per-tensor quality delta identifies problematic layers |
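The post-merge validation row corresponds to the Phase 3 health check, which can be sketched like this. The `health_check` helper is hypothetical and the 1.5x tolerance is an assumed threshold; the source does not specify one.

```python
def health_check(child: dict, parent_a: dict, parent_b: dict, tol: float = 1.5) -> dict:
    """Flag layers whose importance metric (e.g. L2 norm) deviates from both parents:
    'interference' when the child far exceeds both, 'function_loss' when it falls far below."""
    flags = {}
    for layer, c in child.items():
        hi = max(parent_a[layer], parent_b[layer])
        lo = min(parent_a[layer], parent_b[layer])
        if c > hi * tol:
            flags[layer] = "interference"
        elif c < lo / tol:
            flags[layer] = "function_loss"
    return flags
```

Flagged layers are exactly the ones the pipeline would prescribe ratio adjustments for, which is what the "failure diagnosis" row refers to.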
 
 
  ---
 
  | | |
  |---|---|
+ | Architecture | Qwen3.5 Dense (Gated DeltaNet hybrid) |
  | Total Parameters | 9B |
  | Precision | BF16 |
  | Context Length | 131,072 native |
 
  | Setup | VRAM | Status |
  |---|---|---|
  | BF16 Full Precision | ~20 GB | |
+ | NVIDIA RTX 4090 24GB | 24 GB | Comfortable |
+ | NVIDIA A100 40GB | 40 GB | Very comfortable |
+ | NVIDIA T4 16GB | 16 GB | Requires quantization |
 
  ---
 
  trust_remote_code=True,
  )
 
+ messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  inputs = tokenizer(text, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=4096)
 
 
  ---
 
+ ## Evolution Details
+
+ | | |
+ |---|---|
+ | Engine | Darwin V5 (Evolutionary Merge + Layer-Level Diagnostics) |
+ | Merge Method | DARE-TIES (direct PyTorch implementation, no external library) |
+ | MRI Integration | Per-tensor diagnosis: norm, entropy, std -> ratio prescription |
+ | Ratio Formula | final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3 |
+ | Evolution | Phase 1: 200 steps proxy + Phase 2: 10 steps real benchmark |
+ | Best Score | 0.8508 (ARC-Challenge) |
+ | Infrastructure | 4 x NVIDIA H100 NVL (100GB each) |
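As a worked example of the ratio formula row (the input numbers are illustrative, not measured values):

```python
def final_ratio(mri_ratio: float, genome_ratio: float) -> float:
    """70% MRI diagnostics + 30% evolved genome, per the table above."""
    return mri_ratio * 0.7 + genome_ratio * 0.3

# MRI slightly favours the Mother (0.6) while the genome favours the Father (0.4):
assert abs(final_ratio(0.6, 0.4) - 0.54) < 1e-9
```

Because the MRI term dominates at 70%, the genome can nudge a per-tensor ratio but cannot override a strong diagnostic verdict.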
 
+ ---
+
  ## Acknowledgements
+
+ - Korean Government — GPU Support Program research grant
+ - [Qwen Team](https://huggingface.co/Qwen) — Qwen3.5 base architecture
+ - [Jackrong](https://huggingface.co/Jackrong) — Claude 4.6 Opus Reasoning Distilled model
+ - DARE-TIES algorithm — [Yu et al., 2023](https://arxiv.org/abs/2311.03099) (re-implemented, not library-dependent)
 
 
  ---
 
  | | |
  |---|---|
+ | Developer | VIDRAFT |
+ | Engine | Darwin V5 |
  | Base Architecture | Qwen3.5-9B |
 
  ---
 
  ## Citation
 
  ```bibtex
  @misc{vidraft_darwin_9b_opus,
+ title = {Darwin-9B-Opus: Diagnostic-Guided Evolutionary Merge},
  author = {VIDRAFT},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}}
  }
+ ```