---
tags:
- darwin-v6
- evolutionary-merge
- mri-guided
- dare-ties
- gemma4
- reasoning
- thinking
- proto-agi
- vidraft
language:
- en
- ko
- ja
- zh
- multilingual
pipeline_tag: text-generation
library_name: transformers
---

# Darwin-4B-Opus

<p align="center">
<!-- Small Models -->
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Opus-blue?style=for-the-badge" alt="4B Model"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/🏆_ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
</p>

> Gemma 4 Expert 4B (MoE) | Thinking Mode | 128K Context | 140+ Languages | BF16 | Apache 2.0

---

## Overview

Darwin-4B-Opus is a reasoning-enhanced model created by merging google/gemma-4-E4B-it (Father) and arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled (Mother) using the Darwin V6 engine.

Darwin V6 diagnoses both parent models at the tensor level before merging and assigns an independent optimal ratio to each tensor. This is fundamentally different from conventional merging tools, which apply a single uniform ratio across all tensors.

As the smallest member of the Darwin Opus family, Darwin-4B-Opus delivers Claude Opus-level reasoning distillation in an efficient 4B-parameter MoE architecture, making it well suited to edge deployment, rapid prototyping, and resource-constrained environments while maintaining strong benchmark performance (0.8292 ARC-Challenge).

---

## Parent Models

| Role | Model | Characteristics |
|---|---|---|
| Father | google/gemma-4-E4B-it | Gemma 4 Expert 4B (MoE), multimodal, 128K context, efficient inference |
| Mother | arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled | Claude 4.6 Opus high-effort reasoning distillation; enhanced code, science, and analysis |

---

## Benchmarks

| Benchmark | Darwin-4B-Opus | Condition |
|---|---|---|
| ARC-Challenge | 82.92% | loglikelihood, zero-shot |

Note: the Gemma 4 architecture (Gemma4ForConditionalGeneration) has limited compatibility with lm-eval's loglikelihood method because of its multimodal wrapper structure; only generative evaluation produces valid results for Gemma 4 based models. A full extended evaluation with majority voting is planned.

---

## Darwin V6 vs Conventional Merging

| Capability | mergekit (DARE-TIES) | Darwin V6 |
|---|---|---|
| Implementation | Library call (mergekit CLI) | Direct PyTorch tensor operations, no external dependency |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from the MDS diagnostic (independent ratio per tensor) |
| Pre-merge analysis | None | Static tensor profiling (entropy, std, norm) + probe-based functional importance (5 probes) |
| Ratio formula | Human-set or grid search | combined = static × 0.4 + probe × 0.6, then evolutionary optimization |
| Transplant | Not supported | ratio < 0.15 → Father 100%, ratio > 0.85 → Mother 100% (zero interpolation noise) |
| Post-merge validation | Benchmark score only | Layer-by-layer health check of the child against both parents, detecting interference and function loss |
| Search method | Manual tuning | CMA-ES evolution with an adaptive 14-dimensional genome |
| Reproducibility | Config file | genome_hash seed guarantees identical output for an identical genome |
| GPU efficiency | Single merge per run | Phase 1 proxy (200 steps, seconds) → Phase 2 real merge (only the top-k evaluated) |

---

## How Darwin V6 Works

Darwin V6 does not use mergekit or any external merge library. It re-implements DARE-TIES (Yadav et al., 2023) directly in PyTorch tensor operations, with a per-tensor diagnostic ratio.
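
A minimal sketch of that per-tensor step, with the transplant rule from the table above folded in. This is an illustration only: it assumes the Father's weights serve as the base for the task vector (so TIES sign election is trivial with a single delta), and the function and argument names are ours, not the Darwin V6 source.

```python
import torch

def merge_tensor(father: torch.Tensor, mother: torch.Tensor,
                 ratio: float, density: float, seed: int = 0) -> torch.Tensor:
    """One per-tensor DARE-TIES merge step (illustrative CPU sketch)."""
    # Transplant rule: extreme ratios copy one parent wholesale,
    # avoiding interpolation noise entirely.
    if ratio < 0.15:
        return father.clone()
    if ratio > 0.85:
        return mother.clone()

    # DARE: randomly drop (1 - density) of the Mother-minus-Father delta,
    # then rescale the survivors so the expected update is unchanged.
    delta = mother - father
    gen = torch.Generator().manual_seed(seed)  # seeded for reproducibility
    mask = torch.rand(delta.shape, generator=gen) < density
    delta = delta * mask / density

    # Interpolate with the per-tensor ratio (0 = pure Father, 1 = pure Mother).
    return father + ratio * delta
```

Darwin applies this independently to every tensor, with `ratio` coming from the diagnostic-plus-genome formula described below.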

Before merging, Darwin performs a Model Diagnostic Scan (MDS) on both parents. For every tensor it measures Shannon entropy (information density), standard deviation (activation spread), and L2 norm (energy). In addition, 5 diagnostic probes (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) are passed through the model, and the cosine distance of the outputs when each layer is skipped determines that layer's functional importance.
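
The layer-skip measurement could be sketched as follows. The probe prompts, the equal probe weighting, and the `_Skip` wrapper are all assumptions about how such a scan might be implemented over a Hugging Face decoder stack, not the actual Darwin V6 probes.

```python
import torch
import torch.nn.functional as F

# Hypothetical probe prompts; the real Darwin V6 probe contents are not published.
PROBES = {
    "REASONING": "If all A are B and all B are C, are all A C?",
    "CODE":      "def fibonacci(n):",
    "MATH":      "What is 17 * 23?",
    "KNOWLEDGE": "The capital of Australia is",
    "LANGUAGE":  "Translate to French: good morning",
}

class _Skip(torch.nn.Module):
    """Stand-in that passes hidden states through, mimicking a skipped layer."""
    def forward(self, hidden_states, *args, **kwargs):
        return (hidden_states,)

@torch.no_grad()
def layer_importance(model, tokenizer, layer_idx: int) -> float:
    """Mean cosine distance of final hidden states with vs. without one layer."""
    layers = model.model.layers  # assumption: Gemma/Llama-style decoder stack
    total = 0.0
    for text in PROBES.values():
        ids = tokenizer(text, return_tensors="pt").to(model.device)
        full = model(**ids, output_hidden_states=True).hidden_states[-1]
        original, layers[layer_idx] = layers[layer_idx], _Skip()
        skipped = model(**ids, output_hidden_states=True).hidden_states[-1]
        layers[layer_idx] = original  # restore the real layer
        total += 1.0 - F.cosine_similarity(full.flatten(), skipped.flatten(), dim=0).item()
    return total / len(PROBES)  # equal probe weights assumed here
```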

The final merge ratio for each tensor:

```
static_score = entropy × 0.3 + std × 0.2 + clamp(norm, 100) × 0.002
probe_score  = Σ(cosine_distance[probe_i] × weight_i)
combined     = static × 0.4 + probe × 0.6
mri_ratio    = combined_b / (combined_a + combined_b)
final_ratio  = mri_ratio × mri_trust + genome_ratio × (1 - mri_trust)
```

The mri_trust parameter is itself optimized by the CMA-ES evolutionary algorithm, allowing the system to automatically determine the optimal balance between the diagnostic prescription and evolutionary search for each model pair.
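
Transcribed into Python, the scoring chain might look like the following. How entropy is computed over raw weights, and the helper signatures, are our assumptions rather than published Darwin V6 code.

```python
import torch

def static_score(t: torch.Tensor) -> float:
    """static = entropy × 0.3 + std × 0.2 + clamp(norm, 100) × 0.002"""
    p = t.flatten().abs()
    p = p / p.sum().clamp_min(1e-12)        # treat |weights| as a distribution
    entropy = -(p * p.clamp_min(1e-12).log()).sum().item()
    std = t.std().item()
    norm = min(t.norm().item(), 100.0)      # clamp(norm, 100)
    return entropy * 0.3 + std * 0.2 + norm * 0.002

def final_ratio(static_a: float, probe_a: float,
                static_b: float, probe_b: float,
                genome_ratio: float, mri_trust: float) -> float:
    """Blend the MDS prescription (mri_ratio) with the evolved genome ratio."""
    combined_a = static_a * 0.4 + probe_a * 0.6
    combined_b = static_b * 0.4 + probe_b * 0.6
    mri_ratio = combined_b / (combined_a + combined_b)
    return mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)
```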

---

## Evolution Result

| | |
|---|---|
| Best Score (ARC-Challenge) | 0.8292 |
| Merge Method | DARE-TIES (direct PyTorch) |
| Health Check | Not performed |

Optimal Genome (14-dimensional adaptive):

```
global_ratio:        0.4989  (overall merge ratio → near balanced)
attn_ratio:          0.1766  (attention layers → Father strongly dominant)
ffn_ratio:           0.9021  (FFN layers → Mother strongly dominant)
embed_ratio:         0.6122  (embedding → slight Mother bias)
density_a:           0.9951  (Father DARE density → nearly full)
density_b:           0.9617  (Mother DARE density → high)
block_0_ratio:       0.5740  (early layers → slight Mother bias)
block_1_ratio:       0.5811  (early-mid layers → slight Mother bias)
block_2_ratio:       0.5736  (mid layers → slight Mother bias)
block_3_ratio:       0.4697  (mid-late layers → near balanced, slight Father)
block_4_ratio:       0.4930  (late layers → near balanced)
block_5_ratio:       0.8418  (final layers, reasoning core → Mother dominant)
mri_trust:           0.4907  (MDS 49% + Genome 51% → near equal trust)
merge_method_weight: 0.3623
```

Key observations from the genome: ffn_ratio = 0.90 indicates the FFN layers strongly favor the Mother (the Claude Opus distill) and carry the bulk of the reasoning enhancement. block_5_ratio = 0.84 shows the final reasoning-core layers also strongly favor the Mother, consistent with the pattern seen across all Darwin Opus models, where Claude's reasoning capability concentrates in the final layers. Meanwhile, attn_ratio = 0.18 firmly preserves the Father's attention structure, maintaining the original Gemma 4 multimodal and context capabilities. Notably, mri_trust = 0.49 shows the system found near-equal value in diagnostic analysis and evolutionary search, suggesting a well-balanced optimization.
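
The genome above is the output of the CMA-ES search. A minimal sketch of that loop, assuming the off-the-shelf `cma` package and a hypothetical `merge_and_score(genome)` fitness function that performs the merge and returns the benchmark score:

```python
import cma  # pip install cma

GENOME_KEYS = [
    "global_ratio", "attn_ratio", "ffn_ratio", "embed_ratio",
    "density_a", "density_b",
    "block_0_ratio", "block_1_ratio", "block_2_ratio",
    "block_3_ratio", "block_4_ratio", "block_5_ratio",
    "mri_trust", "merge_method_weight",
]

# Start from a balanced genome; constrain every gene to [0, 1].
es = cma.CMAEvolutionStrategy(14 * [0.5], 0.2, {"bounds": [0.0, 1.0]})
while not es.stop():
    candidates = es.ask()  # sample candidate 14-dimensional genomes
    # merge_and_score is assumed: it merges with this genome and benchmarks
    # the child. CMA-ES minimizes, so negate the score we want to maximize.
    fitnesses = [-merge_and_score(dict(zip(GENOME_KEYS, g))) for g in candidates]
    es.tell(candidates, fitnesses)

best_genome = dict(zip(GENOME_KEYS, es.result.xbest))
```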

---

## Model Specifications

| | |
|---|---|
| Architecture | Gemma 4 Expert 4B (Mixture of Experts) |
| Parameters | 4B |
| Precision | BF16 |
| Context | 128K |
| Languages | 140+ |
| Thinking | enable_thinking=True chain-of-thought |
| License | Apache 2.0 |

---

## Usage

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-4B-Opus", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-4B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Build a chat prompt with thinking mode enabled.
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

---

## VRAM Requirements

| Setup | VRAM | Status |
|---|---|---|
| BF16 full precision | ~8 GB | |
| NVIDIA RTX 4090 24GB | 24 GB | Single GPU, very comfortable |
| NVIDIA RTX 3090 24GB | 24 GB | Single GPU, comfortable |
| NVIDIA RTX 4080 16GB | 16 GB | Single GPU |
| NVIDIA T4 16GB | 16 GB | Cloud/Colab friendly |

Darwin-4B-Opus is the most accessible model in the Darwin Opus family, running comfortably on a single consumer GPU.

---

## Darwin Opus Family

| Model | Architecture | Parameters | Context | Base |
|---|---|---|---|---|
| **Darwin-4B-Opus** | MoE (E4B) | 4B | 128K | gemma-4-E4B-it |
| Darwin-9B-Opus | – | 9B | – | gemma-4-9B-it |
| Darwin-31B-Opus | Dense | 31B | 256K | gemma-4-31B-it |
| Darwin-35B-A3B-Opus | MoE | 35B (3B active) | 256K | gemma-4-35B-A3B-it |

---

## References

- DARE-TIES: Yadav et al., 2023 (https://arxiv.org/abs/2311.03099); re-implemented directly, not library-dependent
- Darwin V6 Engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
- FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard

---

## Built By

| | |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V6 (Diagnostic-Guided Evolutionary Merge) |
| Architecture | Gemma-4-E4B (MoE) |
| License | Apache 2.0 |

---

## Citation

```bibtex
@misc{vidraft_darwin_4b_opus,
  title        = {Darwin-4B-Opus: Diagnostic-Guided Evolutionary Merge on Gemma 4 E4B},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-Opus}}
}
```