---
license: apache-2.0
base_model:
  - Qwen/Qwen3.5-9B
  - Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled
tags:
  - merge
  - evolutionary-merge
  - darwin
  - darwin-v5
  - model-mri
  - reasoning
  - advanced-reasoning
  - chain-of-thought
  - thinking
  - qwen3.5
  - qwen
  - claude-opus
  - distillation
  - multilingual
  - benchmark
  - open-source
  - apache-2.0
  - layer-wise-merge
  - coding-agent
  - tool-calling
  - long-context
language:
  - en
  - zh
  - ko
  - ja
  - de
  - fr
  - es
  - ru
  - ar
  - multilingual
pipeline_tag: text-generation
library_name: transformers
model-index:
  - name: Darwin-9B-Opus
    results:
      - task:
          type: text-generation
          name: Graduate-Level Reasoning
        dataset:
          type: Idavidrein/gpqa
          name: GPQA Diamond
          config: gpqa_diamond
          split: train
        metrics:
          - type: accuracy
            value: 90.0
            name: Accuracy
            verified: false
---

# Darwin-9B-Opus

<p align="center">
  <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="Model"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/Space-9B_Live_Demo-purple?style=for-the-badge" alt="Space"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B Model"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/Space-35B_Live_Demo-purple?style=for-the-badge" alt="35B Space"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
</p>

<p align="center">
  <img src="info.png" alt="Darwin-9B-Opus" width="100%">
</p>

> Qwen3.5 Dense 9B | Reasoning | Chain-of-Thought | 131K Context | 201 Languages | BF16 | Apache 2.0

---

## Technical Definitions

| Term | Definition | Measurement |
|---|---|---|
| Model MRI | Layer-level profiling of tensor health indicators | L2 norm, Shannon entropy, std per tensor across all layers |
| LayerMRI.compare_layers | Per-tensor A vs B quality comparison that yields ratio_b (the Mother's share) | score = entropy * 0.5 + std * 0.3 + min(norm, 100) * 0.002 per model; ratio_b = score_b / (score_a + score_b) |
| MRI-Guided Merge | Per-tensor merge ratios derived from parent diagnostics (70% MRI + 30% genome) | final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3 |
| DARE-TIES | Merge algorithm: random binary mask on delta, then weighted addition | merged = A + (B - A) * random_mask(density) * ratio |
| Transplant A / B | When a tensor's MRI ratio falls below 0.05 (use A) or above 0.95 (use B), that tensor is taken entirely from one parent | No interpolation; direct tensor copy |
| Evolutionary Search | CMA-ES population evolution over genome space (ratio, attn, ffn, embed, density_a, density_b) | Phase 1: 200 steps heuristic proxy, Phase 2: 10 steps real benchmark |

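The definitions above can be restated as equations. The symbols are introduced here only for readability: H is the Shannon entropy of a tensor's weight distribution, σ its standard deviation, d_B the Mother's keep density (`density_b`), m the random keep mask, and ⊙ element-wise multiplication. As in the table, the transplant thresholds are applied to the MRI ratio.

```math
\begin{aligned}
s_X &= 0.5\,H(X) + 0.3\,\sigma(X) + 0.002\,\min\bigl(\lVert X \rVert_2,\ 100\bigr), \qquad X \in \{A, B\} \\
r_{\mathrm{MRI}} &= \frac{s_B}{s_A + s_B}, \qquad r = 0.7\,r_{\mathrm{MRI}} + 0.3\,r_{\mathrm{genome}} \\
m_{ij} &\sim \mathrm{Bernoulli}(d_B) \\
\mathrm{merged} &=
\begin{cases}
A, & r_{\mathrm{MRI}} < 0.05 \\
B, & r_{\mathrm{MRI}} > 0.95 \\
A + r\,\bigl(m \odot (B - A)\bigr), & \text{otherwise}
\end{cases}
\end{aligned}
```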
---

## Overview

Darwin-9B-Opus is a 9B-parameter dense reasoning model created with Darwin V5. Both parent models share the same Qwen3.5-9B architecture: the Mother is a LoRA SFT of the same base model, not a different architecture.

| Role | Model | Training |
|---|---|---|
| Father | [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) | Original pre-training + RLHF |
| Mother | [Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled](https://huggingface.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled) | LoRA SFT with text-only Claude 4.6 Opus reasoning chains |

---

## How Darwin V5 Works

Darwin V5 does not use mergekit or any other external merge library. It implements DARE-TIES merging directly with PyTorch tensor operations. The method follows DARE-TIES but is re-implemented from scratch so that each tensor can receive its own MRI-guided merge ratio.

### Merge Implementation (actual code logic)

```python
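import torch

# For each tensor pair (A, B) across all safetensors shards.
# model_a / model_b (state dicts), key, genome_type_ratio and density_b are supplied by
# the surrounding merge loop and the evolved genome; LayerMRI is Darwin V5's diagnostics module.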
ta = model_a[key]       # Father tensor
tb = model_b[key]       # Mother tensor

# 1. MRI diagnoses both tensors
diag_a = LayerMRI.diagnose_tensor(ta)  # {norm, entropy, std}
diag_b = LayerMRI.diagnose_tensor(tb)  # {norm, entropy, std}

# 2. Quality score comparison determines ratio_b
score_a = diag_a["entropy"] * 0.5 + diag_a["std"] * 0.3 + min(diag_a["norm"], 100) * 0.002
score_b = diag_b["entropy"] * 0.5 + diag_b["std"] * 0.3 + min(diag_b["norm"], 100) * 0.002
mri_ratio = score_b / (score_a + score_b)  # Higher = Mother is better

# 3. Final ratio = MRI 70% + evolutionary genome 30%
final_ratio = mri_ratio * 0.7 + genome_type_ratio * 0.3

# 4. DARE-TIES merge with per-tensor ratio
mask = torch.rand_like(tb) < density_b
delta = (tb - ta) * mask
merged = (ta + delta * final_ratio).bfloat16()
```
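
To run the per-tensor logic end to end, here is a self-contained sketch. It is not the Darwin V5 source: `diagnose_tensor` is a hypothetical stand-in for `LayerMRI.diagnose_tensor` that estimates entropy from a 256-bin weight histogram (an assumption), and `merge_tensor` folds in the transplant rule from the definitions table, assuming that rule is checked on the MRI ratio before any blending.

```python
import torch


def diagnose_tensor(t: torch.Tensor, bins: int = 256) -> dict:
    """Hypothetical stand-in for LayerMRI.diagnose_tensor: L2 norm, entropy, std.

    Entropy (in nats) is estimated from a fixed-bin histogram of the weights;
    the binning Darwin V5 actually uses is not documented in this card.
    """
    x = t.detach().float().flatten()
    hist = torch.histc(x, bins=bins, min=x.min().item(), max=x.max().item())
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log() stays finite
    return {
        "norm": torch.linalg.vector_norm(x).item(),
        "entropy": -(p * p.log()).sum().item(),
        "std": x.std().item(),
    }


def quality_score(diag: dict) -> float:
    # Weights from the card: entropy * 0.5 + std * 0.3 + min(norm, 100) * 0.002
    return diag["entropy"] * 0.5 + diag["std"] * 0.3 + min(diag["norm"], 100.0) * 0.002


def merge_tensor(ta, tb, genome_ratio, density_b, generator=None):
    """Merge one Father tensor (ta) with the matching Mother tensor (tb)."""
    score_a = quality_score(diagnose_tensor(ta))
    score_b = quality_score(diagnose_tensor(tb))
    total = score_a + score_b
    mri_ratio = score_b / total if total > 0 else 0.5  # higher = favour the Mother

    # Transplant rule (assumed to be checked on the MRI ratio before blending).
    if mri_ratio < 0.05:
        return ta.detach().clone().to(torch.bfloat16)
    if mri_ratio > 0.95:
        return tb.detach().clone().to(torch.bfloat16)

    final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3
    # Keep each delta element with probability density_b; like the card's formula,
    # no 1/density rescaling is applied.
    mask = torch.rand(tb.shape, generator=generator, device=tb.device) < density_b
    delta = (tb.float() - ta.float()) * mask
    return (ta.float() + delta * final_ratio).to(torch.bfloat16)


# Usage with toy tensors standing in for one safetensors entry:
torch.manual_seed(0)
a = torch.randn(64, 64)
b = a + 0.01 * torch.randn(64, 64)
merged = merge_tensor(a, b, genome_ratio=0.5, density_b=0.5)
print(merged.shape, merged.dtype)  # torch.Size([64, 64]) torch.bfloat16
```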

### Pipeline

```
Phase 0: Model MRI
  For every tensor in both parents, measure:
    - L2 norm (layer energy)
    - Shannon entropy (weight distribution uniformity)
    - Standard deviation (weight spread)
  Compare A vs B quality scores -> per-tensor ratio prescription

Phase 1: Evolutionary Search (200 steps, heuristic proxy)
  Population of 20 genomes (ratio, attn, ffn, embed, density_a, density_b)
  Fitness: heuristic score based on genome balance + differentiation
  Selection -> SLERP crossover -> Gaussian mutation

Phase 2: Real Merge + Benchmark (10 steps)
  Top genomes from Phase 1 undergo actual tensor merge
  Each merge: MRI prescription (70%) + genome ratio (30%)
  Fitness: real benchmark score (ARC-Challenge)
  Best model selected and auto-uploaded

Phase 3: Health Check
  Layer-by-layer importance comparison: child vs both parents
  Detect interference (child >> parents) or function loss (parents >> child)
```
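
Phase 1 can be sketched as a small genetic loop. Only the population size (20), the six genome fields, the 200 steps, and the selection -> SLERP crossover -> Gaussian mutation order come from the description above; it follows that description rather than CMA-ES. The card does not publish the heuristic fitness, so `proxy_fitness` is a hypothetical placeholder for "balance + differentiation", and truncation selection (keep the top half) and σ = 0.05 are assumptions.

```python
import math
import random

GENES = ("ratio", "attn", "ffn", "embed", "density_a", "density_b")


def slerp(a, b, t):
    """Spherical linear interpolation between two genome vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    cos_omega = sum(x * y for x, y in zip(a, b)) / (na * nb)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if omega < 1e-6:  # nearly parallel vectors: fall back to linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(omega)
    wa, wb = math.sin((1 - t) * omega) / s, math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


def mutate(genome, sigma, rng):
    """Gaussian mutation, clipped so every gene stays in [0, 1]."""
    return [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in genome]


def proxy_fitness(genome):
    """Hypothetical proxy: genes centred near 0.5 (balance) with some spread (differentiation)."""
    mean = sum(genome) / len(genome)
    spread = math.sqrt(sum((g - mean) ** 2 for g in genome) / len(genome))
    return (1.0 - abs(mean - 0.5)) + 0.5 * spread


def evolve(steps=200, pop_size=20, sigma=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in GENES] for _ in range(pop_size)]
    for _ in range(steps):
        pop.sort(key=proxy_fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            pa, pb = rng.sample(parents, 2)
            children.append(mutate(slerp(pa, pb, rng.random()), sigma, rng))
        pop = parents + children
    pop.sort(key=proxy_fitness, reverse=True)
    return dict(zip(GENES, pop[0]))


print(evolve())  # best genome under the placeholder fitness
```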

### What Makes This Different from Standard Merging

| Capability | Standard DARE-TIES | Darwin V5 |
|---|---|---|
| Implementation | mergekit library call | Direct PyTorch tensor operations |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from MRI diagnosis |
| Pre-merge analysis | None | Tensor-level norm/entropy/std profiling |
| Ratio determination | Human-set or grid search | MRI 70% + evolutionary genome 30% |
| Post-merge validation | Benchmark score only | Layer-by-layer child vs parents comparison |
| Transplant support | No | ratio < 0.05 -> use A entirely, ratio > 0.95 -> use B entirely |
| Failure diagnosis | "Score went down" | Per-tensor quality delta identifies problematic layers |

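The post-merge validation row (Phase 3 in the pipeline) compares per-layer importance between the child and both parents. The card does not say how importance is measured or what counts as ">>", so this sketch takes the importance scores as given, uses a hypothetical 1.5x `factor` as the threshold, and reads "parents >> child" as both parents exceeding the child.

```python
def health_check(child: dict, parent_a: dict, parent_b: dict, factor: float = 1.5) -> dict:
    """Label each layer 'interference', 'function_loss' or 'ok'.

    child, parent_a, parent_b map layer name -> importance score (metric assumed given).
    factor is a hypothetical threshold standing in for '>>'.
    """
    report = {}
    for layer, c in child.items():
        a, b = parent_a[layer], parent_b[layer]
        if c > factor * max(a, b):
            report[layer] = "interference"   # child >> both parents
        elif min(a, b) > factor * c:
            report[layer] = "function_loss"  # both parents >> child
        else:
            report[layer] = "ok"
    return report


# Example with made-up importance scores:
report = health_check(
    child={"layer0": 1.0, "layer1": 5.0, "layer2": 0.1},
    parent_a={"layer0": 1.0, "layer1": 1.0, "layer2": 1.0},
    parent_b={"layer0": 1.2, "layer1": 2.0, "layer2": 0.9},
)
print(report)  # {'layer0': 'ok', 'layer1': 'interference', 'layer2': 'function_loss'}
```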
---

## Model Specifications

| | |
|---|---|
| Architecture | Qwen3.5 Dense (Gated DeltaNet hybrid) |
| Total Parameters | 9B |
| Precision | BF16 |
| Context Length | 131,072 native |
| Languages | 201 |
| Thinking | `<think>` tag chain-of-thought reasoning |
| License | Apache 2.0 |

---

## Hardware Requirements

| Setup | VRAM | Status |
|---|---|---|
| BF16 Full Precision | ~20 GB | Minimum; long contexts need more |
| NVIDIA RTX 4090 24GB | 24 GB | Comfortable |
| NVIDIA A100 40GB | 40 GB | Very comfortable |
| NVIDIA T4 16GB | 16 GB | Requires quantization |

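The BF16 figure follows from parameter count times bytes per parameter: about 9 x 10^9 x 2 bytes ≈ 18 GB for the weights, with the rest of the ~20 GB covering runtime overhead such as the KV cache, which grows with context length. The same arithmetic shows why a 16 GB T4 needs quantization. A small estimator, assuming a round 9e9 parameters and ignoring quantization metadata such as scales:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumes a round 9e9 parameters; KV cache, activations and framework overhead are extra.
for name, bytes_per_param in [("BF16", 2), ("INT8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(9e9, bytes_per_param):.1f} GB")
# BF16: ~18.0 GB
# INT8: ~9.0 GB
# 4-bit: ~4.5 GB
```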
---

## Usage

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-9B-Opus",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-9B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
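
Because the model reasons inside `<think>` tags, you will usually want to separate the chain of thought from the final answer. A minimal helper, assuming the decoded text keeps a literal `</think>` tag (whether it survives depends on how the tokenizer registers that token and on `skip_special_tokens`) and that the chat template may already have placed the opening `<think>` in the prompt:

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a decoded completion into (reasoning, answer) at the last '</think>'.

    If no closing tag is present, the whole text is treated as the answer.
    """
    if "</think>" not in text:
        return "", text.strip()
    reasoning, answer = text.rsplit("</think>", 1)
    # The opening tag may be absent if the chat template already put it in the prompt.
    reasoning = reasoning.strip().removeprefix("<think>").strip()
    return reasoning, answer.strip()


# With the Transformers example above:
# reasoning, answer = split_thinking(
#     tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
# )
print(split_thinking("<think>\nsqrt(2) = p/q ...\n</think>\n\nTherefore sqrt(2) is irrational."))
# ('sqrt(2) = p/q ...', 'Therefore sqrt(2) is irrational.')
```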

### SGLang

```bash
python -m sglang.launch_server \
  --model-path FINAL-Bench/Darwin-9B-Opus \
  --tp 1 \
  --mem-fraction-static 0.90 \
  --context-length 32768 \
  --trust-remote-code
```

### vLLM

```bash
vllm serve FINAL-Bench/Darwin-9B-Opus \
  --trust-remote-code \
  --enforce-eager
```
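
Both servers expose an OpenAI-compatible HTTP API. A minimal standard-library client sketch, assuming vLLM's default port 8000 (pass `base_url="http://localhost:30000/v1"` for SGLang's default port) and that the served model name is the Hugging Face repo id, which is typically the default when no alias is configured:

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1",
                       model: str = "FINAL-Bench/Darwin-9B-Opus",
                       max_tokens: int = 4096) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs), timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires a running server:
# print(chat("Prove that sqrt(2) is irrational."))
```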

---

## Evolution Details

| | |
|---|---|
| Engine | Darwin V5 (Evolutionary Merge + Layer-Level Diagnostics) |
| Merge Method | DARE-TIES (direct PyTorch implementation, no external library) |
| MRI Integration | Per-tensor diagnosis: norm, entropy, std -> ratio prescription |
| Ratio Formula | final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3 |
| Evolution | Phase 1: 200 steps proxy + Phase 2: 10 steps real benchmark |
| Best Score | 0.8508 (ARC-Challenge) |
| Infrastructure | 4 x NVIDIA H100 NVL (100GB each) |

---

## Acknowledgements

- Korean Government: GPU Support Program research grant
- [Qwen Team](https://huggingface.co/Qwen): Qwen3.5 base architecture
- [Jackrong](https://huggingface.co/Jackrong): Claude 4.6 Opus Reasoning Distilled model
- DARE-TIES algorithm: DARE ([Yu et al., 2023](https://arxiv.org/abs/2311.03099)) and TIES-Merging ([Yadav et al., 2023](https://arxiv.org/abs/2306.01708)), re-implemented here without a merge library

---

## Built By

| | |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V5 |
| Base Architecture | Qwen3.5-9B |

---

## Citation

```bibtex
@misc{vidraft_darwin_9b_opus,
  title        = {Darwin-9B-Opus: Diagnostic-Guided Evolutionary Merge},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}}
}
```