---
license: apache-2.0
base_model:
- FINAL-Bench/Darwin-4B-Opus
- DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking
tags:
- darwin-v6
- generation-2
- evolutionary-merge
- mri-guided
- dare-ties
- gemma4
- reasoning
- thinking
- proto-agi
- vidraft
language:
- en
- ko
- ja
- zh
- multilingual
pipeline_tag: text-generation
library_name: transformers
---

# Darwin-4B-David: The First Second-Generation Darwin Model

<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Opus"><img src="https://img.shields.io/badge/🧬_Gen1-Darwin--4B--Opus-blue?style=for-the-badge" alt="Gen1"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-David"><img src="https://img.shields.io/badge/🧬_Gen2-Darwin--4B--David-blue?style=for-the-badge" alt="Gen2"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis"><img src="https://img.shields.io/badge/⭐_Gen3-Darwin--4B--Genesis-gold?style=for-the-badge" alt="Gen3"></a>
</p>

<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🚀_Space-9B_Demo-purple?style=for-the-badge" alt="9B Space"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="31B"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🚀_Space-31B_Demo-purple?style=for-the-badge" alt="31B Space"></a>
</p>

<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/🚀_Space-35B_Demo-purple?style=for-the-badge" alt="35B Space"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus-Q8-GGUF"><img src="https://img.shields.io/badge/📦_GGUF-Q8--Official-yellow?style=for-the-badge" alt="Q8 GGUF"></a>
<a href="https://huggingface.co/bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF"><img src="https://img.shields.io/badge/📦_GGUF-bartowski-yellow?style=for-the-badge" alt="bartowski GGUF"></a>
</p>

<p align="center">
<a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/🏆_ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
</p>

> Gemma 4 E4B Dense | 4.5B Params | Thinking Mode | 128K Context | 140+ Languages | BF16 | Apache 2.0
> **The first-ever second-generation Darwin model: "Evolution of Evolution"**

---

## Overview

Darwin-4B-David is the first second-generation (Generation 2) model in Darwin history: **a model evolved from an already-evolved model.**

The first-generation Darwin-4B-Opus (Father) was evolved from the original gemma-4-E4B-it using the Darwin V6 engine. Darwin-4B-David was created by crossbreeding this first-generation evolved model with DavidAU's DECKARD-Expresso-Universe (Mother). This is the first realization of Darwin's core concept, **"Merge = Evolve,"** applied recursively.

The name **"David"** pays tribute to the Mother model's creator, DavidAU, while evoking the biblical David who defeated Goliath, symbolizing how a **small 4.5B model challenges models many times its size.**

---

## Family Tree

<p align="center">
<img src="family.png" alt="Darwin-4B-David" width="100%">
</p>

### Generation Comparison

| | Gen 0 (Original) | Gen 1 (Opus) | Gen 2 (David) |
|---|---|---|---|
| Model | gemma-4-E4B-it | Darwin-4B-Opus | **Darwin-4B-David** |
| Parents | Google training | Original + Claude distill | **Evolved model + DECKARD** |
| GPQA Diamond | 58.6% | – | **85.0% (+26.4%p)** |
| Recursive evolution | None | 1× | **2× (evolution of evolution)** |
| Core genes | General-purpose | Claude reasoning | **Reasoning + Creativity + Thinking** |

---

## Parent Models

| Role | Model | Characteristics |
|---|---|---|
| Father (Gen-1 Evolved) | [FINAL-Bench/Darwin-4B-Opus](https://huggingface.co/FINAL-Bench/Darwin-4B-Opus) | Darwin V6 Gen-1, ARC-C 82.92%, Claude Opus reasoning distillation |
| Mother | [DavidAU/DECKARD-Expresso-Universe](https://huggingface.co/DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking) | BF16, Unsloth deep tuning (5 in-house datasets), Universe logic/insight enhancement, Thinking mode on by default |

### Model Diagnostic Scan (MDS)

<p align="center">
<img src="s1.png" alt="Father (Darwin-4B-Opus) MDS Scan" width="48%">
<img src="s2.png" alt="Mother (DECKARD-Expresso-Universe) MDS Scan" width="48%">
</p>

**Left: Father (Darwin-4B-Opus).** REASONING concentration in the later layers (dist 0.4), MATH activation throughout; already optimized through Gen-1 evolution.
**Right: Mother (DECKARD-Expresso-Universe).** Strong KOREAN hotspot (dist 1.5), the signature of Unsloth deep tuning; the remaining regions show a uniform distribution.

---

## Benchmarks

### Key Results

| Benchmark | gemma-4-E4B-it (Original) | Darwin-4B-David (Gen-2) | Improvement | Conditions |
|---|---|---|---|---|
| **GPQA Diamond** | 58.6% | **85.0%** | **+26.4%p** | Generative, maj@8, 50-question sample |
| ARC-Challenge | 64.93% | 64.93% | ±0 | 25-shot, chat template, BF16, loglikelihood |
| KMMLU | 48.47% | 48.46% | ±0 | 5-shot, 225Q, loglikelihood |

### GPQA Diamond Evaluation Details

GPQA Diamond (graduate-level scientific reasoning) was evaluated using **generative (thinking-mode) evaluation**.

| Setting | Value |
|---|---|
| Dataset | Idavidrein/gpqa, gpqa_diamond split |
| Questions | **50** (sampled from 198 total) |
| Evaluation method | **maj@8** (8 independent generations per question; majority vote determines the final answer) |
| Prompt format | Epoch AI standard (`ANSWER: LETTER`) |
| Thinking mode | Enabled (chat_template, enable_thinking) |
| max_new_tokens | 4,096 |
| temperature | 1.0 |
| top_p / top_k | 0.95 / 64 |
| Precision | BF16 |
| Choice shuffling | Fixed per-question seed (MD5 hash; see the sketch below) |
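
The deterministic shuffle in the last row takes only a few lines to reproduce. A minimal sketch, assuming each question carries a stable id string (the helper and field names are illustrative, not the evaluation harness's actual API):

```python
import hashlib
import random

def shuffle_choices(question_id: str, choices: list[str]) -> list[tuple[str, str]]:
    """Deterministically shuffle answer choices with an MD5-derived seed,
    so all 8 maj@8 samples of a question see the same ordering."""
    seed = int(hashlib.md5(question_id.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    shuffled = list(choices)
    rng.shuffle(shuffled)
    return list(zip("ABCD", shuffled))  # letters are assigned after shuffling
```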

**Why maj@8** (a minimal vote-scoring sketch follows this list):
- A single sampled generation (pass@1 with do_sample) is vulnerable to stochastic variation
- 8 independent generations with majority voting reflect the model's **stable reasoning capability**
- maj@k is standard practice in frontier model benchmarks (AIME, MATH, etc.)
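
A minimal sketch of the vote, assuming completions end with an `ANSWER: LETTER` line as in the Epoch AI prompt format (the parsing regex and the `generate()` helper are illustrative):

```python
import re
from collections import Counter

ANSWER_RE = re.compile(r"ANSWER:\s*([A-D])", re.IGNORECASE)

def parse_letter(completion: str) -> str | None:
    """Pull the final `ANSWER: LETTER` occurrence out of one generation."""
    found = ANSWER_RE.findall(completion)
    return found[-1].upper() if found else None

def majority_vote(completions: list[str]) -> str | None:
    """maj@k: the most common parsed letter across k sampled generations wins."""
    letters = [letter for c in completions if (letter := parse_letter(c)) is not None]
    return Counter(letters).most_common(1)[0][0] if letters else None

# 8 sampled generations per question (temperature 1.0, top_p 0.95, top_k 64):
# completions = [generate(prompt) for _ in range(8)]   # hypothetical generate()
# final_answer = majority_vote(completions)
```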

**Note on 50-question sampling:**
- GPQA Diamond contains 198 questions in total; the 50-question sample covers 25.3% of the full set
- 50 questions × 8 samples = 400 generations in total, which reduces per-question sampling noise
- A full 198-question evaluation is planned

### Note on lm-eval Loglikelihood Results

ARC-Challenge and KMMLU scores are essentially identical to the original model's. This is expected when a DARE-TIES merge is evaluated with loglikelihood scoring: that method only compares token probabilities across the answer choices and does not capture differences in **generation quality, reasoning chains, or creativity**. The evolution effect shows up clearly in generative evaluation (GPQA Diamond), where the difference emerges during step-by-step thinking-mode reasoning.
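
For contrast, here is a minimal sketch of how a loglikelihood harness scores a multiple-choice item: each choice's tokens are scored under the model and the highest total log-probability wins, with no text generated at all. It assumes the question tokenization is a prefix of the joint tokenization, and `model`/`tokenizer` are loaded as in the Usage section below:

```python
import torch

@torch.no_grad()
def loglikelihood_pick(model, tokenizer, question: str, choices: list[str]) -> int:
    """Return the index of the choice with the highest summed log-probability.
    No generation happens, so thinking-mode reasoning never runs."""
    scores = []
    for choice in choices:
        ctx = tokenizer(question, return_tensors="pt").input_ids.to(model.device)
        full = tokenizer(question + " " + choice, return_tensors="pt").input_ids.to(model.device)
        logprobs = model(full).logits.log_softmax(-1)
        targets = full[0, ctx.shape[1]:]                               # the choice tokens
        positions = torch.arange(ctx.shape[1] - 1, full.shape[1] - 1)  # positions predicting them
        scores.append(logprobs[0, positions].gather(-1, targets.unsqueeze(-1)).sum().item())
    return max(range(len(choices)), key=scores.__getitem__)
```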

---

## MRI-Guided Evolution Recipe

Darwin V6's Model MRI scanned the weight divergence between the two parents across all 42 layers and automatically assigned an independent merge ratio to each layer.
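
The card does not publish the MRI internals, but one plausible ingredient is a per-layer divergence scan over the two parents' state dicts. A minimal sketch under that assumption (grouping by the standard `layers.<n>.` parameter naming; all helper names are illustrative):

```python
import re
from collections import defaultdict
import torch

def layer_divergence(father_sd: dict, mother_sd: dict) -> dict[int, float]:
    """Mean relative L2 distance between parent tensors, grouped per layer.
    Layers where the parents diverge most are candidates for custom ratios."""
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for name, f in father_sd.items():
        m = mother_sd.get(name)
        hit = re.search(r"layers\.(\d+)\.", name)
        if m is None or hit is None:
            continue
        idx = int(hit.group(1))
        f32, m32 = f.float(), m.float()
        sums[idx] += ((f32 - m32).norm() / (f32.norm() + 1e-8)).item()
        counts[idx] += 1
    return {i: sums[i] / counts[i] for i in sorted(sums)}
```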

| Layer Range | Weight (Mother share) | Strategy |
|---|---|---|
| Layer 0-3 | 0.81 | Absorb Mother's embedding-adjacent layers |
| Layer 15-16 | 0.91 | Maximum reinforcement of Mother's creativity/character layers |
| Layer 22-25 | **0.95** | **Maximum absorption of Mother's KOREAN hotspot** |
| Layer 26-27 | 0.40 | Father-priority preservation zone |
| Layer 30-40 | 0.48 | Preserve Father's REASONING/MATH layers |
| Layer 40-42 | 0.62 | Output-layer balance |
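
A minimal sketch of how such a table can drive a per-layer DARE-TIES merge, assuming task vectors taken against the shared gemma-4-E4B-it base; the ranges and the 0.5 default below mirror the recipe above, while every helper name is illustrative rather than the engine's actual code:

```python
import re
import torch

# Mother-side ratio per layer range, transcribed from the table above.
MOTHER_W = [(range(0, 4), 0.81), (range(15, 17), 0.91), (range(22, 26), 0.95),
            (range(26, 28), 0.40), (range(30, 40), 0.48), (range(40, 42), 0.62)]

def mother_weight(param_name: str, default: float = 0.5) -> float:
    hit = re.search(r"layers\.(\d+)\.", param_name)
    if hit:
        idx = int(hit.group(1))
        for rng, w in MOTHER_W:
            if idx in rng:
                return w
    return default

def dare_ties(base, father, mother, w_mother, density=0.8, gen=None):
    """DARE: randomly drop task-vector entries and rescale by 1/density.
    TIES: elect a majority sign per entry and zero out disagreeing deltas."""
    deltas = []
    for parent, w in ((father, 1.0 - w_mother), (mother, w_mother)):
        delta = (parent - base).float()
        keep = torch.rand(delta.shape, generator=gen) < density
        deltas.append(w * delta * keep / density)
    stacked = torch.stack(deltas)
    sign = stacked.sum(dim=0).sign()                   # elected sign per entry
    merged = (stacked * (stacked.sign() == sign)).sum(dim=0)
    return (base.float() + merged).to(base.dtype)
```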

### Parent Comparison

<p align="center">
<img src="parent_comparison.png" alt="Father vs Mother layer-wise importance comparison" width="100%">
</p>

### Evolution Parameters

| Setting | Value |
|---|---|
| Merge method | DARE-TIES (direct PyTorch, no mergekit dependency) |
| Density | 0.800–0.850 |
| Normalization | normalize: true |
| Evolution method | Darwin V6 engine (MRI-guided) |
| Population size | 20 |
| Phase 1 (proxy search) | 200 steps |
| Phase 2 (real merge) | 10 steps, top-5 elite |
| Fitness function | kmmlu_lite (Korean knowledge) |
| Best fitness | **0.8412 (84.12%)** |
| Total time | 45.3 minutes (1× H100) |
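
These settings map naturally onto a CMA-ES loop. A minimal sketch using the `cma` package, where `proxy_fitness` is a dummy stand-in for the fast Phase-1 evaluator (the real engine presumably merges probe tensors and scores a kmmlu_lite slice):

```python
import cma
import numpy as np

def proxy_fitness(genome: np.ndarray) -> float:
    """Dummy stand-in for the Phase-1 proxy evaluator."""
    return float(-np.sum((genome - 0.7) ** 2))

# Genome: one Mother-side ratio per layer, searched inside [0, 1].
es = cma.CMAEvolutionStrategy(42 * [0.5], 0.2,
                              {"popsize": 20, "bounds": [0.0, 1.0]})
for _ in range(200):                                  # Phase 1: proxy search
    genomes = es.ask()
    losses = [-proxy_fitness(g) for g in genomes]     # CMA-ES minimizes
    es.tell(genomes, losses)

# Phase 2: only the top-5 elite genomes get a real merge + benchmark run.
elite = [g for _, g in sorted(zip(losses, genomes), key=lambda t: t[0])][:5]
```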

---

## Darwin V6 vs Conventional Merging

| Capability | mergekit (DARE-TIES) | Darwin V6 |
|---|---|---|
| Implementation | Library call (mergekit CLI) | Direct PyTorch tensor operations, no external dependency |
| Ratio selection | Uniform ratio across all tensors | Independent per-tensor ratios from the MDS diagnostic |
| Pre-merge analysis | None | Static tensor profiling (entropy, std, norm) + probe-based functional importance (5 probes) |
| Transplant | Not supported | ratio < 0.15 → Father 100%, ratio > 0.85 → Mother 100% (zero interpolation noise) |
| Post-merge validation | Benchmark score only | Layer-by-layer health check of the child against both parents, detecting interference and function loss |
| Search method | Manual tuning | CMA-ES evolution with an adaptive genome |
| Reproducibility | Config file | genome_hash seed guarantees identical output for an identical genome |
| GPU efficiency | Single merge per run | Phase 1 proxy (200 steps, seconds each) → Phase 2 real merge (only the top-k evaluated) |
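
The transplant row deserves a concrete reading: at extreme ratios the losing parent contributes nothing, so the tensor is copied verbatim instead of interpolated. A minimal sketch with the thresholds from the table (`merge_fn` stands for whatever interpolating merge is in use; the function name is illustrative):

```python
def merge_or_transplant(base, father, mother, ratio, merge_fn):
    """ratio is the Mother share: below 0.15 transplant Father's tensor,
    above 0.85 transplant Mother's, otherwise interpolate as usual."""
    if ratio < 0.15:
        return father.clone()    # pure Father: zero interpolation noise
    if ratio > 0.85:
        return mother.clone()    # pure Mother
    return merge_fn(base, father, mother, ratio)
```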

---

## Significance of Second-Generation Evolution

1. **Proof of "Evolution of Evolution"**: The first systematic case of recursive evolution (two or more generations) in the open-source model-merging community. Darwin V6 + MRI automate the entire process.

2. **85% GPQA Diamond at 4.5B parameters**: +26.4%p over the original's 58.6%. This **surpasses the 31B-class gemma-4-31B (84.3%) with only 4.5B parameters**, an exceptional result in parameter efficiency.

3. **Apache 2.0 + edge deployment**: Preserves the Gemma 4 E4B architecture, enabling deployment on the Jetson Orin NX 16GB and on consumer GPUs with no commercial restrictions.

4. **Multimodal preservation**: Father's vision encoder (~150M) and audio encoder (~300M) are frozen during evolution, so image/video/audio input capabilities are maintained.

5. **Community synergy**: The Mother model's creator, DavidAU, is an active contributor on Hugging Face. Darwin-4B-David symbolizes collaborative evolution within the open-source ecosystem.

---

## Model Specifications

| | |
|---|---|
| Architecture | Gemma 4 E4B Dense |
| Effective Parameters | 4.5B (8B total with embeddings) |
| Layers | 42 |
| Sliding Window | 512 tokens |
| Precision | BF16 |
| Context | 128K |
| Vocabulary | 262K |
| Languages | 140+ |
| Thinking | Chain-of-thought via `enable_thinking=True` |
| Vision Encoder | ~150M (image, video) |
| Audio Encoder | ~300M (speech recognition) |
| License | Apache 2.0 |

---

## Usage

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-4B-David", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-4B-David",
    torch_dtype=torch.bfloat16,   # native precision of the checkpoint
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
# enable_thinking=True turns on the chain-of-thought (thinking) mode.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### Disable Thinking Mode

```python
# Same call as above; enable_thinking=False yields direct answers without a reasoning trace.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```

---

## VRAM Requirements

| Setup | VRAM | Status |
|---|---|---|
| BF16 full precision | ~16 GB | Baseline requirement |
| NVIDIA RTX 4090 24GB | 24 GB | Single GPU, ample headroom |
| NVIDIA RTX 3090 24GB | 24 GB | Single GPU, comfortable |
| NVIDIA RTX 4080 16GB | 16 GB | Single GPU |
| NVIDIA T4 16GB | 16 GB | Cloud/Colab friendly |
| Jetson Orin NX 16GB | 16 GB | Edge deployment ready |

---

## Darwin Opus Family

| Model | Gen | Architecture | Parameters | Context | Base | GPQA Diamond |
|---|---|---|---|---|---|---|
| **Darwin-4B-David** | **Gen 2** | **Dense (E4B)** | **4.5B** | **128K** | **Darwin-4B-Opus × DECKARD** | **85.0%** |
| Darwin-4B-Opus | Gen 1 | Dense (E4B) | 4.5B | 128K | gemma-4-E4B-it | – |
| Darwin-9B-Opus | Gen 1 | Dense | 9B | 131K | Qwen3.5-9B | – |
| Darwin-31B-Opus | Gen 1 | Dense | 31B | 256K | gemma-4-31B-it | – |
| Darwin-35B-A3B-Opus | Gen 1 | MoE | 35B (3B active) | 256K | Qwen3.5-35B-A3B | 90.0% |

---

## Roadmap

- Full 198-question GPQA Diamond evaluation (maj@8)
- MTI (Minimal Test-Time Intervention) serving, expected additional +9-11% reasoning accuracy
- GRPO + TinyLoRA reinforcement learning
- SSD self-distillation
- Cross-architecture breeding research (Transformer × Mamba FFN transplantation)

---

## References

- DARE: Yu et al., 2023 (https://arxiv.org/abs/2311.03099); TIES: Yadav et al., 2023 (https://arxiv.org/abs/2306.01708). Both re-implemented directly, not library-dependent.
- Darwin V6 Engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
- FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard
- DavidAU DECKARD Series: https://huggingface.co/DavidAU
- MTI: Minimal Test-Time Intervention (arXiv:2510.13940)

---

## Built By

| | |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V6 (Diagnostic-Guided Evolutionary Merge) |
| Generation | **Generation 2**, the first in Darwin history |
| Architecture | Gemma-4-E4B Dense |
| License | Apache 2.0 |

---

## Citation

```bibtex
@misc{vidraft_darwin_4b_david_2026,
  title        = {Darwin-4B-David: First Second-Generation Evolutionary Merge Model},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-David}},
  note         = {Recursive evolution achieves 85\% GPQA Diamond with 4.5B parameters}
}
```