Update README.md

README.md (CHANGED)

@@ -7,56 +7,198 @@ tags:
- darwin-v6
- evolutionary-merge
- mri-guided
- dare-ties
- gemma4
- reasoning
- thinking
- proto-agi
- vidraft
language:
- en
- ko
- ja
- zh
- multilingual
pipeline_tag: text-generation
library_name: transformers
---

# Darwin-31B-Opus

<p align="center">
  <a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="Model"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B Model"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
</p>

> Gemma 4 Dense 31B | Thinking Mode | 256K Context | 140+ Languages | BF16 | Apache 2.0

---

## Overview

Darwin-31B-Opus is a reasoning-enhanced model created by the Darwin V6 engine, using Google's Gemma-4-31B-it as Father and TeichAI's Claude Opus Distill as Mother.

Darwin V6 diagnoses both parent models at the tensor level and computes an independent optimal merge ratio for each tensor. Unlike conventional merging methods that apply a uniform ratio across all tensors, Darwin V6 assigns a unique ratio to each of the 1,188 tensors, determined by the combination of MRI diagnostic results and evolutionary-algorithm optimization.

---

## Parent Models

| Role | Model | Characteristics |
|---|---|---|
| Father | google/gemma-4-31B-it | Gemma 4 Dense 31B, multimodal, 256K context, LMArena 1452 (open model #3) |
| Mother | TeichAI/gemma-4-31B-it-Claude-Opus-Distill | Claude 4.6 Opus high-effort reasoning distillation, coding/science/analysis |

---

## Benchmark

| Benchmark | Darwin-31B-Opus | Father (gemma-4-31B-it) | Condition |
|---|---|---|---|
| ARC-Challenge | 82.89% | - | loglikelihood, zero-shot, 200 questions |

Note: The Gemma 4 architecture (Gemma4ForConditionalGeneration) is a multimodal wrapper with limited compatibility with lm-eval's loglikelihood method, which is why no comparable Father score is reported above. In generative evaluation (greedy, thinking mode), Darwin showed improvement over Father under identical conditions. A full GPQA Diamond 198-question evaluation with majority voting is scheduled.
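
For reproducibility, here is a sketch of the ARC-Challenge run above using the Python API of lm-evaluation-harness. The harness version, batch size, and the exact 200-question subset used in the original run are assumptions, not documented settings:

```python
# Reproduction sketch; assumes lm-evaluation-harness v0.4+ and its
# simple_evaluate API. Settings mirror the table's stated conditions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=FINAL-Bench/Darwin-31B-Opus,dtype=bfloat16,trust_remote_code=True",
    tasks=["arc_challenge"],
    num_fewshot=0,   # zero-shot, as in the table
    limit=200,       # 200-question subset
)
print(results["results"]["arc_challenge"])
```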

---

## Model Specifications

| Specification | Value |
|---|---|
| Architecture | Gemma 4 Dense (Hybrid Attention: Sliding Window + Global) |
| Total Parameters | 31B |
| Precision | BF16 |
| Context Length | 256,072 tokens |
| Languages | 140+ |
| Thinking | Chain-of-thought reasoning via enable_thinking=True |
| License | Apache 2.0 |

---

## How Darwin V6 Merges

Darwin V6 does not use any external merge library such as mergekit. It re-implements the DARE-TIES procedure (DARE: Yu et al., 2023; TIES-Merging: Yadav et al., 2023) directly via PyTorch tensor operations, with per-tensor diagnostic ratios as the key differentiator.
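
For intuition, here is a minimal sketch of one per-tensor DARE-TIES step in plain PyTorch. It is illustrative only: the function names are hypothetical, both parents are assumed to be treated as task vectors over a shared base checkpoint, and Darwin V6's actual routine (including its TIES trimming details) is not published in this card.

```python
import torch

def dare_drop(delta: torch.Tensor, density: float) -> torch.Tensor:
    """DARE: randomly drop (1 - density) of delta entries, rescale survivors by 1/density."""
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

def dare_ties_merge(base, father, mother, ratio, density_a=0.8997, density_b=0.9539):
    """One per-tensor DARE-TIES step: drop/rescale deltas, elect signs, recombine."""
    delta_a = dare_drop(father - base, density_a)   # Father task vector
    delta_b = dare_drop(mother - base, density_b)   # Mother task vector

    # TIES-style sign election: keep only components agreeing with the dominant sign
    elected = torch.sign((1 - ratio) * delta_a + ratio * delta_b)
    delta_a = torch.where(torch.sign(delta_a) == elected, delta_a, torch.zeros_like(delta_a))
    delta_b = torch.where(torch.sign(delta_b) == elected, delta_b, torch.zeros_like(delta_b))

    return base + (1 - ratio) * delta_a + ratio * delta_b
```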

Before merging, Darwin performs an MRI diagnostic on both parent models. For every tensor, it measures Shannon entropy (information density), standard deviation (activation spread), and L2 norm (energy). In addition, 5 probing prompts (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) are passed through the model, and each layer's functional importance is measured as the cosine distance of the output when that layer is skipped.
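
The static half of that diagnostic is straightforward to picture. A minimal sketch follows; the histogram bin count and the use of base-2 entropy are assumptions (the probe half would require running the full model with layers skipped):

```python
import torch

def tensor_stats(t: torch.Tensor, bins: int = 256) -> dict:
    """Static MRI statistics for a single tensor: Shannon entropy, std, L2 norm."""
    flat = t.detach().float().flatten()
    hist = torch.histc(flat, bins=bins)           # value-distribution histogram
    p = hist / hist.sum()
    p = p[p > 0]                                  # drop empty bins before log
    return {
        "entropy": -(p * p.log2()).sum().item(),  # information density (bits)
        "std": flat.std().item(),                 # activation spread
        "norm": flat.norm(p=2).item(),            # energy
    }
```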

The final merge ratio for each tensor is then determined as follows, with the scores computed once per parent (the `_a` and `_b` suffixes denote Father and Mother):

```
static_score = entropy * 0.3 + std * 0.2 + clamp(norm, 100) * 0.002
probe_score  = sum(cosine_distance[probe_i] * weight_i)
combined     = static_score * 0.4 + probe_score * 0.6
mri_ratio    = combined_b / (combined_a + combined_b)
final_ratio  = mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)
```
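
Translated literally into Python, with `clamp(norm, 100)` read as capping the L2 norm at 100 (an assumption) and probe weights left as inputs:

```python
def combined_score(entropy, std, norm, probe_distances, probe_weights):
    """Per-parent tensor score: 40% static statistics, 60% probe importance."""
    static_score = entropy * 0.3 + std * 0.2 + min(norm, 100.0) * 0.002
    probe_score = sum(d * w for d, w in zip(probe_distances, probe_weights))
    return static_score * 0.4 + probe_score * 0.6

def final_ratio(combined_a, combined_b, genome_ratio, mri_trust=0.3631):
    """Blend the MRI-derived ratio with the evolved genome ratio."""
    mri_ratio = combined_b / (combined_a + combined_b)
    return mri_ratio * mri_trust + genome_ratio * (1 - mri_trust)
```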

mri_trust itself is optimized by the CMA-ES evolutionary algorithm. When the ratio is extreme (< 0.15 or > 0.85), the tensor is transplanted entirely from one parent without interpolation, preventing noise injection.
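
A sketch of that decision rule under the stated thresholds (the function name is hypothetical; the non-extreme branch shows plain interpolation for brevity, where Darwin applies the DARE-TIES step above):

```python
import torch

def merge_or_transplant(t_father: torch.Tensor, t_mother: torch.Tensor,
                        final_ratio: float) -> torch.Tensor:
    """Interpolate only when the ratio is non-extreme; otherwise copy one parent."""
    if final_ratio < 0.15:
        return t_father.clone()   # Father transplanted wholesale
    if final_ratio > 0.85:
        return t_mother.clone()   # Mother transplanted wholesale
    # Non-extreme: blend (plain interpolation shown for brevity;
    # Darwin applies the DARE-TIES step sketched earlier)
    return (1.0 - final_ratio) * t_father + final_ratio * t_mother
```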

After merging, a Health Check compares the child model against both parents layer by layer, automatically detecting interference or function loss.
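
The card does not spell out the Health Check criteria. One plausible layer-wise realization, given the cosine-distance probes described earlier, might look like this (the threshold, prompts, and flagging rule are placeholders, not Darwin V6's actual checks):

```python
import torch

@torch.no_grad()
def health_check(child, father, mother, tokenizer, prompts, tol=0.35):
    """Flag layers whose hidden states drift far from BOTH parents."""
    def hidden_states(model, text):
        ids = tokenizer(text, return_tensors="pt").to(model.device)
        return model(**ids, output_hidden_states=True).hidden_states

    issues = []
    for text in prompts:
        for i, (c, f, m) in enumerate(zip(hidden_states(child, text),
                                          hidden_states(father, text),
                                          hidden_states(mother, text))):
            d_f = 1 - torch.cosine_similarity(c.flatten(), f.flatten(), dim=0)
            d_m = 1 - torch.cosine_similarity(c.flatten(), m.flatten(), dim=0)
            if min(d_f.item(), d_m.item()) > tol:   # far from both parents
                issues.append((text, i))
    return issues
```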

---

## Evolution Result

| Item | Value |
|---|---|
| ARC-Challenge Best Score | 0.8289 |
| Merge Method | DARE-TIES (direct PyTorch implementation) |
| Tensors Merged | 1,188 |
| Health Check | healthy |
| Phase 2 Steps | 4 (early stop, patience=5) |
| Total Time | 134 min |
| Infrastructure | 4 x NVIDIA H100 NVL (100GB) |

Optimal genome (14-dimensional adaptive):

```
global_ratio:        0.5147  (overall merge ratio)
attn_ratio:          0.3169  (Attention layers)
ffn_ratio:           0.9316  (FFN layers, Mother dominant)
embed_ratio:         0.7748  (Embedding)
density_a:           0.8997  (Father DARE density)
density_b:           0.9539  (Mother DARE density)
block_0_ratio:       0.6628  (L0-L9)
block_1_ratio:       0.6431  (L10-L19)
block_2_ratio:       0.5146  (L20-L29)
block_3_ratio:       0.5971  (L30-L39)
block_4_ratio:       0.6339  (L40-L49)
block_5_ratio:       0.8583  (L50-L59, reasoning core, Mother dominant)
mri_trust:           0.3631  (MRI 36% + Genome 64%)
merge_method_weight: 0.6897
```

Notable: ffn_ratio = 0.93 indicates that FFN layers strongly favor the Mother (Claude Opus Distill), and block_5 (L50-L59) at 0.86 also favors the Mother. This is consistent with the MRI heatmap pattern showing that the Mother's reasoning capabilities are concentrated in the later layers.
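
For concreteness, here is a sketch of how such a genome could expand into per-tensor ratios. The block boundaries follow the L0-L59 ranges listed above, but the combination rule (a plain average of the block gene and the tensor-kind gene) is an assumption, not Darwin V6's published behavior:

```python
def genome_to_tensor_ratio(genome: dict, layer_idx: int, kind: str) -> float:
    """Expand block-level and kind-level genes into one per-tensor ratio.

    Illustrative only: the real combination rule is internal to Darwin V6;
    here the two genes are simply averaged.
    """
    block_ratio = genome[f"block_{min(layer_idx // 10, 5)}_ratio"]  # L0-L59, blocks of 10
    kind_ratio = {"attn": genome["attn_ratio"],
                  "ffn": genome["ffn_ratio"],
                  "embed": genome["embed_ratio"]}.get(kind, genome["global_ratio"])
    return 0.5 * (block_ratio + kind_ratio)

genome = {"global_ratio": 0.5147, "attn_ratio": 0.3169, "ffn_ratio": 0.9316,
          "embed_ratio": 0.7748, "block_0_ratio": 0.6628, "block_1_ratio": 0.6431,
          "block_2_ratio": 0.5146, "block_3_ratio": 0.5971, "block_4_ratio": 0.6339,
          "block_5_ratio": 0.8583}

# An FFN tensor in layer 55 leans strongly toward the Mother: ~0.89
print(genome_to_tensor_ratio(genome, layer_idx=55, kind="ffn"))
```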

---

## Usage

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-31B-Opus", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-31B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
# enable_thinking=True switches on the chain-of-thought reasoning mode
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

---

## VRAM Requirements

| Setup | VRAM | Status |
|---|---|---|
| BF16 Full Precision | ~62 GB | Weights only |
| NVIDIA H100 80GB | 80 GB | Single GPU |
| NVIDIA A100 80GB x 2 | 160 GB | Comfortable |
| NVIDIA RTX 4090 24GB x 4 | 96 GB | Possible (device_map="auto") |

---

## References

- DARE: Yu et al., 2023 (https://arxiv.org/abs/2311.03099); TIES-Merging: Yadav et al., 2023 (https://arxiv.org/abs/2306.01708). Both re-implemented directly in Darwin V6, not library-dependent.
- Darwin V6 engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
- FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard

---

## Built By

| Item | Value |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V6 (Diagnostic-Guided Evolutionary Model Merge) |
| Base Architecture | Gemma-4-31B |
| License | Apache 2.0 |

---

## Citation

```bibtex
@misc{vidraft_darwin_31b_opus,
  title        = {Darwin-31B-Opus: Diagnostic-Guided Evolutionary Merge on Gemma 4},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-31B-Opus}}
}
```