SeaWolf-AI committed
Commit 46b34c4 · verified · 1 Parent(s): 8a3f4e4

Update README.md

Files changed (1): README.md (+246 −32)
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Gemma-4 Multichat
  emoji: 👀
  colorFrom: blue
  colorTo: yellow
@@ -8,38 +8,252 @@ sdk_version: 6.10.0
  app_file: app.py
  pinned: false
  license: apache-2.0
- short_description: Gemma 4 – MoE 26B or Dense 31B, Vision, Thinking
  hf_oauth: true
  hf_oauth_scopes:
  - email
  ---
-
- 💎 Gemma 4 Playground – Dual Model Demo on ZeroGPU
- We just launched a Gemma 4 Playground that lets you chat with Google DeepMind's latest open models – directly on Hugging Face Spaces with ZeroGPU.
- 👉 Try it now: FINAL-Bench/Gemma-4-Multi
- Two Models, One Space
- Switch between both Gemma 4 variants in a single interface:
-
- ⚡ Gemma 4 26B-A4B – MoE with 128 experts, only 3.8B active params. 95% of the 31B's quality at ~8x faster inference. AIME 88.3%, GPQA 82.3%.
- 🏆 Gemma 4 31B – Dense 30.7B. Best quality in the Gemma 4 family. AIME 89.2%, GPQA 84.3%, Codeforces 2150. Arena open-model top 3.
-
- Features
-
- Vision – Upload images for analysis, OCR, chart reading, document parsing
- Thinking Mode – Toggle chain-of-thought reasoning with Gemma 4's native <|channel> thinking tokens
- System Prompts – 6 presets (General, Code, Math, Creative, Translate, Research) or write your own
- Streaming – Real-time token-by-token responses via ZeroGPU
- Apache 2.0 – Fully open, no restrictions
-
- Technical Details
- Built with the dev build of transformers (5.5.0.dev0) for full Gemma 4 support, including multimodal apply_chat_template, variable-resolution image processing, and native thinking mode. Runs on HF ZeroGPU with @spaces.GPU – no dedicated GPU needed.
- Both models support a 256K context window and 140+ languages out of the box.
-
- ### Links
-
- - 🤗 Space: [FINAL-Bench/Gemma-4-Multi](https://huggingface.co/spaces/FINAL-Bench/Gemma-4-Multi)
- - 📄 Gemma 4 26B-A4B: [google/gemma-4-26B-A4B-it](https://huggingface.co/google/gemma-4-26B-A4B-it)
- - 📄 Gemma 4 31B: [google/gemma-4-31B-it](https://huggingface.co/google/gemma-4-31B-it)
- - 🔬 DeepMind Blog: [Gemma 4 Launch](https://deepmind.google/blog/gemma-4-byte-for-byte-the-most-capable-open-models/)
-
- Built by VIDRAFT 🧬

---
title: Darwin-31B-Opus
emoji: 👀
colorFrom: blue
colorTo: yellow
app_file: app.py
pinned: false
license: apache-2.0
short_description: gemma-4-31B-it + gemma-4-31B-it-Claude-Opus-Distill
hf_oauth: true
hf_oauth_scopes:
- email
base_model:
- google/gemma-4-31B-it
- TeichAI/gemma-4-31B-it-Claude-Opus-Distill
tags:
- darwin-v6
- evolutionary-merge
- mri-guided
- dare-ties
- gemma4
- reasoning
- thinking
- proto-agi
- vidraft
language:
- en
- ko
- ja
- zh
- multilingual
pipeline_tag: text-generation
library_name: transformers
---

# Darwin-31B-Opus

<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="31B Model"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--35B--A3B--Opus-blue?style=for-the-badge" alt="35B Model"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B Model"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus-Q8-GGUF"><img src="https://img.shields.io/badge/📦_GGUF-Q8--Official-yellow?style=for-the-badge" alt="Q8 GGUF"></a>
<a href="https://huggingface.co/bartowski/FINAL-Bench_Darwin-35B-A3B-Opus-GGUF"><img src="https://img.shields.io/badge/📦_GGUF-bartowski-yellow?style=for-the-badge" alt="bartowski GGUF"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🚀_Space-9B_Demo-purple?style=for-the-badge" alt="9B Space"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Darwin-35B-A3B-Opus"><img src="https://img.shields.io/badge/🚀_Space-35B_Demo-purple?style=for-the-badge" alt="35B Space"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/all-bench-leaderboard"><img src="https://img.shields.io/badge/📊_ALL_Bench-Leaderboard-orange?style=for-the-badge" alt="ALL Bench"></a>
</p>

<p align="center">
<img src="info.png" alt="Darwin-31B-Opus" width="100%">
</p>

> Gemma 4 Dense 31B | Thinking Mode | 256K Context | 140+ Languages | BF16 | Apache 2.0

---

## Overview

Darwin-31B-Opus is a reasoning-enhanced model created by merging google/gemma-4-31B-it (Father) and TeichAI/gemma-4-31B-it-Claude-Opus-Distill (Mother) with the Darwin V6 engine.

Darwin V6 diagnoses both parent models at the tensor level before merging and assigns an independent optimal ratio to each of the 1,188 tensors. This is fundamentally different from conventional merging tools, which apply a single uniform ratio across all tensors.

---

## Parent Models

| Role | Model | Characteristics |
|---|---|---|
| Father | google/gemma-4-31B-it | Gemma 4 Dense 31B, multimodal, 256K context, LMArena 1452 (open model #3) |
| Mother | TeichAI/gemma-4-31B-it-Claude-Opus-Distill | Claude 4.6 Opus high-effort reasoning distillation, code/science/analysis |

### Model Diagnostic Scan (MDS)

<p align="center">
<img src="s1.png" alt="Father (gemma-4-31B-it) MDS Scan" width="48%">
<img src="s2.png" alt="Mother (Claude-Opus-Distill) MDS Scan" width="48%">
</p>

Left: Father (gemma-4-31B-it), a balanced generalist with low activation across most probes. Right: Mother (Claude-Opus-Distill), with strong REASONING concentration in L50-L59, CODE activation in the late layers, and KOREAN at the start and end. The Mother shows markedly more specialized layer patterns, inherited from the Claude Opus distillation.

---

## Benchmarks

| Benchmark | Darwin-31B-Opus | Father (gemma-4-31B-it) | Condition |
|---|---|---|---|
| ARC-Challenge | 82.89% | - | loglikelihood, zero-shot, 200Q |
| GPQA Diamond | 66.0% | 60.0% | generative thinking mode, greedy, 50Q |

GPQA Diamond was evaluated under identical conditions for both models: the same 50 questions, the same seed (i+42), the same prompt template, greedy decoding (do_sample=False), max_new_tokens=2048, and enable_thinking=True. Darwin-31B-Opus scored 66.0% vs the Father's 60.0%, a 10% relative improvement.

Note: the Gemma 4 architecture (Gemma4ForConditionalGeneration) has limited compatibility with lm-eval's loglikelihood method because of its multimodal wrapper structure, so only generative evaluation produces valid results for Gemma 4 based models. A full 198-question evaluation with Majority Voting is planned.

---
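The Majority Voting run planned in the Benchmarks note reduces to sampling k completions per question and keeping the most frequent parsed answer. A minimal sketch (the helper name and tie handling are my assumptions, not the project's evaluation harness):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common parsed answer across k sampled completions.

    `answers` holds the final answer extracted from each completion (e.g.
    the chosen letter); None marks completions where parsing failed.
    Ties resolve to the first-seen answer, per Counter.most_common.
    """
    parsed = [a for a in answers if a is not None]
    if not parsed:
        return None
    return Counter(parsed).most_common(1)[0][0]

print(majority_vote(["B", "C", "B", None, "B"]))  # B
```

With greedy decoding all k samples coincide, so majority voting only adds signal once sampling (do_sample=True) is enabled.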

## Darwin V6 vs Conventional Merging

| Capability | mergekit (DARE-TIES) | Darwin V6 |
|---|---|---|
| Implementation | Library call (mergekit CLI) | Direct PyTorch tensor operations, no external dependency |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from MDS diagnostic (1,188 independent ratios) |
| Pre-merge analysis | None | Static tensor profiling (entropy, std, norm) + probe-based functional importance (5 probes) |
| Ratio formula | Human-set or grid search | combined = static × 0.4 + probe × 0.6, then evolutionary optimization |
| Transplant | Not supported | ratio < 0.15 → Father 100%, ratio > 0.85 → Mother 100% (zero interpolation noise) |
| Post-merge validation | Benchmark score only | Layer-by-layer Health Check: child vs both parents, interference and function-loss detection |
| Search method | Manual tuning | CMA-ES evolution with an adaptive 14-dimensional genome |
| Reproducibility | Config file | genome_hash seed guarantees identical output for an identical genome |
| GPU efficiency | Single merge per run | Phase 1 proxy (200 steps, seconds) → Phase 2 real merge (top-k only evaluated) |
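The Transplant row reads as a three-way rule applied per tensor. A sketch under the documented 0.15 / 0.85 thresholds (the function name is mine):

```python
def transplant_or_merge(father, mother, ratio, lo=0.15, hi=0.85):
    """Darwin's transplant rule for one tensor: extreme ratios copy one
    parent's weights verbatim (zero interpolation noise); anything in
    between is linearly interpolated. The arithmetic is elementwise, so
    `father` and `mother` may be floats or torch tensors."""
    if ratio < lo:
        return father                        # Father 100%
    if ratio > hi:
        return mother                        # Mother 100%
    return (1.0 - ratio) * father + ratio * mother

print(transplant_or_merge(0.0, 1.0, 0.10))  # 0.0  (transplanted from Father)
print(transplant_or_merge(0.0, 1.0, 0.50))  # 0.5  (interpolated)
```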

---

## How Darwin V6 Works

Darwin V6 does not use mergekit or any external merge library. It re-implements DARE (Yu et al., 2023) and TIES (Yadav et al., 2023) directly as PyTorch tensor operations, driven by per-tensor diagnostic ratios.
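As a concrete reference, the DARE half of that recipe (drop a fraction of the delta between parents, rescale survivors so the delta's expectation is unchanged, add it back at the merge ratio) can be sketched in pure Python. This is an illustration, not Darwin's implementation: TIES sign election is omitted, plain lists stand in for tensors, and the genome's density_a/density_b values correspond to keep probability (drop_p = 1 - density):

```python
import random

def dare_merge(father, mother, ratio, drop_p=0.1, seed=42):
    """Minimal DARE sketch for one weight vector.

    delta = mother - father; each entry is dropped with probability
    drop_p, survivors are rescaled by 1/(1 - drop_p) to preserve the
    delta's expected value, and the sparsified delta is re-added at
    `ratio`."""
    rng = random.Random(seed)  # seeded, mirroring genome_hash reproducibility
    merged = []
    for f, m in zip(father, mother):
        delta = m - f
        if rng.random() < drop_p:
            delta = 0.0                      # dropped entry
        else:
            delta /= (1.0 - drop_p)          # rescale survivors
        merged.append(f + ratio * delta)
    return merged

print(dare_merge([0.0, 0.0], [1.0, 1.0], ratio=0.5, drop_p=0.0))  # [0.5, 0.5]
```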

Before merging, Darwin performs a Model Diagnostic Scan (MDS) on both parents. For every tensor it measures Shannon entropy (information density), standard deviation (activation spread), and L2 norm (energy). In addition, 5 diagnostic probes (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) are run through the model, and the cosine distance of the output when each layer is skipped measures that layer's functional importance.
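The layer-skip probe can be illustrated with a toy forward pass: run the probe through all layers, run it again with one layer skipped, and take the cosine distance between the two outputs. This is a pure-Python stand-in; the real scan operates on the transformer's hidden states:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def layer_importance(layers, probe, skip_idx):
    """Functional importance of one layer: how far the final output moves
    when that layer is removed from the forward pass."""
    def forward(skip=None):
        h = probe
        for i, layer in enumerate(layers):
            if i != skip:
                h = layer(h)
        return h
    return cosine_distance(forward(), forward(skip=skip_idx))

layers = [lambda h: [x + 1.0 for x in h],  # a layer that shifts the state
          lambda h: h]                     # an identity (unimportant) layer
# Skipping the shifting layer matters more than skipping the identity layer:
print(layer_importance(layers, [1.0, 0.0], 0) > layer_importance(layers, [1.0, 0.0], 1))  # True
```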

The final merge ratio for each tensor:

```
static_score = entropy × 0.3 + std × 0.2 + clamp(norm, 100) × 0.002
probe_score  = Σ(cosine_distance[probe_i] × weight_i)
combined     = static × 0.4 + probe × 0.6
mri_ratio    = combined_b / (combined_a + combined_b)
final_ratio  = mri_ratio × mri_trust + genome_ratio × (1 - mri_trust)
```
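In Python the same computation looks roughly like this. A sketch, with assumptions: clamp(norm, 100) is read as min(norm, 100), the probe weights are taken as an input, and the mri_trust/genome_ratio defaults are simply the evolved values reported in the Evolution Result section:

```python
def tensor_merge_ratio(stats_a, stats_b, probes_a, probes_b, probe_weights,
                       mri_trust=0.3631, genome_ratio=0.5147):
    """Per-tensor merge ratio. stats_* = (entropy, std, l2_norm) for the
    Father/Mother tensor; probes_* map probe name -> cosine distance;
    probe_weights maps probe name -> weight. Returns the Mother share."""
    def combined(stats, probes):
        entropy, std, norm = stats
        static = entropy * 0.3 + std * 0.2 + min(norm, 100.0) * 0.002
        probe = sum(probes[k] * w for k, w in probe_weights.items())
        return static * 0.4 + probe * 0.6
    a = combined(stats_a, probes_a)
    b = combined(stats_b, probes_b)
    mri_ratio = b / (a + b)                  # Mother share from diagnostics
    return mri_ratio * mri_trust + genome_ratio * (1.0 - mri_trust)
```

A result of 0.0 means the merged tensor is pure Father; 1.0 is pure Mother.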

The mri_trust parameter is itself optimized by the CMA-ES evolutionary algorithm, letting the system determine, per model pair, the balance between the diagnostic prescription and the evolutionary search.
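The search loop itself can be illustrated with a toy Gaussian evolution strategy. This is a stand-in for real CMA-ES (which also adapts a full covariance matrix); it only shows the ask → evaluate → update cycle over a genome clamped to [0, 1]:

```python
import random

def evolve(fitness, dim=14, pop=8, sigma=0.1, steps=20, seed=0):
    """Toy evolutionary loop over a `dim`-dimensional genome: sample a
    population around the current mean, move the mean to the best sample,
    and track the best genome ever seen."""
    rng = random.Random(seed)
    mean = [0.5] * dim                       # start every gene mid-range
    best, best_fit = list(mean), fitness(mean)
    for _ in range(steps):
        cands = [[min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in mean]
                 for _ in range(pop)]
        fits = [fitness(c) for c in cands]
        i = max(range(pop), key=fits.__getitem__)
        mean = cands[i]                      # greedy mean update
        if fits[i] > best_fit:
            best, best_fit = cands[i], fits[i]
    return best, best_fit

# Toy fitness: prefer genomes near 0.8 (a proxy benchmark score would go here).
best, score = evolve(lambda g: -sum((x - 0.8) ** 2 for x in g))
```

Darwin's Phase 1 uses a cheap proxy fitness so that only the top-k genomes ever pay for a real merge in Phase 2.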

After merging, a Health Check compares the child model against both parents layer by layer, detecting interference (child importance far above the parents' maximum) and function loss (a parent's importance is high but the child's dropped).
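Given per-layer importance maps for the child and both parents, the two failure modes reduce to threshold checks. A sketch (the 1.5x/0.5x multipliers are illustrative assumptions; the card does not publish Darwin's actual cutoffs):

```python
def health_check(child, father, mother, interf_mult=1.5, loss_mult=0.5):
    """Compare the child's layer importance against the stronger parent.

    Flags 'interference' when the child's importance spikes far above both
    parents, and 'function_loss' when a parent was strong at a layer but
    the child collapsed. Inputs map layer index -> importance score; an
    empty result means the merge is healthy."""
    issues = {}
    for layer, c in child.items():
        p = max(father[layer], mother[layer])
        if c > p * interf_mult:
            issues[layer] = "interference"
        elif c < p * loss_mult:
            issues[layer] = "function_loss"
    return issues

print(health_check({0: 1.0, 1: 3.0}, {0: 1.0, 1: 1.0}, {0: 0.9, 1: 1.2}))
# {1: 'interference'}
```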

### Parent Comparison (MDS Result)

<p align="center">
<img src="parent_comparison.png" alt="Parent Comparison - Layer-wise Importance" width="100%">
</p>

---

## Evolution Result

| | |
|---|---|
| Best Score (ARC-Challenge) | 0.8289 |
| Merge Method | DARE-TIES (direct PyTorch) |
| Tensors Merged | 1,188 |
| Health Check | healthy |
| Phase 2 Steps | 4 (early stop, patience=5) |
| Total Time | 134 min |
| Infrastructure | 4 x NVIDIA H100 NVL (100GB) |

Optimal Genome (14-dimensional adaptive):

```
global_ratio:        0.5147  (overall merge ratio)
attn_ratio:          0.3169  (attention layers - Father dominant)
ffn_ratio:           0.9316  (FFN layers - Mother dominant)
embed_ratio:         0.7748  (embedding)
density_a:           0.8997  (Father DARE density)
density_b:           0.9539  (Mother DARE density)
block_0_ratio:       0.6628  (L0-L9)
block_1_ratio:       0.6431  (L10-L19)
block_2_ratio:       0.5146  (L20-L29, balanced)
block_3_ratio:       0.5971  (L30-L39)
block_4_ratio:       0.6339  (L40-L49)
block_5_ratio:       0.8583  (L50-L59, reasoning core - Mother dominant)
mri_trust:           0.3631  (MDS 36% + Genome 64%)
merge_method_weight: 0.6897
```

Key observations from the genome: ffn_ratio = 0.93 shows the FFN layers strongly favor the Mother (Claude Opus Distill), and block_5 (L50-L59) = 0.86 shows the reasoning-core layers favor her as well, matching the MDS heatmap, where the Mother's reasoning capability is concentrated in the final layers. Meanwhile, attn_ratio = 0.32 preserves the Father's attention structure, maintaining the original Gemma 4 multimodal and long-context capabilities.
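To make the genome concrete, a per-tensor lookup might combine the tensor-type ratio with the depth-block ratio roughly like this. The 50/50 blend and the name matching are illustrative assumptions; the actual combination rule is internal to Darwin V6:

```python
GENOME = {
    "attn_ratio": 0.3169,
    "ffn_ratio": 0.9316,
    "embed_ratio": 0.7748,
    # L0-L59 in blocks of 10 layers
    "block_ratios": [0.6628, 0.6431, 0.5146, 0.5971, 0.6339, 0.8583],
}

def genome_ratio(tensor_name, layer_idx, genome=GENOME):
    """Mother-share for one tensor: embeddings get their own ratio;
    attention/FFN tensors blend the type-level ratio with the 10-layer
    block ratio for their depth (blend rule assumed)."""
    if "embed" in tensor_name:
        return genome["embed_ratio"]
    type_ratio = genome["attn_ratio"] if "attn" in tensor_name else genome["ffn_ratio"]
    block_ratio = genome["block_ratios"][min(layer_idx // 10, 5)]
    return 0.5 * (type_ratio + block_ratio)

print(round(genome_ratio("model.layers.55.self_attn.q_proj.weight", 55), 4))  # 0.5876
```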

---

## Model Specifications

| | |
|---|---|
| Architecture | Gemma 4 Dense (Hybrid Attention: Sliding Window + Global) |
| Parameters | 31B |
| Precision | BF16 |
| Context | 256,072 tokens |
| Languages | 140+ |
| Thinking | enable_thinking=True chain-of-thought |
| License | Apache 2.0 |

---

## Usage

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-31B-Opus", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-31B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

---

## VRAM Requirements

| Setup | VRAM | Status |
|---|---|---|
| BF16 Full Precision | ~62 GB | Weights only |
| NVIDIA H100 80GB | 80 GB | Single GPU |
| NVIDIA A100 80GB x 2 | 160 GB | Comfortable |
| NVIDIA RTX 4090 24GB x 4 | 96 GB | device_map=auto |
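The ~62 GB figure is simply the weights at 2 bytes per BF16 parameter; KV cache, activations, and framework overhead come on top, which is why a single 80 GB H100 works but with limited headroom:

```python
def bf16_weights_gb(n_params_billion):
    """Weights-only footprint in GB at 2 bytes per BF16 parameter
    (excludes KV cache, activations, and framework overhead)."""
    return 2.0 * n_params_billion

print(bf16_weights_gb(31))  # 62.0
```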

---

## References

- DARE: Yu et al., 2023 (https://arxiv.org/abs/2311.03099); re-implemented directly, not library-dependent
- TIES: Yadav et al., 2023 (https://arxiv.org/abs/2306.01708)
- Darwin V6 Engine: https://huggingface.co/spaces/ginigen-ai/DARWIN-V5-BACKUP
- FINAL Bench: https://huggingface.co/spaces/FINAL-Bench/Leaderboard

---

## Built By

| | |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V6 (Diagnostic-Guided Evolutionary Merge) |
| Architecture | Gemma-4-31B |
| License | Apache 2.0 |

---

## Citation

```bibtex
@misc{vidraft_darwin_31b_opus,
  title        = {Darwin-31B-Opus: Diagnostic-Guided Evolutionary Merge on Gemma 4},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-31B-Opus}}
}
```