SeaWolf-AI committed 8976b73 (verified)
1 parent: 7b18ba7

Rename to Darwin-218B-Delphi + add GPQA Diamond 90.91% public results

Files changed (1)
  1. README.md +155 -9
README.md CHANGED
@@ -8,10 +8,9 @@ pipeline_tag: text-generation
  tags:
  - darwin
  - vidraft
- - expert
  - chemistry
  - korean
- - kr
  - moe
  - mixture-of-experts
  - cohere2_moe
@@ -21,13 +20,160 @@ base_model:
  - FINAL-Bench/Darwin-218B-kr
  ---
 
- # Darwin-218B-Expert-Chem
 
- VIDraft Darwin Expert series: a **chemistry-specialized** 218B Mixture-of-Experts model.
 
- ## Configuration
- - Base: [Darwin-218B-kr](https://huggingface.co/FINAL-Bench/Darwin-218B-kr) (Command A+ with Korean SFT applied)
- - Original base: [CohereLabs/command-a-plus-05-2026-bf16](https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16) (218B total / ~25B active, cohere2_moe, 128 experts), Apache-2.0
- - Chemistry specialization: LoRA trained on Opus-distilled data (graduate-level chemistry reasoning + step-by-step CoT), then merged
 
- Internal release (private).
  tags:
  - darwin
  - vidraft
+ - delphi
  - chemistry
  - korean
  - moe
  - mixture-of-experts
  - cohere2_moe
 
  - FINAL-Bench/Darwin-218B-kr
  ---
 
+ # Darwin-218B-Delphi
 
+ > **VIDRAFT FINAL-Bench**: a chemistry-specialized 218B MoE, served via the **DELPHI** 5-Phase inference cascade.
 
+ A chemistry-domain derivative of the Darwin-218B family. It is built on the Korean-aligned base (Darwin-218B-kr), distilled from a strong teacher model with the GPQA Diamond questions excluded from training, and tuned for graduate-level scientific reasoning.
 
+ ---
+
+ ## 🏆 GPQA Diamond: Public Results
+
+ ```
+ GPQA Diamond (198 questions): Darwin-218B-Delphi
+ ─────────────────────────────────────────────────────────
+ Method                                | Accuracy
+ ─────────────────────────────────────────────────────────
+ MAJ@8 (standard inference scaling)    | 90.40% (179/198)
+ + DELPHI cascade (VIDRAFT signature)  | 90.91% (180/198)
+ ─────────────────────────────────────────────────────────
+ DELPHI contribution                   | +0.51pp (+1 question via self-critique)
+ ```
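The percentages in the results block follow directly from the question counts. A minimal Python check that uses only the 198-question total and the correct-answer counts stated above (the helper `pct` is a hypothetical name introduced for illustration):

```python
# Recompute the GPQA Diamond accuracies from the raw counts in the results block.
TOTAL = 198           # questions in GPQA Diamond
MAJ8_CORRECT = 179    # MAJ@8 alone
DELPHI_CORRECT = 180  # MAJ@8 + DELPHI cascade


def pct(correct: int, total: int = TOTAL) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


maj8 = pct(MAJ8_CORRECT)
delphi = pct(DELPHI_CORRECT)
gain_pp = round(delphi - maj8, 2)               # percentage-point gain of the cascade
extra_questions = DELPHI_CORRECT - MAJ8_CORRECT
per_question_pp = round(100 / TOTAL, 2)         # weight of a single question

print(f"MAJ@8 {maj8:.2f}%  DELPHI {delphi:.2f}%  gain +{gain_pp:.2f}pp "
      f"({extra_questions} question, {per_question_pp:.2f}pp each)")
```

On a 198-question set one question is worth about 0.51pp, so the reported DELPHI gain corresponds to exactly one additional correct answer.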
+
+ ### Reference baselines (vendor-reported)
+
+ | Model | GPQA Diamond | Mode |
+ |------|-------------|------|
+ | GPT-5 (OpenAI) | 88.0% | thinking |
+ | Claude Opus 4.5 (Anthropic) | 91.8% | extended thinking |
+ | DeepSeek-V3.2 | ~78-82% | standard |
+ | **Darwin-218B-Delphi (MAJ@8)** | **90.40%** | **standard MAJ@8** |
+ | **Darwin-218B-Delphi (+DELPHI)** | **90.91%** | **VIDRAFT signature** |
+
+ → **MAJ@8 alone surpasses GPT-5 (thinking)** (90.40% vs. 88.0%), and **with the DELPHI cascade the model reaches the same tier as Claude Opus 4.5 (extended thinking)** (90.91% vs. 91.8%).
+
+ ---
+
+ ## Lineage
+
+ ```
+ CohereLabs/command-a-plus-05-2026-bf16 (Apache-2.0 base, 218B MoE, ~25B active, 128 experts)
+   ↓ Korean LoRA merge
+ Darwin-218B-kr (Korean-aligned base)
+   ↓ Chemistry SFT LoRA merge (Opus-distilled, anti-contamination)
+ Darwin-218B-Delphi ← THIS MODEL
+ ```
+
+ **Distillation**:
+ - Teacher: a large frontier model behind a proprietary API. Its logits are not exposed, so the student is fine-tuned (SFT) on the teacher's outputs.
+ - 993 high-quality chemistry CoT examples across 6 sub-domains:
+   organic, spectroscopy, physical, inorganic, analytical, special
+ - **Anti-contamination**: the 198 GPQA Diamond questions are guaranteed to be absent from the training data
+ - LoRA: r=16, α=32, target modules q/k/v/o, lr=1e-5, 1 epoch, max_length=3072
+ - Trained on Darwin-218B-kr (S4, 6×B200, bf16)
+ - Merge: the adapter is merged into a full dense checkpoint, so no runtime adapter loading is needed
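Because a LoRA update is linear, merging it into the dense checkpoint just adds the scaled low-rank product (α/r)·B·A to the base weight, and the merged layer then produces the same output as base + adapter at runtime. A minimal pure-Python sketch on toy matrices; the shapes and values are made up for illustration, and only the α/r = 2 ratio comes from the card (r=16, α=32). PEFT's `merge_and_unload()` is a common way to do this on a real model, but the card does not say which tool was used.

```python
# Toy LoRA merge: W_merged = W + (alpha / r) * (B @ A).
# Matrices are tiny for readability; only the alpha/r ratio matches the card.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    inner = len(Y)
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Fold a LoRA update (B: d_out x r, A: r x d_in) into the dense weight W."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Toy rank r=2 with alpha=4, i.e. the same alpha/r = 2.0 as r=16, alpha=32.
r, alpha = 2, 4.0
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]   # base weight, d_out=2, d_in=3
A = [[0.5, 0.0, 1.0],
     [0.0, 1.0, 0.0]]   # r x d_in
B = [[1.0, 0.0],
     [0.0, 2.0]]        # d_out x r

W_merged = merge_lora(W, A, B, alpha, r)

x = [1.0, 2.0, 3.0]
# Runtime-adapter path: base output plus scaled adapter output.
unmerged = [b + (alpha / r) * a for b, a in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Merged path: a single dense matmul, no adapter needed at inference time.
merged = matvec(W_merged, x)
print(merged, unmerged)
```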
+
+ ---
+
+ ## Architecture
+
+ | Item | Value |
+ |------|-------|
+ | Total parameters | 218B |
+ | Active parameters | ~25B (MoE) |
+ | Experts | 128 (Cohere2 MoE) |
+ | Precision | BF16 |
+ | Architecture | `Cohere2VisionForConditionalGeneration` (multimodal-capable, text-primary) |
+ | Tokenizer | Cohere2 (vocab 256K) |
+ | Languages | English, Korean |
+ | Context | 65,536 tokens |
+ | License | Apache-2.0 |
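The gap between total (218B) and active (~25B) parameters comes from sparse routing: for each token a router scores all 128 experts and only a few of them actually run. A minimal top-k gating sketch in pure Python; `TOP_K = 8` and the toy experts are assumptions for illustration, and the actual Cohere2 MoE router (its k, normalization, any shared experts, load balancing) may differ. A softmax over only the selected logits is equivalent to a full softmax followed by renormalizing the top-k probabilities.

```python
import math
from typing import Callable, List, Sequence, Tuple

NUM_EXPERTS = 128  # from the Architecture table
TOP_K = 8          # ASSUMPTION: the card does not state how many experts are active per token

Expert = Callable[[List[float]], List[float]]


def route(logits: Sequence[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their logits."""
    if len(logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} router logits, got {len(logits)}")
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                # subtract max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_layer(x: List[float], experts: Sequence[Expert], logits: Sequence[float]) -> List[float]:
    """Weighted sum of the selected experts' outputs; the other experts are never evaluated."""
    out = [0.0] * len(x)
    for idx, weight in route(logits):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# HYPOTHETICAL toy experts: expert i simply scales its input by i.
experts = [lambda v, s=float(i): [s * t for t in v] for i in range(NUM_EXPERTS)]
logits = [float(i) for i in range(NUM_EXPERTS)]    # expert 127 scores highest
chosen = route(logits)
print([i for i, _ in chosen])
```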
+
+ ---
+
+ ## DELPHI 5-Phase Cascade (signature inference mode)
+
+ The VIDRAFT DELPHI cascade routes each question through 5 progressively deeper inference stages:
+
+ 1. **P1**: greedy single-shot (temperature 0)
+ 2. **P2**: MAJ@8 majority vote (temperature 0.7)
+ 3. **P3**: 16-vote tiebreak for close calls
+ 4. **P4**: Multi-Turn Inference (MTI), 3-turn self-critique × 8 chains
+ 5. **P5**: weighted global tiebreak across all phases
+
+ Compute-optimal: most questions are resolved at P1/P2, and only ambiguous ones escalate to later phases.
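The five phases above can be sketched as a single escalation function. This is not the VIDRAFT implementation: the card lists the phases but not the escalation rule or the P5 weights, so `CLOSE_MARGIN`, `PHASE_WEIGHTS`, the early-stop condition and the `sample(question, temperature, critique_turns)` callable are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Callable

# HYPOTHETICAL interface: sample(question, temperature, critique_turns) -> answer letter.
# Wrap your own model calls in it; it is not part of any published DELPHI API.
Sampler = Callable[[str, float, int], str]

# ASSUMPTIONS: the card does not publish escalation rules or phase weights.
CLOSE_MARGIN = 2   # a vote gap at or below this counts as a "close call"
PHASE_WEIGHTS = {"p1": 1.0, "p2": 1.0, "p3": 1.0, "p4": 1.5}


def margin(votes: Counter) -> int:
    """Vote gap between the two most common answers."""
    top = votes.most_common(2)
    return top[0][1] - (top[1][1] if len(top) > 1 else 0)


def delphi(question: str, sample: Sampler) -> str:
    p1 = sample(question, 0.0, 0)                                # P1: greedy single shot
    p2 = Counter(sample(question, 0.7, 0) for _ in range(8))     # P2: MAJ@8
    if p2.most_common(1)[0][0] == p1 and margin(p2) > CLOSE_MARGIN:
        return p1                                                # clear consensus: stop early
    p3 = Counter(sample(question, 0.7, 0) for _ in range(16))    # P3: 16-vote tiebreak
    if margin(p3) > CLOSE_MARGIN:
        return p3.most_common(1)[0][0]
    p4 = Counter(sample(question, 0.7, 3) for _ in range(8))     # P4: 3-turn self-critique x 8 chains
    # P5: weighted global tiebreak over every vote cast in P1-P4.
    totals: Counter = Counter()
    for phase, votes in (("p1", Counter([p1])), ("p2", p2), ("p3", p3), ("p4", p4)):
        for answer, n in votes.items():
            totals[answer] += PHASE_WEIGHTS[phase] * n
    return totals.most_common(1)[0][0]
```

With a sampler that always agrees with itself, `delphi` returns after 9 model calls (P1 plus 8 votes); only split votes pay for the 16 extra samples of P3 and the 8 self-critique chains of P4, which is where the compute saving comes from.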
+
+ ---
+
+ ## Usage
+
+ ### vLLM (recommended)
+
+ ```bash
+ vllm serve FINAL-Bench/Darwin-218B-Delphi \
+   --tensor-parallel-size 8 \
+   --dtype bfloat16 \
+   --max-model-len 65536 \
+   --trust-remote-code \
+   --enforce-eager \
+   --limit-mm-per-prompt '{"image":0,"video":0}'
+ ```
+
+ Requires vLLM ≥ 0.21.0 (`Cohere2VisionForConditionalGeneration` support).
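The MAJ@8 setting (P2 of the cascade) can be approximated against the server started by the command above through vLLM's OpenAI-compatible `/v1/chat/completions` endpoint, which accepts an `n` parameter for several samples per request. A stdlib-only sketch; the port (vLLM's default 8000), the `Answer: <letter>` output convention, the answer regex and `max_tokens` are assumptions, not part of the card.

```python
import json
import re
import urllib.request
from collections import Counter
from typing import List, Optional

# ASSUMPTIONS: server on vLLM's default port 8000, and a prompt that asks the model
# to finish with "Answer: <letter>". Neither is specified by the model card.
URL = "http://localhost:8000/v1/chat/completions"
MODEL = "FINAL-Bench/Darwin-218B-Delphi"
ANSWER_RE = re.compile(r"Answer:\s*\(?([A-D])\)?", re.IGNORECASE)


def extract_answer(text: str) -> Optional[str]:
    """Return the last 'Answer: X' letter in a completion, or None if absent."""
    matches = ANSWER_RE.findall(text)
    return matches[-1].upper() if matches else None


def majority(answers: List[Optional[str]]) -> Optional[str]:
    """MAJ@k: most frequent parsed answer; unparseable completions are ignored."""
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


def maj_at_k(question: str, k: int = 8, temperature: float = 0.7) -> Optional[str]:
    """Request k samples in one call and majority-vote their final answers."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user",
                      "content": question + "\n\nEnd your reply with 'Answer: <letter>'."}],
        "n": k,                      # k independent samples in a single request
        "temperature": temperature,  # 0.7 matches the P2 setting in the card
        "max_tokens": 4096,          # ASSUMPTION: generous budget for chain-of-thought
    }
    req = urllib.request.Request(URL, data=json.dumps(body).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        choices = json.load(resp)["choices"]
    return majority([extract_answer(c["message"]["content"]) for c in choices])
```

Once the server is up, `maj_at_k(question_text)` returns the voted letter; the parsing and voting helpers are kept separate so they can be checked without a running server.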
+
+ ### Transformers
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "FINAL-Bench/Darwin-218B-Delphi",
+     dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ tok = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-218B-Delphi")
+
+ messages = [
+     {"role": "user", "content": "Explain the SN2 mechanism step by step, "
+                                 "then justify why CH3I reacts faster than CH3Cl."}
+ ]
+ prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ # The chat template already inserts special tokens, so don't add them a second time.
+ inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
+ # temperature/top_p only take effect when sampling is enabled.
+ out = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.3, top_p=0.9)
+ # Decode only the newly generated tokens.
+ print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
+ ```
+
+ ---
+
+ ## License
+
+ **Apache License 2.0**
+
+ Built upon `CohereLabs/command-a-plus-05-2026-bf16` (Apache-2.0) and `Darwin-218B-kr` (Apache-2.0). All upstream components are permissively licensed.
+
+ ---
+
+ ## Contributors
+
+ **Lead Architect & Developer**: 장재원 (Jaewon Jang), CTO, VIDRAFT
+ *Domain SFT distillation pipeline, DELPHI cascade design, model integration.*
+
+ **Organization**: VIDRAFT / FINAL-Bench
+ https://huggingface.co/FINAL-Bench
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{darwin-218b-delphi-2026,
+   title        = {Darwin-218B-Delphi: Chemistry-Specialized 218B MoE with DELPHI Cascade Inference},
+   author       = {Jang, Jaewon and {VIDRAFT FINAL-Bench Team}},
+   year         = {2026},
+   publisher    = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-218B-Delphi}}
+ }
+ ```