Rename to Darwin-218B-Delphi + add GPQA Diamond 90.91% public results
README.md
CHANGED
|
@@ -8,10 +8,9 @@ pipeline_tag: text-generation
|
|
| 8 |
tags:
|
| 9 |
- darwin
|
| 10 |
- vidraft
|
| 11 |
-
-
|
| 12 |
- chemistry
|
| 13 |
- korean
|
| 14 |
-
- kr
|
| 15 |
- moe
|
| 16 |
- mixture-of-experts
|
| 17 |
- cohere2_moe
|
|
@@ -21,13 +20,160 @@ base_model:
|
|
| 21 |
- FINAL-Bench/Darwin-218B-kr
|
| 22 |
---
|
| 23 |
|
| 24 |
-
# Darwin-218B-
|
| 25 |
|
| 26 |
-
|
| 27 |
|
| 28 |
-
|
| 29 |
-
- Base: [Darwin-218B-kr](https://huggingface.co/FINAL-Bench/Darwin-218B-kr) (Command A+ with Korean SFT applied)
|
| 30 |
-
- Original base: [CohereLabs/command-a-plus-05-2026-bf16](https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16) (218B total / ~25B active, cohere2_moe, 128 experts), Apache-2.0
|
| 31 |
-
- Chemistry specialization: LoRA trained on Opus-distilled data (graduate-level chemistry reasoning + step-by-step CoT), then merged
|
| 32 |
|
| 33 |
-
|
| 8 |
tags:
|
| 9 |
- darwin
|
| 10 |
- vidraft
|
| 11 |
+
- delphi
|
| 12 |
- chemistry
|
| 13 |
- korean
|
|
|
|
| 14 |
- moe
|
| 15 |
- mixture-of-experts
|
| 16 |
- cohere2_moe
|
|
|
|
| 20 |
- FINAL-Bench/Darwin-218B-kr
|
| 21 |
---
|
| 22 |
|
| 23 |
+
# Darwin-218B-Delphi
|
| 24 |
|
| 25 |
+
> **VIDRAFT FINAL-Bench** — chemistry-specialized 218B MoE, served via the **DELPHI** 5-Phase inference cascade.
|
| 26 |
|
| 27 |
+
A chemistry-domain derivative of the Darwin-218B family. Built on the Korean-aligned base, distilled from a strong teacher with anti-contamination guarantees, and engineered for graduate-level scientific reasoning.
|
|
|
|
|
|
|
|
|
|
| 28 |
|
| 29 |
+
---
|
| 30 |
+
|
| 31 |
+
## 🏆 GPQA Diamond — Public Results
|
| 32 |
+
|
| 33 |
+
```
|
| 34 |
+
GPQA Diamond (198 questions) — Darwin-218B-Delphi
|
| 35 |
+
─────────────────────────────────────────────────────────
|
| 36 |
+
Method | Accuracy
|
| 37 |
+
─────────────────────────────────────────────────────────
|
| 38 |
+
MAJ@8 (standard inference scaling) | 90.40% (179/198)
|
| 39 |
+
+ DELPHI cascade (VIDRAFT signature) | 90.91% (180/198)
|
| 40 |
+
─────────────────────────────────────────────────────────
|
| 41 |
+
DELPHI contribution | +0.51pp (+1 question via self-critique)
|
| 42 |
+
```
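The percentages in the table above follow directly from the raw counts. A minimal check, assuming only the figures reported in the table (179 and 180 correct out of 198); the function name `accuracy_pct` is illustrative:

```python
# Recompute the reported GPQA Diamond accuracies from the raw counts.
TOTAL_QUESTIONS = 198
MAJ8_CORRECT = 179     # reported for MAJ@8
DELPHI_CORRECT = 180   # reported for MAJ@8 + DELPHI cascade


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100.0 * correct / total


maj8 = accuracy_pct(MAJ8_CORRECT, TOTAL_QUESTIONS)
delphi = accuracy_pct(DELPHI_CORRECT, TOTAL_QUESTIONS)
gain_pp = delphi - maj8  # one question is worth 100/198 percentage points

print(f"MAJ@8:  {maj8:.2f}%")      # 90.40%
print(f"DELPHI: {delphi:.2f}%")    # 90.91%
print(f"Gain:   {gain_pp:+.2f}pp") # +0.51pp
```

On a 198-question set, each question accounts for roughly 0.51 percentage points, so the DELPHI gain corresponds to exactly one additional correct answer.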
|
| 43 |
+
|
| 44 |
+
### Reference baselines (vendor-reported)
|
| 45 |
+
|
| 46 |
+
| Model | GPQA Diamond | Mode |
|
| 47 |
+
|------|-------------|------|
|
| 48 |
+
| GPT-5 (OpenAI) | 88.0% | thinking |
|
| 49 |
+
| Claude Opus 4.5 (Anthropic) | 91.8% | extended thinking |
|
| 50 |
+
| DeepSeek-V3.2 | ~78-82% | standard |
|
| 51 |
+
| **Darwin-218B-Delphi (MAJ@8)** | **90.40%** | **standard MAJ@8** |
|
| 52 |
+
| **Darwin-218B-Delphi (+DELPHI)** | **90.91%** | **VIDRAFT signature** |
|
| 53 |
+
|
| 54 |
+
→ **MAJ@8 alone surpasses GPT-5 thinking**, and **the DELPHI cascade reaches the same tier as Claude Opus 4.5 extended thinking**.
|
| 55 |
+
|
| 56 |
+
---
|
| 57 |
+
|
| 58 |
+
## Lineage
|
| 59 |
+
|
| 60 |
+
```
|
| 61 |
+
CohereLabs/command-a-plus-05-2026-bf16 (Apache-2.0 base, 218B MoE, ~25B active, 128 experts)
|
| 62 |
+
↓ Korean LoRA merge
|
| 63 |
+
Darwin-218B-kr (Korean-aligned base)
|
| 64 |
+
↓ Chemistry SFT LoRA merge (Opus-distilled, anti-contamination)
|
| 65 |
+
Darwin-218B-Delphi ← THIS MODEL
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
**Distillation**:
|
| 69 |
+
- Teacher: large frontier model (proprietary API; no logits exposure → SFT-on-outputs pattern)
|
| 70 |
+
- 993 high-quality chemistry CoT examples across 6 sub-domains:
|
| 71 |
+
organic, spectroscopy, physical, inorganic, analytical, special
|
| 72 |
+
- **Anti-contamination**: the 198 GPQA Diamond questions are guaranteed not to appear in the training data
|
| 73 |
+
- LoRA: r=16, α=32, q/k/v/o, lr=1e-5, 1 epoch, max_length=3072
|
| 74 |
+
- Trained on Darwin-218B-kr (S4 6×B200 bf16)
|
| 75 |
+
- Merge: full dense checkpoint, no runtime adapter loading
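For readers unfamiliar with what "merge" means in the last bullet, here is a toy sketch of the standard LoRA merge rule, `W_merged = W + (alpha / r) * B @ A`. It is not the training or merge code used for this model. The matrices are tiny and made up, and the example uses `r=1, alpha=2` so that the scale factor `alpha / r = 2.0` matches the one implied by the card's `r=16, α=32`.

```python
# Toy illustration of folding a LoRA adapter into a dense weight matrix.
# All values below are hypothetical; only the merge rule itself is standard.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, r, alpha):
    """Return W + (alpha / r) * (B @ A) as a new dense matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0], [0.0, 1.0]]   # base weight, shape (2, 2)
lora_b = [[1.0], [0.0]]        # adapter B, shape (2, r) with r = 1
lora_a = [[0.0, 0.5]]          # adapter A, shape (r, 2)

merged = merge_lora(w, lora_b, lora_a, r=1, alpha=2)
print(merged)  # [[1.0, 1.0], [0.0, 1.0]]
```

After such a merge the adapter matrices are no longer needed at inference time, which is consistent with the "no runtime adapter loading" note above.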
|
| 76 |
+
|
| 77 |
+
---
|
| 78 |
+
|
| 79 |
+
## Architecture
|
| 80 |
+
|
| 81 |
+
| Item | Value |
|
| 82 |
+
|------|-------|
|
| 83 |
+
| Total parameters | 218B |
|
| 84 |
+
| Active parameters | ~25B (MoE) |
|
| 85 |
+
| Experts | 128 (Cohere2 MoE) |
|
| 86 |
+
| Precision | BF16 |
|
| 87 |
+
| Architecture | `Cohere2VisionForConditionalGeneration` (multimodal-capable, text-primary) |
|
| 88 |
+
| Tokenizer | Cohere2 (vocab 256K) |
|
| 89 |
+
| Languages | English, Korean |
|
| 90 |
+
| Context | 65,536 tokens |
|
| 91 |
+
| License | Apache-2.0 |
|
| 92 |
+
|
| 93 |
+
---
|
| 94 |
+
|
| 95 |
+
## DELPHI 5-Phase Cascade (signature inference mode)
|
| 96 |
+
|
| 97 |
+
The VIDRAFT DELPHI cascade routes each question through 5 progressively deeper inference stages:
|
| 98 |
+
|
| 99 |
+
1. **P1** — greedy single-shot (temperature 0)
|
| 100 |
+
2. **P2** — MAJ@8 majority vote (temperature 0.7)
|
| 101 |
+
3. **P3** — 16-vote tiebreak for close calls
|
| 102 |
+
4. **P4** — Multi-Turn Inference (MTI): 3-turn self-critique × 8 chains
|
| 103 |
+
5. **P5** — weighted global tiebreak across all phases
|
| 104 |
+
|
| 105 |
+
Compute scales with difficulty: most questions resolve at P1/P2, and only ambiguous ones escalate to the later phases.
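The escalation idea can be sketched as follows. This is not the VIDRAFT implementation: the card does not state the escalation criteria, so the agreement check, the `margin` threshold, and the pooling of P2 and P3 votes are all assumptions, and P4/P5 are omitted. `Sampler` and `stub_sampler` are hypothetical stand-ins for a real model call.

```python
from collections import Counter
from typing import Callable, List, Tuple

# (question, temperature) -> answer label; a stand-in for a model call.
Sampler = Callable[[str, float], str]


def majority(votes: List[str]) -> Tuple[str, float]:
    """Return the top answer and its lead over the runner-up as a vote fraction.

    Ties are broken by first occurrence (Counter.most_common ordering).
    """
    ranked = Counter(votes).most_common(2)
    top, top_n = ranked[0]
    runner_n = ranked[1][1] if len(ranked) > 1 else 0
    return top, (top_n - runner_n) / len(votes)


def cascade(question: str, sample: Sampler, margin: float = 0.5) -> Tuple[str, str]:
    """Hypothetical P1 -> P2 -> P3 escalation; returns (answer, phase reached)."""
    p1 = sample(question, 0.0)                          # P1: greedy single shot
    votes = [sample(question, 0.7) for _ in range(8)]   # P2: MAJ@8
    p2, lead = majority(votes)
    if p2 == p1 and lead >= margin:                     # assumed stopping rule
        return p2, "P2"
    votes += [sample(question, 0.7) for _ in range(16)]  # P3: 16 extra votes
    p3, _ = majority(votes)
    return p3, "P3"


def stub_sampler(question: str, temperature: float) -> str:
    """Deterministic stand-in that always answers "B"."""
    return "B"


print(cascade("demo question", stub_sampler))  # ('B', 'P2')
```

With a unanimous sampler the sketch stops at P2. A sampler whose votes split evenly would fall through to the 16-vote stage.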
|
| 106 |
+
|
| 107 |
+
---
|
| 108 |
+
|
| 109 |
+
## Usage
|
| 110 |
+
|
| 111 |
+
### vLLM (recommended)
|
| 112 |
+
|
| 113 |
+
```bash
|
| 114 |
+
vllm serve FINAL-Bench/Darwin-218B-Delphi \
|
| 115 |
+
--tensor-parallel-size 8 \
|
| 116 |
+
--dtype bfloat16 \
|
| 117 |
+
--max-model-len 65536 \
|
| 118 |
+
--trust-remote-code \
|
| 119 |
+
--enforce-eager \
|
| 120 |
+
--limit-mm-per-prompt '{"image":0,"video":0}'
|
| 121 |
+
```
|
| 122 |
+
|
| 123 |
+
Requires vLLM ≥ 0.21.0 (`Cohere2VisionForConditionalGeneration` support).
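Once the server is running, it can be queried over vLLM's OpenAI-compatible chat completions endpoint. Below is a minimal standard-library client sketch. It assumes the default address `http://localhost:8000`; adjust `BASE_URL` if the server was started with a different host or port. The helper names `build_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Assumed default vLLM address; change if the server uses another host/port.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "FINAL-Bench/Darwin-218B-Delphi"


def build_request(question: str, temperature: float = 0.0,
                  max_tokens: int = 1024) -> urllib.request.Request:
    """Build a POST request for a single-turn chat completion."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the question to the running server and return the reply text."""
    with urllib.request.urlopen(build_request(question)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(ask("Why does CH3I react faster than CH3Cl in an SN2 reaction?"))
```

Temperature 0 mirrors the greedy P1 setting described above; raising it to 0.7 and sampling several replies would correspond to the P2 voting stage.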
|
| 124 |
+
|
| 125 |
+
### Transformers
|
| 126 |
+
|
| 127 |
+
```python
|
| 128 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 129 |
+
import torch
|
| 130 |
+
|
| 131 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 132 |
+
"FINAL-Bench/Darwin-218B-Delphi",
|
| 133 |
+
dtype=torch.bfloat16,
|
| 134 |
+
device_map="auto",
|
| 135 |
+
trust_remote_code=True,
|
| 136 |
+
)
|
| 137 |
+
tok = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-218B-Delphi")
|
| 138 |
+
|
| 139 |
+
messages = [
|
| 140 |
+
{"role": "user", "content": "Explain the SN2 mechanism step by step, "
|
| 141 |
+
"then justify why CH3I reacts faster than CH3Cl."}
|
| 142 |
+
]
|
| 143 |
+
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
| 144 |
+
inputs = tok(prompt, return_tensors="pt").to(model.device)
|
| 145 |
+
# do_sample=True is needed for temperature/top_p to take effect
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.3, top_p=0.9)
|
| 146 |
+
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
---
|
| 150 |
+
|
| 151 |
+
## License
|
| 152 |
+
|
| 153 |
+
**Apache License 2.0**
|
| 154 |
+
|
| 155 |
+
Built upon `CohereLabs/command-a-plus-05-2026-bf16` (Apache-2.0) and `Darwin-218B-kr` (Apache-2.0). All upstream components are permissively licensed.
|
| 156 |
+
|
| 157 |
+
---
|
| 158 |
+
|
| 159 |
+
## Contributors
|
| 160 |
+
|
| 161 |
+
**Lead Architect & Developer** — 장재원 (Jaewon Jang), CTO, VIDRAFT
|
| 162 |
+
*Domain SFT distillation pipeline, DELPHI cascade design, model integration.*
|
| 163 |
+
|
| 164 |
+
**Organization** — VIDRAFT / FINAL-Bench
|
| 165 |
+
https://huggingface.co/FINAL-Bench
|
| 166 |
+
|
| 167 |
+
---
|
| 168 |
+
|
| 169 |
+
## Citation
|
| 170 |
+
|
| 171 |
+
```bibtex
|
| 172 |
+
@misc{darwin-218b-delphi-2026,
|
| 173 |
+
title = {Darwin-218B-Delphi: Chemistry-Specialized 218B MoE with DELPHI Cascade Inference},
|
| 174 |
+
author = {Jang, Jaewon and {VIDRAFT FINAL-Bench Team}},
|
| 175 |
+
year = {2026},
|
| 176 |
+
publisher = {Hugging Face},
|
| 177 |
+
howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-218B-Delphi}}
|
| 178 |
+
}
|
| 179 |
+
```
|