SeaWolf-AI committed 6 days ago

Commit aa76bb8 (verified) · 1 parent: a0cbc88

Finalize model card: Darwin-28B-REASON (RTD + Darwin-DELPHI, GPQA 89.39)

Files changed (1)

README.md +239 -3


@@ -1,10 +1,246 @@
 ---
 license: apache-2.0
-base_model: FINAL-Bench/Darwin-28B-Opus
-library_name: peft
 tags:
   - darwin
   - reasoning
 ---
-# Darwin-28B-REASON

 ---
 license: apache-2.0
+language:
+  - en
+  - zh
+  - ko
+  - ja
+  - multilingual
+library_name: transformers
+pipeline_tag: text-generation
 tags:
   - darwin
+  - darwin-reason
   - reasoning
+  - advanced-reasoning
+  - chain-of-thought
+  - thinking
+  - reasoning-trace-distillation
+  - rtd
+  - darwin-delphi
+  - test-time-compute
+  - qwen3.6
+  - qwen
+  - lora
+  - peft
+  - adapter
+  - gpqa
+  - benchmark
+  - open-source
+  - apache-2.0
+  - proto-agi
+  - vidraft
+  - eval-results
+base_model:
+  - FINAL-Bench/Darwin-28B-Opus
+base_model_relation: adapter
+model-index:
+  - name: Darwin-28B-REASON
+    results:
+      - task:
+          type: text-generation
+          name: Graduate-Level Reasoning
+        dataset:
+          type: Idavidrein/gpqa
+          name: GPQA Diamond
+          config: gpqa_diamond
+          split: train
+        metrics:
+          - type: accuracy
+            value: 89.39
+            name: Accuracy (with Darwin-DELPHI)
+            verified: false
 ---
+# Darwin-28B-REASON — Reasoning-Trace Distilled, Darwin-DELPHI Enhanced
+<p align="center">
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-28B-REASON"><img src="https://img.shields.io/badge/⭐_GPQA_Diamond-89.39%25_Darwin--28B--REASON-gold?style=for-the-badge" alt="GPQA"></a>
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-28B-Opus"><img src="https://img.shields.io/badge/🧬_Base-Darwin--28B--Opus_(88.89%25)-blue?style=for-the-badge" alt="Opus"></a>
+</p>
+<p align="center">
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-36B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--36B--Opus_(88.4%25)-blue?style=for-the-badge" alt="36B"></a>
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-27B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--27B--Opus_(86.9%25)-blue?style=for-the-badge" alt="27B"></a>
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-NEG"><img src="https://img.shields.io/badge/⚡_Model-Darwin--9B--NEG_(84.3%25)-purple?style=for-the-badge" alt="NEG"></a>
+</p>
+<p align="center">
+  <a href="https://huggingface.co/collections/FINAL-Bench/darwin-family"><img src="https://img.shields.io/badge/🏠_Darwin_Family-Collection-green?style=for-the-badge" alt="Family"></a>
+  <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
+</p>
+> Reasoning-enhanced model built on Darwin-28B-Opus · Reasoning-Trace Distillation (RTD) · Darwin-DELPHI test-time engine · BF16 · Apache 2.0
+> **GPQA Diamond: 89.39 % with Darwin-DELPHI**
+---
+## Overview
+**Darwin-28B-REASON** is a reasoning-enhanced model built on top of **[Darwin-28B-Opus](https://huggingface.co/FINAL-Bench/Darwin-28B-Opus)**. It combines two components:
+1. **Reasoning-Trace Distillation (RTD)**: a training stage that distills complete reasoning chains into a lightweight adapter on the Darwin-28B-Opus base.
+2. **Darwin-DELPHI**: a proprietary test-time reasoning engine applied at inference.
+Together they push graduate-level scientific reasoning to the top tier of the Darwin family: **89.39 %** on GPQA Diamond with Darwin-DELPHI. The model is released under **Apache-2.0**.
+---
+## 🧬 Darwin Platform & Research
+**Darwin** is VIDRAFT's measurement-driven Korean reasoning model family: approximately **20 official models** plus **400+ community derivatives**, ranking **#3 globally on GPQA** among open models. That ranking belongs to the base model, **Darwin-28B-Opus**, which is listed on Hugging Face as **GPQA #3 (88.89 %)**.
+- **Platform technique** — MRI trust-weighted Evolutionary Merge ([arXiv:2605.14386](https://arxiv.org/abs/2605.14386)).
+- **FINAL Bench** — VIDRAFT's evaluation framework (SSRN): MetaCognition **+14.05**, MA-ER Gap **0.392**.
+- **4-layer Pre-AGI roadmap** — Darwin → AETHER → PROMETHEUS → HEPHAESTUS.
+---
+## 🧬 Model Lineage
+| Role | Model | Contribution |
+|:---:|:---|:---|
+| **Base** | [`FINAL-Bench/Darwin-28B-Opus`](https://huggingface.co/FINAL-Bench/Darwin-28B-Opus) | GPQA #3 (88.89 %) Qwen3.6-generation reasoning backbone. |
+| **RTD adapter** | reasoning-trace distillation | Distills complete reasoning chains into a lightweight adapter on the Opus base. |
+| **Test-time engine** | Darwin-DELPHI | Proprietary inference-time consensus engine (not stored in weights). |
+| **Result** | **`Darwin-28B-REASON`** (this model) | RTD adapter + Darwin-DELPHI → **89.39 %** GPQA Diamond. |
+---
+## ⚙️ Technical Specifications
+| Component | Value |
+|:---|:---|
+| Base architecture | `Qwen3_5ForConditionalGeneration` (Qwen3.6 generation, hybrid linear + full attention) |
+| Base model | FINAL-Bench/Darwin-28B-Opus (27.6 B, BF16) |
+| Delivery | LoRA / PEFT adapter on the Darwin-28B-Opus base |
+| Precision | bfloat16 |
+| Context length | Inherited from base (long-chain reasoning supported) |
+| License | Apache 2.0 |
+---
+## 🔬 Core Techniques
+### ① RTD — Reasoning-Trace Distillation
+RTD distills **complete reasoning chains** from a publicly available mathematical corpus (Apache-2.0 source) into a lightweight adapter on the Darwin-28B-Opus base. The adapter strengthens long-form, multi-step scientific reasoning while preserving the base model's bilingual capability.
+> The full RTD recipe (curation, trace selection, training schedule) is **proprietary** and is not disclosed.
+### ② Darwin-DELPHI — Test-Time Reasoning Engine
+**Darwin-DELPHI** is a proprietary test-time engine applied at inference. It performs **multi-sample cross-validation**, **re-examination of uncertain responses**, and **iterative self-critique**, converging to a **consensus** answer through a single-agent Delphi-method procedure.
+> Darwin-DELPHI is **not stored in the model weights**. Its internal parameters — sampling counts, stage transitions, and decision thresholds — are a **trade secret** and are not published.
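Because the Darwin-DELPHI internals are not published, the following is only a generic illustration of sampling-based consensus (majority voting, with extra sampling rounds while agreement is low). It is not the Darwin-DELPHI procedure. The function name `consensus_answer`, the `sample_fn` callback, and every default value are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Callable, List, Tuple


def consensus_answer(
    sample_fn: Callable[[], str],
    n_samples: int = 5,          # hypothetical samples per round
    min_agreement: float = 0.6,  # hypothetical consensus threshold
    max_rounds: int = 3,         # hypothetical cap on rounds (must be >= 1)
) -> Tuple[str, float]:
    """Majority vote over sampled answers; sample again while agreement is low."""
    votes: List[str] = []
    for _ in range(max_rounds):
        # Draw another batch of candidate answers and pool them with earlier ones.
        votes.extend(sample_fn() for _ in range(n_samples))
        answer, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        if agreement >= min_agreement:
            break
    return answer, agreement


# Deterministic stand-in for a model: replays a fixed list of answers.
canned = iter(["B", "C", "B", "A", "C",
               "B", "B", "B", "C", "B"])
answer, agreement = consensus_answer(lambda: next(canned))
print(answer, agreement)
```

In practice `sample_fn` would wrap a stochastic `model.generate` call plus answer extraction; the stub here only makes the control flow visible.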
+---
+## 🏆 Benchmark — GPQA Diamond (198 questions)
+GPQA Diamond is a 198-question, graduate-level (PhD) science reasoning benchmark.
+| Model | Engine | **Accuracy** |
+|:---|:---|:---:|
+| Darwin-28B-Opus (base) | Standard | 88.89 % (176 / 198) |
+| **Darwin-28B-REASON** | **Darwin-DELPHI** | **🥇 89.39 % (177 / 198)** |
+The evaluation methodology for the Darwin-DELPHI result is **protected**; sample counts, staging, and thresholds are a **trade secret**.
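The percentages in the table follow directly from the reported question counts. As a small check, assuming only the counts shown in the card (176 and 177 correct out of 198), the arithmetic can be reproduced as below; the helper name `accuracy_pct` is illustrative.

```python
# Question counts as reported in the card's benchmark table.
TOTAL_QUESTIONS = 198
correct = {
    "Darwin-28B-Opus (base)": 176,
    "Darwin-28B-REASON + Darwin-DELPHI": 177,
}


def accuracy_pct(n_correct: int, n_total: int = TOTAL_QUESTIONS) -> float:
    """Accuracy as a percentage, rounded to two decimals as in the card."""
    return round(100 * n_correct / n_total, 2)


for name, n in correct.items():
    print(f"{name}: {accuracy_pct(n)} % ({n} / {TOTAL_QUESTIONS})")

# Percentage-point weight of a single question on this benchmark.
per_question_pp = 100 / TOTAL_QUESTIONS
print(f"one question = {per_question_pp:.2f} percentage points")
```

With 198 questions, each item is worth about 0.51 percentage points, so the difference between the two rows corresponds to one additional correct answer.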
+---
+## 🚀 Usage
+Darwin-28B-REASON ships as a **LoRA / PEFT adapter** on the Darwin-28B-Opus base.
+```python
+import torch
+from peft import PeftModel
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+BASE = "FINAL-Bench/Darwin-28B-Opus"       # full base weights
+ADAPTER = "FINAL-Bench/Darwin-28B-REASON"  # RTD LoRA adapter (this repository)
+
+# Load the tokenizer and the base model in bfloat16.
+tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
+base = AutoModelForCausalLM.from_pretrained(
+    BASE,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+)
+
+# Attach the adapter on top of the base model and switch to inference mode.
+model = PeftModel.from_pretrained(base, ADAPTER)
+model.eval()
+
+messages = [
+    {"role": "user",
+     "content": "A particle moves along x(t) = t³ − 6t² + 9t. Find when it is at rest and classify the motion."}
+]
+
+# Render the chat template to text, then tokenize and move tensors to the model's device.
+text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tok(text, return_tensors="pt").to(model.device)
+
+# Generate, then decode only the newly generated tokens (skip the prompt).
+outputs = model.generate(**inputs, max_new_tokens=2048)
+print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
+```
+> The 89.39 % GPQA Diamond result is produced with the Darwin-DELPHI test-time engine, which is applied on top of this adapter. Darwin-DELPHI is provided through the Darwin-series evaluation harness.
+---
+## 🎯 Recommended Use-Cases
+- **Graduate-level STEM reasoning** (GPQA / science qualifying exams)
+- **Mathematical problem solving** (MATH, AIME-style problems)
+- **Complex multi-step chain-of-thought tasks**
+- **Code generation and debugging**
+- **Bilingual reasoning** (strong English + Korean; also Chinese / Japanese)
+## ⚠️ Limitations
+- Requires the Darwin-28B-Opus base (≈ 55 GB of VRAM for the weights alone in bfloat16: 27.6 B parameters × 2 bytes) plus the adapter; a single A100-80GB or B200 is sufficient.
+- The 89.39 % result depends on the Darwin-DELPHI test-time engine; the adapter alone provides strong but lower single-model accuracy.
+- Optimised for English first, with secondary support for Korean, Chinese, and Japanese.
+- Reasoning traces tend to be verbose — control with `max_new_tokens` as needed.
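The ≈ 55 GB figure in the limitations is consistent with a weights-only estimate from the parameter count in the specifications table. The sketch below assumes 27.6 B parameters at 2 bytes each (bfloat16) and decimal gigabytes; the helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights only, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


BASE_PARAMS = 27.6e9  # parameter count of Darwin-28B-Opus, from the card
BF16_BYTES = 2        # bfloat16 stores each parameter in 2 bytes

weights_gb = weight_memory_gb(BASE_PARAMS, BF16_BYTES)
print(f"weights only: {weights_gb:.1f} GB")
```

This covers the base weights only. The adapter, the KV cache, and activations add to it, and the overhead grows with `max_new_tokens` and batch size.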
+---
+## 📚 Citation
+```bibtex
+@misc{darwin28b_reason_2026,
+  title  = {Darwin-28B-REASON: Reasoning-Trace Distillation and Darwin-DELPHI Test-Time Reasoning on Darwin-28B-Opus},
+  author = {FINAL-Bench / Darwin Research Team},
+  year   = {2026},
+  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-28B-REASON}},
+  note   = {RTD adapter + Darwin-DELPHI · 89.39 % GPQA Diamond}
+}
+@misc{darwin_family_2026,
+  title  = {Darwin Family: MRI Trust-Weighted Evolutionary Merging for Reasoning Models},
+  author = {VIDRAFT / FINAL-Bench},
+  year   = {2026},
+  howpublished = {\url{https://arxiv.org/abs/2605.14386}}
+}
+@misc{final_bench_2026,
+  title  = {FINAL Bench: A Measuring-Result-Driven Evaluation Framework for Reasoning Models},
+  author = {VIDRAFT / FINAL-Bench},
+  year   = {2026},
+  howpublished = {SSRN}
+}
+```
+---
+## 🔗 Related Darwin Models
+- **Darwin-28B-Opus** — base model, Qwen3.6-27B × Opus distilled, GPQA 88.89 %
+- **Darwin-36B-Opus** — MoE 36B, GPQA 88.4 %
+- **Darwin-27B-Opus** — 27B dense (Qwen3.5 generation), GPQA 86.9 %
+- **Darwin-9B-NEG** — 9B with Negentropy distillation, GPQA 84.3 %
+- **Darwin-4B-Genesis** — smallest Darwin member
+---
+This model is introduced in [Darwin Family](https://arxiv.org/abs/2605.14386).
+*Darwin-28B-REASON · RTD + Darwin-DELPHI · 89.39 % GPQA Diamond · FINAL-Bench*