SeaWolf-AI committed on
Commit
c2bfdba
·
verified ·
1 Parent(s): 62100b6

Initial release — Darwin-60B-DUO (Hybrid-A: Route 70% / Split-Refine 20% / Ensemble V_1 10%)

LICENSE ADDED
@@ -0,0 +1,54 @@
1
+ Darwin-60B-DUO — Combined License Notice
2
+ ========================================
3
+
4
+ This repository aggregates two constituent base models, each governed by its
5
+ own license. The combined repository inherits the more restrictive of the
6
+ two — the Gemma license — as the effective deployment license.
7
+
8
+ ────────────────────────────────────────────────────────────────────────────
9
+ 1. Constituent base model licenses
10
+ ────────────────────────────────────────────────────────────────────────────
11
+
12
+ - Darwin-28B-REASON (FINAL-Bench/Darwin-28B-REASON)
13
+ License: Apache License 2.0
14
+ Source : https://www.apache.org/licenses/LICENSE-2.0
15
+
16
+ - AWAXIS-Think-31B (Anserwise/AWAXIS-Think-31B)
17
+ License: Gemma Terms of Use (inherited from Google Gemma-4 base)
18
+ Source : https://ai.google.dev/gemma/terms
19
+
20
+ ────────────────────────────────────────────────────────────────────────────
21
+ 2. Effective combined license for Darwin-60B-DUO
22
+ ────────────────────────────────────────────────────────────────────────────
23
+
24
+ Because the Gemma Terms of Use impose more specific restrictions than
25
+ Apache-2.0 (notably the Gemma Prohibited Use Policy), the combined
26
+ Darwin-60B-DUO release is distributed under the **Gemma Terms of Use**.
27
+
28
+ Users intending commercial deployment must:
29
+ - Comply with the Gemma Terms of Use in full
30
+ https://ai.google.dev/gemma/terms
31
+ - Comply with the Gemma Prohibited Use Policy
32
+ https://ai.google.dev/gemma/prohibited_use_policy
33
+ - Retain all attribution and notices for both constituent models
34
+
35
+ ────────────────────────────────────────────────────────────────────────────
36
+ 3. Gateway code (this repository's `gateway/`, `docker/`, etc.)
37
+ ────────────────────────────────────────────────────────────────────────────
38
+
39
+ The orchestration code authored for Darwin-60B-DUO (FastAPI gateway,
40
+ router, refine, ensemble, Docker compose) is released under
41
+ Apache License 2.0 to maximize developer flexibility. The combined
42
+ license inheritance applies only to the served model behaviour, not the
43
+ code that orchestrates it.
44
+
45
+ ────────────────────────────────────────────────────────────────────────────
46
+ 4. Disclaimer
47
+ ────────────────────────────────────────────────────────────────────────────
48
+
49
+ This document is a license summary for end-user convenience. In case of
50
+ any conflict, the original license texts of the constituent models
51
+ (Apache-2.0 and Gemma Terms of Use) govern. Users should consult those
52
+ authoritative sources for binding obligations.
53
+
54
+ Copyright (c) 2026 FINAL-Bench, VIDRAFT, Anserwise.
README.md ADDED
@@ -0,0 +1,319 @@
1
+ ---
2
+ license: gemma
3
+ language:
4
+ - ko
5
+ - en
6
+ - multilingual
7
+ library_name: transformers
8
+ pipeline_tag: text-generation
9
+ tags:
10
+ - darwin
11
+ - darwin-family
12
+ - darwin-duo
13
+ - duo
14
+ - ensemble
15
+ - mixture-of-models
16
+ - router
17
+ - korean
18
+ - reasoning
19
+ - finalbench
20
+ - vidraft
21
+ base_model:
22
+ - FINAL-Bench/Darwin-28B-REASON
23
+ - Anserwise/AWAXIS-Think-31B
24
+ ---
25
+
26
+ <div align="center">
27
+
28
+ # 🌳 Darwin-60B-DUO
29
+
30
+ ### Darwin family 최초의 듀오 모델 — 두 SOTA가 하나로
31
+ ### *The first DUO of the Darwin family — two SOTAs unified into one model*
32
+
33
+ </div>
34
+
35
+ ---
36
+
37
+ ## ✨ 한 줄 요약 · TL;DR
38
+
39
+ > **HF 공인 GPQA Diamond 3위** Darwin-28B-REASON과
40
+ > **한국 과기부 K-AI 리더보드 1위** AWAXIS-Think-31B를
41
+ > **단일 OpenAI-호환 endpoint** 로 묶은 **Darwin family 최초 듀오 모델**.
42
+ >
43
+ > *Combines the **#3 HF-verified GPQA Diamond** Darwin-28B-REASON with the
44
+ > **#1 Korean K-AI Leaderboard** AWAXIS-Think-31B behind a **single OpenAI-compatible endpoint** — the Darwin family's first DUO release.*
45
+
46
+ ---
47
+
48
+ ## 🏆 두 SOTA 모델 구성 · Two SOTA Constituents
49
+
50
+ | 구성 모델 | 공인 성과 (Verified Rank) | 강점 (Strengths) | 파라미터 |
51
+ |-----------|-----------------------|-----------------|--------|
52
+ | **Darwin-28B-REASON** | 🥉 **Hugging Face 공인 GPQA Diamond 벤치마크 3위** | English graduate-level reasoning · STEM · 수학 · code | 26.9 B |
53
+ | **AWAXIS-Think-31B** | 🥇 **대한민국 과학기술정보통신부 운영 국가 공인 K-AI 리더보드 1위** | 한국어 이해/생성 · 한국 문화 · 자연스러운 어조 | 31.27 B |
54
+ | **Darwin-60B-DUO** (this) | *Aggregate Brand* | 위 두 영역 SOTA 결합 + 자동 hybrid 라우팅 | 58.17 B (≈ 60 B) |
55
+
56
+ > 💡 **AWAXIS-Think-31B 역시 Darwin family 입니다.**
57
+ > Darwin 팀이 Google Gemma-4 base 위에 한국어 specialist 분기로 distill 한 모델로,
58
+ > 기존 Darwin (Qwen3.5 계열) lineage 와 함께 Darwin family 양대 축을 형성합니다.
59
+ >
60
+ > *AWAXIS-Think-31B is also part of the **Darwin family** — a Korean specialist branch distilled by the Darwin team on top of Google's Gemma-4 base, complementing the original Qwen3.5-line Darwin lineage as the family's second axis.*
61
+
62
+ ---
63
+
64
+ ## 🎯 무엇이 특별한가 · What Makes It Unique
65
+
66
+ ### 1️⃣ 영역별 SOTA를 한 모델에 (Two SOTA Domains in One Model)
67
+ 영어 reasoning과 한국어 자연성을 동시에 SOTA 수준으로 달성하는 단일 LLM은 극히 드뭅니다.
68
+ Darwin-60B-DUO는 각 영역 공인 SOTA 두 모델을 **하나의 API endpoint** 로 묶어,
69
+ 사용자가 orchestration 을 인식하지 못한 채 두 강점을 동시에 누립니다.
70
+
71
+ *Few single LLMs achieve SOTA in both English reasoning and Korean naturalness simultaneously. Darwin-60B-DUO unifies two domain-verified SOTAs behind one endpoint — users benefit from both without orchestration overhead.*
72
+
73
+ ### 2️⃣ 자동 Hybrid 라우팅 (Auto Hybrid Routing — "Hybrid-A")
74
+ 입력을 분석하여 **시나리오별로 최적 전략을 자동 선택** 합니다.
75
+
76
+ | 시나리오 (Scenario) | 라우팅 전략 (Strategy) | 호출 모델 | 비용 (Cost) | 비중 (Share) |
77
+ |---------------------|----------------------|----------|------------|------------|
78
+ | 순수 한국어 (Pure Korean) — 이메일, 한국 정보, 채팅 | **Route → AWAXIS** | 1 model | 1× | ~50 % |
79
+ | 순수 영어 (Pure English) — 코드, 수학, 영어 reasoning | **Route → Darwin** | 1 model | 1× | ~20 % |
80
+ | 한국어 답 + 영어/STEM reasoning 필요 (Korean output needing English/STEM reasoning) | **Split → Darwin reasons → AWAXIS polishes** | 2 models, sequential | 2× | ~15 % |
81
+ | 영어 답 + 한국 정보 필요 (English output needing Korean context) | **Split → AWAXIS retrieves → Darwin polishes** | 2 models, sequential | 2× | ~5 % |
82
+ | 객관식·짧은 답 (MCQ / short answer) | **Ensemble V₁ tournament** | 2 models + cross-verify | 2× | ~10 % |
83
+
84
+ **평균 비용 ≈ 1.3 × of a single 30B model**: 70% 케이스는 1×, 30% 케이스만 2×.
85
+ *Average effective cost is roughly 1.3× a single 30B model.*
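
The 1.3× figure follows from the share and cost columns of the table. As a quick check, assuming each strategy costs exactly the multiplier listed there (the names below are illustrative and not taken from the gateway code):

```python
# Share of traffic and relative cost per strategy, copied from the table above.
STRATEGIES = {
    "route_korean": (0.50, 1),
    "route_english": (0.20, 1),
    "split_korean_with_reasoning": (0.15, 2),
    "split_english_with_korean_context": (0.05, 2),
    "ensemble_v1_mcq": (0.10, 2),
}


def average_cost(strategies):
    """Expected cost multiplier: sum of share * cost over all strategies."""
    return sum(share * cost for share, cost in strategies.values())


total_share = sum(share for share, _ in STRATEGIES.values())
avg = average_cost(STRATEGIES)
print(f"shares sum to {total_share:.2f}, average cost = {avg:.2f}x")
```

The 2× entry for the ensemble row is the table's own accounting; it does not price in the N=8 samples drawn per model, so the real multiplier for that mode may be higher.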
86
+
87
+ ### 3️⃣ 단일 모델 façade (Single-Model Façade)
88
+ **OpenAI API 호환 단일 endpoint.** 기존 도구 (LangChain · LlamaIndex · OpenAI SDK · Continue · Cursor 등)를 코드 변경 없이 그대로 사용합니다.
89
+
90
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://your-server:8000/v1", api_key="anything")
+ resp = client.chat.completions.create(
+     model="darwin-60b-duo",  # single model name covering both backends
+     messages=[{"role": "user",
+                # Korean prompt: "Summarize in Korean how GPT-5 and Claude differ in reasoning"
+                "content": "GPT-5와 Claude의 reasoning 차이를 한국어로 정리해줘"}],
+ )
+ print(resp.choices[0].message.content)
+ # Internally: Darwin reasons in English → AWAXIS polishes in Korean
+ ```
103
+
104
+ ### 4️⃣ Efficient GPU Footprint
+ - **FP8 quantization**: the two models' weights total roughly **58 GB** (about 1 byte per parameter over 58.17 B parameters), which fits on a **single H100 (80 GB) or B200**, with limited headroom left for the KV cache on an 80 GB card
+ - BF16: two B200 GPUs, ~60 GB each
+ - vLLM-based high-throughput inference (tensor parallelism and prefix caching supported)
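
As a rough cross-check of the footprint, a weights-only estimate can be derived from the parameter counts in this commit's `config.json`. The sketch assumes 1 byte per parameter for FP8 and 2 bytes for BF16, and it ignores KV cache, activations and runtime overhead, so real memory use will be higher:

```python
# Parameter counts as listed in this repository's config.json.
PARAMS = {
    "Darwin-28B-REASON": 26_895_998_464,
    "AWAXIS-Think-31B": 31_273_086_512,
}
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}  # assumption: weights only


def weight_gb(n_params, dtype):
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp8", "bf16"):
    per_model = {name: round(weight_gb(n, dtype), 1) for name, n in PARAMS.items()}
    total = sum(weight_gb(n, dtype) for n in PARAMS.values())
    print(dtype, per_model, f"total={total:.1f} GB")
```

An estimate like this is a starting point for choosing `--gpu-memory-utilization`; the value that works in practice also depends on context length and batch size.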
110
+
111
+ ---
112
+
113
+ ## 🌳 Darwin Family 가족 트리 · Family Tree
114
+
115
+ ```
116
+ 🌳 Darwin Family
117
+
118
+ ├─ 👴 GRANDPARENTS (Foundation lineage)
119
+ │ ├─ Cohere Command A+ ── English reasoning lineage (218 B)
120
+ │ └─ Google Gemma-4-31B-it ── Korean/multilingual base
121
+
122
+ ├─ 👨 PARENTS (Family bases)
123
+ │ ├─ Darwin-9B ── omni-modal, ko-en compact
124
+ │ ├─ Darwin-28B-Opus ── English reasoning base
125
+ │ ├─ Darwin-31B-Opus ── Korean multimodal base
126
+ │ └─ Darwin-218B-Delphi ── cascade flagship (GPQA Diamond 90.91 %)
127
+
128
+ ├─ 🧒 SPECIALISTS (Children — domain SOTAs)
129
+ │ ├─ Darwin-28B-REASON 🥉 ── HF GPQA Diamond #3 (English reasoning specialist)
130
+ │ └─ AWAXIS-Think-31B 🥇 ── K-AI Leaderboard #1 (Korean specialist, Gemma-4 branch)
131
+
132
+ └─ ⭐ Darwin-60B-DUO ⭐ (you are here)
133
+ └─ Two specialists unified — 두 specialist 를 단일 모델로
134
+ ```
135
+
136
+ ---
137
+
138
+ ## 🚀 사용법 · Usage
139
+
140
+ ### Option A — Docker Compose (권장 / Recommended)
141
+
142
+ ```bash
143
+ git clone https://huggingface.co/FINAL-Bench/Darwin-60B-DUO
144
+ cd Darwin-60B-DUO
145
+ docker compose -f docker/docker-compose.yml up -d
146
+
147
+ # 검증 / Verify
148
+ curl http://localhost:8000/v1/models
149
+ # → {"data":[{"id":"darwin-60b-duo","object":"model"}]}
150
+
151
+ curl http://localhost:8000/v1/chat/completions \
152
+ -H "Content-Type: application/json" \
153
+ -d '{"model":"darwin-60b-duo",
154
+ "messages":[{"role":"user","content":"안녕하세요. 자기 소개 부탁드립니다."}]}'
155
+ ```
156
+
157
+ ### Option B — Manual launch (B200 / H100 × 2)
158
+
159
+ ```bash
160
+ # 1) Darwin-28B-REASON (port 8021, GPU 0)
161
+ CUDA_VISIBLE_DEVICES=0 VLLM_DP_MASTER_PORT=45011 \
162
+ vllm serve FINAL-Bench/Darwin-28B-REASON \
163
+ --port 8021 --served-model-name darwin-28r \
164
+ --quantization fp8 --enforce-eager \
165
+ --limit-mm-per-prompt '{"image":0,"video":0}' &
166
+
167
+ # 2) AWAXIS-Think-31B (port 8022, GPU 1)
168
+ CUDA_VISIBLE_DEVICES=1 VLLM_DP_MASTER_PORT=45012 \
169
+ vllm serve Anserwise/AWAXIS-Think-31B \
170
+ --port 8022 --served-model-name awaxis-31b \
171
+ --quantization fp8 --enforce-eager \
172
+ --limit-mm-per-prompt '{"image":0,"video":0}' &
173
+
174
+ # 3) Gateway (port 8000) — from this repo
175
+ pip install -r gateway/requirements.txt
176
+ python gateway/server.py --port 8000 \
177
+ --darwin-url http://127.0.0.1:8021/v1 \
178
+ --awaxis-url http://127.0.0.1:8022/v1
179
+ ```
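
The manual launch has no equivalent of the Compose health checks, so the gateway can start before the backends are ready. A minimal readiness poll, assuming the two ports used above and the same 20 s interval and 60 retries as the Compose file; `wait_ready` and `http_ok` are hypothetical helpers, not part of this repository:

```python
import time
import urllib.request

# Model-list endpoints of the two vLLM servers started above.
BACKENDS = {
    "darwin": "http://127.0.0.1:8021/v1/models",
    "awaxis": "http://127.0.0.1:8022/v1/models",
}


def http_ok(url, timeout=5.0):
    """True if the URL answers with HTTP 200; False on connection errors or timeouts."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


def wait_ready(urls, probe=http_ok, retries=60, interval=20.0, sleep=time.sleep):
    """Poll until every URL passes the probe; return the names still failing."""
    pending = dict(urls)
    for attempt in range(retries):
        pending = {name: url for name, url in pending.items() if not probe(url)}
        if not pending:
            break
        if attempt < retries - 1:
            sleep(interval)
    return sorted(pending)
```

Calling `wait_ready(BACKENDS)` between steps 2 and 3 returns an empty list once both servers answer, or the names of the backends that never came up.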
180
+
181
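+ > 💡 **Single GPU**: with FP8 quantization the two models' weights total roughly 58 GB, so both can be collocated on one 80 GB GPU, though little memory remains for the KV cache. Set `CUDA_VISIBLE_DEVICES=0` for both servers and pass `--gpu-memory-utilization 0.45` to each.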
182
+
183
+ ---
184
+
185
+ ## ⚙️ 운영 모드 상세 · Operation Modes
186
+
187
+ ### 🟢 Mode 1 · Route (단일 라우팅, 70 % 케이스)
188
+ 입력 분석 → 한 모델만 호출. **가장 빠르고 저렴**.
189
+ *Language + domain detection → single backend. Fastest and cheapest.*
190
+
191
+ 판정 신호 / Detection signals:
192
+ - `korean_ratio(prompt) > 0.3` → AWAXIS
193
+ - 코드 키워드 (`def`, `function`, `import`, `class`) → Darwin
194
+ - 수학 마커 (`\boxed`, `equation`, `prove`) → Darwin
195
+ - 기타 / Else → 다수 언어 / domain 기준 가중치
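
The detection signals above can be sketched as a small function. This is an illustrative reduction, not the repository's `router.py`: the keyword lists contain only the examples named in the list, and the weighted fallback is replaced by a simple assumption (any Korean at all goes to AWAXIS).

```python
import re

CODE_KEYWORDS = ("def", "function", "import", "class")
MATH_MARKERS = ("\\boxed", "equation", "prove")


def korean_ratio(text):
    """Fraction of characters that are precomposed Hangul syllables."""
    if not text:
        return 0.0
    return len(re.findall(r"[가-힣]", text)) / len(text)


def pick_backend(prompt):
    """Return "awaxis" or "darwin", applying the signals in the order listed."""
    if korean_ratio(prompt) > 0.3:
        return "awaxis"
    lowered = prompt.lower()
    words = set(re.findall(r"[a-z_]+", lowered))
    if any(keyword in words for keyword in CODE_KEYWORDS):
        return "darwin"
    if any(marker in lowered for marker in MATH_MARKERS):
        return "darwin"
    # Assumption: the weighted fallback is reduced to "any Korean present".
    return "awaxis" if korean_ratio(prompt) > 0 else "darwin"


print(pick_backend("안녕하세요, 회의 일정 메일을 써 주세요"))
print(pick_backend("Write a Python class with an import statement"))
print(pick_backend("Prove that sqrt(2) is irrational"))
```

Because the Korean-ratio test runs first, a mostly Korean prompt that happens to contain a code keyword still goes to AWAXIS in this sketch; whether the real router behaves that way is not stated here.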
196
+
197
+ ### 🟡 Mode 2 · Split / Refine (분업 협력, 20 % 케이스)
198
+ 한 모델이 초안 → 다른 모델이 다듬기. **두 모델의 장점 결합**.
199
+
200
+ ```
201
+ 예: "엔트로피를 한국어로 쉽게 풀어줘"
202
+
203
+ Step 1 Darwin (정확한 영어 reasoning) →
204
+ "Entropy quantifies the number of microstates compatible with a
205
+ given macrostate, representing disorder ..."
206
+
207
+ Step 2 AWAXIS (자연스러운 한국어 다듬기) →
208
+ "엔트로피는 쉽게 말하면 '무질서함의 정도' 입니다.
209
+ 같은 모습으로 보이지만 사실 그 안에 ..."
210
+ ```
211
+
212
+ ### 🔴 Mode 3 · Ensemble V₁ Tournament (앙상블, 10 % 케이스 — 객관식·짧은 답)
213
+ 두 모델이 각자 **N=8 self-consistency** → majority vote.
214
+ - 답 일치 시 → 그대로 반환 (강한 신호)
215
+ - 답 불일치 시 → 두 모델이 **서로의 답을 cross-verify** → tournament winner
216
+
217
+ ```
218
+ 질문: "A/B/C/D 중 정답은?"
219
+ Darwin (8 sample MAJ) → "C"
220
+ AWAXIS (8 sample MAJ) → "B"
221
+ → 불일치 → Darwin 에게 "C vs B 중 정답?" + AWAXIS 에게 같은 질문
222
+ → verdict 합의 → final answer
223
+ ```
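
The decision rule in this mode can be written as a pure function. The sketch below follows the logic of `gateway/ensemble.py` in this commit with the model calls stripped out; `resolve` is a hypothetical name, and the verdicts (1 backs Darwin's answer, 2 backs AWAXIS's) are passed in rather than requested from the backends:

```python
from collections import Counter


def majority(letters):
    """Most common valid letter among the samples, or None if none are valid."""
    valid = [x for x in letters if x in ("A", "B", "C", "D")]
    return Counter(valid).most_common(1)[0][0] if valid else None


def resolve(d_letters, a_letters, d_verdict=None, a_verdict=None):
    """Combine per-model majorities and optional cross-verification verdicts."""
    d_maj, a_maj = majority(d_letters), majority(a_letters)
    if d_maj is None or a_maj is None or d_maj == a_maj:
        return d_maj or a_maj  # agreement, or only one side produced an answer
    verdicts = {v for v in (d_verdict, a_verdict) if v in (1, 2)}
    if len(verdicts) == 1:  # both verifiers agree, or only one replied
        return d_maj if verdicts.pop() == 1 else a_maj
    # Split or missing verdicts: fall back to the larger own-vote count.
    d_conf, a_conf = d_letters.count(d_maj), a_letters.count(a_maj)
    return d_maj if d_conf >= a_conf else a_maj


darwin_samples = ["C"] * 6 + ["B"] * 2
awaxis_samples = ["B"] * 5 + ["C"] * 3
print(resolve(darwin_samples, awaxis_samples, d_verdict=1, a_verdict=1))
```

On a split verdict the tie goes to Darwin whenever its vote count is at least as high, which mirrors the `>=` comparison in the original module.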
224
+
225
+ ---
226
+
227
+ ## 📦 Repository Layout
228
+
229
+ ```
230
+ Darwin-60B-DUO/
231
+ ├── README.md ← 본 모델카드 / this model card
232
+ ├── config.json ← DUO config (base_models reference)
233
+ ├── tokenizer_info.json ← base tokenizer reference 정보
234
+ ├── gateway/
235
+ │ ├── server.py ← FastAPI orchestrator
236
+ │ ├── router.py ← 한/영, 도메인, 복잡도 판단
237
+ │ ├── refine.py ← Sequential refine logic
238
+ │ ├── ensemble.py ← V₁ cross-verification + MAJ@N
239
+ │ └── requirements.txt
240
+ ├── docker/
241
+ │ └── docker-compose.yml ← vLLM ×2 + gateway 통합 launcher
242
+ ├── benchmarks/
243
+ │ └── README.md ← 평가 자산 (TBA — coming soon)
244
+ └── LICENSE ← Gemma + Apache-2.0 dual notice
245
+ ```
246
+
247
+ ---
248
+
249
+ ## 📊 평가 · Evaluation
250
+
251
+ ### 구성 모델 공인 점수 (Verified Constituent Scores)
252
+ - **Darwin-28B-REASON** — Hugging Face 공인 **GPQA Diamond 벤치마크 3위**
253
+ - **AWAXIS-Think-31B** — 대한민국 과학기술정보통신부 운영 **국가 공인 K-AI 리더보드 1위**
254
+
255
+ ### Darwin-60B-DUO Aggregate Bench
256
+ - **GPQA Diamond (full 198Q)** — TBA (정식 평가 진행 예정 / scheduled)
257
+ - **KMMLU** — TBA
258
+ - **CLIcK (Korean cultural)** — TBA
259
+ - **Helmet · Ruler (long context)** — TBA
260
+
261
+ > 정식 198Q GPQA 및 K-AI 리더보드 DUO 점수는 평가 완료 후 `benchmarks/` 디렉토리에 게재됩니다.
262
+ >
263
+ > *Full 198-question GPQA and K-AI leaderboard DUO scores will be published in `benchmarks/` after formal evaluation.*
264
+
265
+ ---
266
+
267
+ ## 📜 라이센스 · License
268
+
269
+ **Combined license — Gemma** (the more restrictive of the constituent base models).
270
+
271
+ | 구성 모델 | License |
272
+ |-----------|---------|
273
+ | Darwin-28B-REASON | Apache-2.0 |
274
+ | AWAXIS-Think-31B | Gemma (inherited from Gemma-4) |
275
+ | **Darwin-60B-DUO** | **Gemma** (combined-license inheritance rule) |
276
+
277
+ Before any commercial use, review the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) and the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
280
+
281
+ ---
282
+
283
+ ## ⚠️ Limitations · 한계
284
+
285
+ - **No combined weights in this repo**: the gateway calls the vLLM endpoints of the two base models (Darwin-28B-REASON and AWAXIS-Think-31B). Each base model's weights are fetched separately from its own repo.
+ - **2× GPU baseline**: BF16 operation needs two GPUs. With FP8, one GPU can suffice (80 GB class, for example H100, or B200).
+ - **Added latency**: Split and Ensemble modes take roughly 2× as long as a single-model call.
+ - **Training data cut-off**: Darwin-28B-REASON ~2026-Q1, AWAXIS-Think-31B ~2026-Q1.
+ - **Hallucination**: the usual limitations of LLMs apply unchanged.
292
+
293
+ ---
294
+
295
+ ## 🙏 Acknowledgments
296
+
297
+ - **민식 (FINAL-Bench team lead)** — Darwin family architecture & DUO concept
298
+ - **Anserwise Korean specialist team** — AWAXIS-Think-31B development
299
+ - **VIDRAFT** — orchestration framework & Hybrid-A routing strategy
300
+ - **Google DeepMind** — Gemma-4 foundation
301
+ - **Cohere & Qwen team** — Command A+ / Qwen3.5 foundation lineage
302
+
303
+ ---
304
+
305
+ ## 📞 Contact
306
+
307
+ - HF org: [FINAL-Bench](https://huggingface.co/FINAL-Bench)
308
+ - Sister orgs: [Anserwise](https://huggingface.co/Anserwise) · [VIDraft](https://huggingface.co/VIDraft)
309
+ - Issues / discussions: 본 repo 의 **Community** 탭
310
+
311
+ ---
312
+
313
+ <div align="center">
314
+
315
+ > ⭐ **Darwin-60B-DUO is the Darwin family's first DUO model. One model — two SOTAs.** ⭐
316
+ >
317
+ > ⭐ **Darwin-60B-DUO는 Darwin 패밀리 최초의 듀오 모델입니다. 하나의 모델, 두 개의 SOTA.** ⭐
318
+
319
+ </div>
benchmarks/README.md ADDED
@@ -0,0 +1,25 @@
1
+ # Darwin-60B-DUO Benchmarks
2
+
3
+ > 📌 정식 benchmark 결과는 평가 완료 후 본 디렉토리에 게재됩니다.
4
+ > *Formal benchmark results will be posted here after evaluation.*
5
+
6
+ ## 평가 예정 항목 · Scheduled Evaluations
7
+
8
+ | Benchmark | Scope | Constituent score (verified) | DUO aggregate |
9
+ |-----------|-------|-----------------------------|---------------|
10
+ | **GPQA Diamond (full 198Q)** | English graduate reasoning | Darwin-28B-REASON: HF #3 | TBA |
11
+ | **K-AI Leaderboard** | Korean | AWAXIS-Think-31B: MSIT #1 | TBA |
12
+ | **KMMLU** | Korean MMLU | TBA | TBA |
13
+ | **CLIcK** | Korean cultural | TBA | TBA |
14
+ | **Helmet · Ruler** | Long context retrieval | TBA | TBA |
15
+ | **NIAH 32K · 128K** | Needle-in-haystack | NIAH 32K: 5/5 each (sanity) | TBA |
16
+
17
+ ## Hybrid-A 라우팅 분포 검증
18
+ 프로덕션 트래픽 샘플로 라우터 분포 (50/20/15/5/10 %) 가 실제 호출에서도 유지되는지 정기 모니터링.
19
+
20
+ *Production traffic sampling regularly validates that the router distribution (50/20/15/5/10 %) holds in real workloads.*
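
One way to express that monitoring check is to compare logged routing decisions against the target shares. The sketch below assumes the gateway logs one strategy name per request, using the keys from `config.json`; the sample data is made up for illustration:

```python
from collections import Counter

# Target shares from the Hybrid-A distribution (50/20/15/5/10 %).
EXPECTED = {
    "route_korean": 0.50,
    "route_english": 0.20,
    "split_korean_with_reasoning": 0.15,
    "split_english_with_korean_context": 0.05,
    "ensemble_v1_mcq": 0.10,
}


def observed_shares(decisions):
    """Turn a non-empty list of per-request strategy names into fractional shares."""
    counts = Counter(decisions)
    total = sum(counts.values())
    return {name: counts.get(name, 0) / total for name in EXPECTED}


def max_drift(decisions):
    """Largest absolute gap between an observed share and its target share."""
    shares = observed_shares(decisions)
    return max(abs(shares[name] - EXPECTED[name]) for name in EXPECTED)


# Hypothetical sample of 100 logged routing decisions.
sample = (["route_korean"] * 55 + ["route_english"] * 18
          + ["split_korean_with_reasoning"] * 12
          + ["split_english_with_korean_context"] * 5
          + ["ensemble_v1_mcq"] * 10)
print(f"max drift = {max_drift(sample):.2f}")
```

What counts as acceptable drift is a policy choice this document does not specify; an alert threshold would have to be picked by the operator.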
21
+
22
+ ## Evaluation Method
+ - Per-backend isolation: each base model scored on its own
+ - DUO aggregate: score of the final output after it has passed through the gateway
+ - Latency overhead: extra delay added by the gateway (route mode ≈ 0, split mode ≈ 1×, ensemble mode ≈ 1.5–2×)
config.json ADDED
@@ -0,0 +1,68 @@
1
+ {
2
+ "_model_type_friendly": "duo",
3
+ "_aggregate_brand": "Darwin-60B-DUO",
4
+ "architectures": [
5
+ "DarwinDuoOrchestrator"
6
+ ],
7
+ "description": "Darwin family DUO — two SOTA constituents (English reasoning + Korean) served via a single OpenAI-compatible gateway. This repo contains the orchestrator gateway code; backend weights are fetched from the constituent repos at runtime.",
8
+ "constituents": [
9
+ {
10
+ "role": "english_reasoning_specialist",
11
+ "model_id": "FINAL-Bench/Darwin-28B-REASON",
12
+ "served_name": "darwin-28r",
13
+ "architecture": "qwen3_5",
14
+ "params_total": 26895998464,
15
+ "params_billion": 26.9,
16
+ "verified_rank": "Hugging Face GPQA Diamond #3",
17
+ "default_port": 8021,
18
+ "default_dp_master_port": 45011,
19
+ "quantization_recommended": "fp8",
20
+ "vllm_extra_args": [
21
+ "--enforce-eager",
22
+ "--limit-mm-per-prompt", "{\"image\":0,\"video\":0}"
23
+ ]
24
+ },
25
+ {
26
+ "role": "korean_specialist",
27
+ "model_id": "Anserwise/AWAXIS-Think-31B",
28
+ "served_name": "awaxis-31b",
29
+ "architecture": "gemma4",
30
+ "params_total": 31273086512,
31
+ "params_billion": 31.27,
32
+ "verified_rank": "National K-AI Leaderboard (MSIT, Korea) #1",
33
+ "darwin_family_branch": "korean_specialist (Gemma-4 base)",
34
+ "default_port": 8022,
35
+ "default_dp_master_port": 45012,
36
+ "quantization_recommended": "fp8",
37
+ "vllm_extra_args": [
38
+ "--enforce-eager",
39
+ "--limit-mm-per-prompt", "{\"image\":0,\"video\":0}"
40
+ ]
41
+ }
42
+ ],
43
+ "aggregate_params_total": 58169084976,
44
+ "aggregate_params_billion": 58.17,
45
+ "active_params_router_mode_billion": 30,
46
+ "active_params_ensemble_mode_billion": 60,
47
+ "orchestration": {
48
+ "strategy_name": "Hybrid-A",
49
+ "version": "1.0",
50
+ "distribution": {
51
+ "route_korean": 0.50,
52
+ "route_english": 0.20,
53
+ "split_korean_with_reasoning": 0.15,
54
+ "split_english_with_korean_context": 0.05,
55
+ "ensemble_v1_mcq": 0.10
56
+ },
57
+ "average_cost_multiplier": 1.3,
58
+ "modes": ["route", "split_refine", "ensemble_v1"]
59
+ },
60
+ "gateway": {
61
+ "port": 8000,
62
+ "served_model_name": "darwin-60b-duo",
63
+ "openai_compatible": true,
64
+ "endpoints": ["/v1/models", "/v1/chat/completions", "/v1/completions"]
65
+ },
66
+ "transformers_compatible": false,
67
+ "_note": "This is NOT a direct transformers AutoModel.from_pretrained() target. Use the gateway (gateway/server.py) or Docker Compose (docker/docker-compose.yml). See README for full usage."
68
+ }
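
The launch-related fields of each constituent entry correspond to the `vllm serve` flags used in the README's manual launch. A minimal sketch of that mapping, assuming the entry layout above; `vllm_argv` is a hypothetical helper, and nothing here implies the gateway itself reads the config this way:

```python
import shlex

# One constituent entry, trimmed to the launch-related fields shown above.
darwin = {
    "model_id": "FINAL-Bench/Darwin-28B-REASON",
    "served_name": "darwin-28r",
    "default_port": 8021,
    "quantization_recommended": "fp8",
    "vllm_extra_args": ["--enforce-eager",
                        "--limit-mm-per-prompt", '{"image":0,"video":0}'],
}


def vllm_argv(entry):
    """Build the argument list for `vllm serve` from a constituent entry."""
    return [
        "vllm", "serve", entry["model_id"],
        "--port", str(entry["default_port"]),
        "--served-model-name", entry["served_name"],
        "--quantization", entry["quantization_recommended"],
        *entry["vllm_extra_args"],
    ]


argv = vllm_argv(darwin)
# shlex.join renders a shell-safe command line; the JSON argument gets quoted.
print(shlex.join(argv))
```

Passing the list form to a process launcher avoids shell quoting altogether, which matters for the JSON value of `--limit-mm-per-prompt`.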
docker/docker-compose.yml ADDED
@@ -0,0 +1,115 @@
1
+ version: "3.9"
2
+
3
+ # Darwin-60B-DUO — full-stack launcher
4
+ # Spins up:
5
+ # - vllm-darwin (Darwin-28B-REASON, GPU 0, port 8021 internal)
6
+ # - vllm-awaxis (AWAXIS-Think-31B, GPU 1, port 8022 internal)
7
+ # - gateway (FastAPI orchestrator, port 8000 exposed)
8
+ #
9
+ # Single-GPU collocation:
10
+ # Set CUDA_VISIBLE_DEVICES=0 for both vllm-* and lower
11
+ # --gpu-memory-utilization to 0.45 each (FP8 weights total ~58GB on an 80GB GPU).
12
+
13
+ services:
14
+
15
+ vllm-darwin:
16
+ image: vllm/vllm-openai:latest
17
+ container_name: darwin-60b-duo-vllm-darwin
18
+ runtime: nvidia
19
+ environment:
20
+ - CUDA_VISIBLE_DEVICES=0
21
+ - VLLM_DP_MASTER_PORT=45011
22
+ - HF_HOME=/root/.cache/huggingface
23
+ - HF_TOKEN=${HF_TOKEN:-}
24
+ command: >
25
+ --model FINAL-Bench/Darwin-28B-REASON
26
+ --served-model-name darwin-28r
27
+ --host 0.0.0.0
28
+ --port 8021
29
+ --tensor-parallel-size 1
30
+ --max-model-len 16384
31
+ --dtype bfloat16
32
+ --quantization fp8
33
+ --trust-remote-code
34
+ --enforce-eager
35
+ --limit-mm-per-prompt '{"image":0,"video":0}'
36
+ --gpu-memory-utilization 0.85
37
+ volumes:
38
+ - hf_cache:/root/.cache/huggingface
39
+ ports:
40
+ - "8021:8021"
41
+ deploy:
42
+ resources:
43
+ reservations:
44
+ devices:
45
+ - driver: nvidia
46
+ count: 1
47
+ capabilities: [gpu]
48
+ healthcheck:
49
+ test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8021/v1/models"]
50
+ interval: 20s
51
+ timeout: 5s
52
+ retries: 60
53
+
54
+ vllm-awaxis:
55
+ image: vllm/vllm-openai:latest
56
+ container_name: darwin-60b-duo-vllm-awaxis
57
+ runtime: nvidia
58
+ environment:
59
+ - CUDA_VISIBLE_DEVICES=1
60
+ - VLLM_DP_MASTER_PORT=45012
61
+ - HF_HOME=/root/.cache/huggingface
62
+ - HF_TOKEN=${HF_TOKEN:-}
63
+ command: >
64
+ --model Anserwise/AWAXIS-Think-31B
65
+ --served-model-name awaxis-31b
66
+ --host 0.0.0.0
67
+ --port 8022
68
+ --tensor-parallel-size 1
69
+ --max-model-len 16384
70
+ --dtype bfloat16
71
+ --quantization fp8
72
+ --trust-remote-code
73
+ --enforce-eager
74
+ --limit-mm-per-prompt '{"image":0,"video":0}'
75
+ --gpu-memory-utilization 0.85
76
+ volumes:
77
+ - hf_cache:/root/.cache/huggingface
78
+ ports:
79
+ - "8022:8022"
80
+ deploy:
81
+ resources:
82
+ reservations:
83
+ devices:
84
+ - driver: nvidia
85
+ count: 1
86
+ capabilities: [gpu]
87
+ healthcheck:
88
+ test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8022/v1/models"]
89
+ interval: 20s
90
+ timeout: 5s
91
+ retries: 60
92
+
93
+ gateway:
94
+ image: python:3.11-slim
95
+ container_name: darwin-60b-duo-gateway
96
+ working_dir: /app
97
+ command: >
98
+ bash -c "pip install -q -r requirements.txt &&
99
+ python server.py --host 0.0.0.0 --port 8000
100
+ --darwin-url http://vllm-darwin:8021/v1
101
+ --awaxis-url http://vllm-awaxis:8022/v1"
102
+ volumes:
103
+ - ../gateway:/app
104
+ ports:
105
+ - "8000:8000"
106
+ depends_on:
107
+ vllm-darwin:
108
+ condition: service_healthy
109
+ vllm-awaxis:
110
+ condition: service_healthy
111
+ restart: unless-stopped
112
+
113
+ volumes:
114
+ hf_cache:
115
+ driver: local
gateway/ensemble.py ADDED
@@ -0,0 +1,141 @@
1
+ # -*- coding: utf-8 -*-
2
+ """
3
+ Darwin-60B-DUO Ensemble V_1 — MAJ@N self-consistency + cross-verification.
4
+
5
+ For MCQ / short-answer queries:
6
+ 1) Each backend produces N samples at temperature τ (default 0.7)
7
+ 2) Each backend's answer = its own majority vote (RSA / self-consistency)
8
+ 3) If both majorities agree → return that answer
9
+ 4) If they disagree → each backend verifies the pair (cross-verification)
10
+ and the gateway picks the tournament winner
11
+ 5) Tiebreaker on split verdicts: majority-vote-count confidence
12
+ """
13
+ import asyncio
14
+ import re
15
+ from collections import Counter
16
+ from typing import Any, Dict, List, Optional, Tuple
17
+
18
+
19
+ _LETTERS = "ABCD"
20
+
21
+
22
+ def _extract_letter(text: str) -> str:
+     """Extract A/B/C/D letter answer from a free-form response."""
+     if not text:
+         return ""
+     # Strip CoT / thinking tags
+     cleaned = re.sub(r"<\|START_THINKING\|>.*?<\|END_THINKING\|>", "", text, flags=re.S)
+     cleaned = re.sub(r"<think>.*?</think>", "", cleaned, flags=re.S)
+     for tag in ["<|END_THINKING|>", "</think>", "<|START_RESPONSE|>"]:
+         if tag in cleaned:
+             cleaned = cleaned.split(tag)[-1]
+     # Keep what precedes the end-of-response marker, not the empty tail after it
+     cleaned = cleaned.split("<|END_RESPONSE|>")[0]
+     # Common answer patterns; \b stops the case-insensitive match from taking
+     # the first letter of an ordinary word (e.g. "final answer is based on ...")
+     patterns = [
+         r"ANSWER:\s*\(?([A-D])\b\)?",
+         r"\\boxed\{\s*\(?([A-D])\)?\s*\}",
+         r"final answer\s*(?:is|:)?\s*\(?([A-D])\b\)?",
+         r"answer\s+is\s*\(?([A-D])\b\)?",
+         r"\(([A-D])\)\s*$",
+     ]
+     for p in patterns:
+         m = re.search(p, cleaned, re.I | re.M)
+         if m:
+             return m.group(1).upper()
+     # Fallback: last standalone uppercase A-D token
+     candidates = re.findall(r"\b([A-D])\b", cleaned)
+     return candidates[-1].upper() if candidates else ""
47
+
48
+
49
+ def _majority(letters: List[str]) -> Tuple[Optional[str], Dict[str, int]]:
50
+ valid = [l for l in letters if l in _LETTERS]
51
+ if not valid:
52
+ return None, {}
53
+ counter = Counter(valid)
54
+ top, _ = counter.most_common(1)[0]
55
+ return top, dict(counter)
56
+
57
+
58
+ _VERIFY_TEMPLATE = (
59
+ "You are a graduate-level expert verifier. Given the following multiple-"
60
+ "choice question and two candidate letter answers, decide which is more "
61
+ "likely correct.\n\n"
62
+ "QUESTION:\n{question}\n\n"
63
+ "CANDIDATE 1 says answer = {a1}\n"
64
+ "CANDIDATE 2 says answer = {a2}\n\n"
65
+ "Think briefly, then respond with exactly one line:\n"
66
+ "VERDICT: 1 (if candidate 1's letter is correct)\n"
67
+ "VERDICT: 2 (if candidate 2's letter is correct)"
68
+ )
69
+
70
+
71
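+ def _parse_verdict(text: str) -> Optional[int]:
+     # Use the last verdict so one echoed while reasoning cannot override the final line
+     verdicts = re.findall(r"VERDICT:\s*([12])", text or "")
+     return int(verdicts[-1]) if verdicts else None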
74
+
75
+
76
+ def _last_user_text(messages: List[Dict[str, str]]) -> str:
77
+ for m in reversed(messages):
78
+ if m.get("role") == "user":
79
+ return m.get("content", "")
80
+ return ""
81
+
82
+
83
+ async def ensemble_v1(
84
+ darwin,
85
+ awaxis,
86
+ messages: List[Dict[str, str]],
87
+ temperature: float = 0.7,
88
+ max_tokens: int = 4096,
89
+ n_rsa: int = 8,
90
+ ) -> str:
91
+ """
92
+ Run V_1 ensemble. Returns the final answer string formatted as
93
+ "ANSWER: X" so downstream tooling can parse uniformly.
94
+ """
95
+ # --- Phase 1: parallel RSA (each backend N samples) ---
96
+ d_task = darwin.chat(messages, temperature=temperature, max_tokens=max_tokens, n=n_rsa)
97
+ a_task = awaxis.chat(messages, temperature=temperature, max_tokens=max_tokens, n=n_rsa)
98
+ d_outs, a_outs = await asyncio.gather(d_task, a_task)
99
+
100
+ d_letters = [_extract_letter(o) for o in d_outs]
101
+ a_letters = [_extract_letter(o) for o in a_outs]
102
+ d_maj, d_votes = _majority(d_letters)
103
+ a_maj, a_votes = _majority(a_letters)
104
+
105
+ # --- Phase 2: agreement check ---
106
+ if d_maj is None and a_maj is None:
107
+ return "ANSWER: (no valid answer extracted)"
108
+ if d_maj is None:
109
+ return f"ANSWER: {a_maj}"
110
+ if a_maj is None:
111
+ return f"ANSWER: {d_maj}"
112
+ if d_maj == a_maj:
113
+ return f"ANSWER: {d_maj}"
114
+
115
+ # --- Phase 3: cross-verification on mismatch ---
116
+ question = _last_user_text(messages)
117
+ verify_prompt = _VERIFY_TEMPLATE.format(question=question, a1=d_maj, a2=a_maj)
118
+ verify_msgs = [{"role": "user", "content": verify_prompt}]
119
+
120
+ d_verify_task = darwin.chat(verify_msgs, temperature=0.0, max_tokens=2048, n=1)
121
+ a_verify_task = awaxis.chat(verify_msgs, temperature=0.0, max_tokens=2048, n=1)
122
+ d_verify_outs, a_verify_outs = await asyncio.gather(d_verify_task, a_verify_task)
123
+ d_verdict = _parse_verdict(d_verify_outs[0])
124
+ a_verdict = _parse_verdict(a_verify_outs[0])
125
+
126
+ # --- Phase 4: combine verdicts ---
127
+ if d_verdict == a_verdict and d_verdict is not None:
128
+ return f"ANSWER: {d_maj if d_verdict == 1 else a_maj}"
129
+ if d_verdict is None and a_verdict is None:
130
+ # Fall back to confidence (higher own-vote count wins)
131
+ d_conf = d_votes.get(d_maj, 0)
132
+ a_conf = a_votes.get(a_maj, 0)
133
+ return f"ANSWER: {d_maj if d_conf >= a_conf else a_maj}"
134
+ if d_verdict is None:
135
+ return f"ANSWER: {d_maj if a_verdict == 1 else a_maj}"
136
+ if a_verdict is None:
137
+ return f"ANSWER: {d_maj if d_verdict == 1 else a_maj}"
138
+ # Split — confidence tiebreaker
139
+ d_conf = d_votes.get(d_maj, 0)
140
+ a_conf = a_votes.get(a_maj, 0)
141
+ return f"ANSWER: {d_maj if d_conf >= a_conf else a_maj}"
gateway/refine.py ADDED
@@ -0,0 +1,90 @@
1
+ # -*- coding: utf-8 -*-
2
+ """
3
+ Darwin-60B-DUO Sequential Refine — two-model collaboration.
4
+
5
+ drafter_backend produces the initial draft, then refiner_backend polishes it.
6
+ The polish prompt is built dynamically based on the language combination so
7
+ that:
8
+ - Darwin (English reasoning) → AWAXIS (Korean polish) for Korean output
9
+ requiring rigorous English/STEM reasoning
10
+ - AWAXIS (Korean cultural context) → Darwin (English polish) for English
11
+ output requiring Korean cultural / linguistic context
12
+ """
13
+ import re
14
+ from typing import Any, Dict, List
15
+
16
+
17
+ def _last_user_text(messages: List[Dict[str, str]]) -> str:
18
+ for m in reversed(messages):
19
+ if m.get("role") == "user":
20
+ return m.get("content", "")
21
+ return ""
22
+
23
+
24
+ def _korean_ratio(text: str) -> float:
25
+ if not text:
26
+ return 0.0
27
+ return len(re.findall(r"[가-힣]", text)) / len(text)
28
+
29
+
+ async def sequential_refine(
+     drafter,
+     refiner,
+     messages: List[Dict[str, str]],
+     temperature: float = 0.5,
+     max_tokens: int = 4096,
+ ) -> str:
+     """
+     Step 1: drafter produces the initial answer using the user's messages.
+     Step 2: refiner is given the original messages + the drafter's response +
+             a polish instruction, then produces the final output.
+
+     The polish instruction is language-adaptive:
+       - If user asked in Korean (kr_ratio > 0.3) → polish to natural Korean
+       - If user asked in English (kr_ratio < 0.05, non-empty) → polish to clearer English
+       - Otherwise → general clarity polish
+     """
+     user_text = _last_user_text(messages)
+     kr = _korean_ratio(user_text)
+
+     # ---- Step 1: drafter ----
+     draft_outputs = await drafter.chat(
+         messages,
+         temperature=temperature,
+         max_tokens=max_tokens,
+     )
+     draft = draft_outputs[0]
+
+     # ---- Step 2: refiner polish ----
+     if kr > 0.3:  # Korean prompt; same intent as the English one below
+         polish_instruction = (
+             "위 초안을 사용자의 원래 질문 의도에 맞게 한국어로 자연스럽고 "
+             "정확하게 다듬어 최종 답변을 작성하세요. 사실관계는 보존하되, "
+             "어색한 표현·번역체·중복은 제거하고, 한국어 독자에게 매끄러운 "
+             "흐름이 되도록 재작성하세요. 새로운 정보 추가 금지 — 표현만 정련하세요."
+         )
+     elif kr < 0.05 and len(user_text) > 0:
+         polish_instruction = (
+             "Polish the draft above into a clearer, more concise, and "
+             "natural-sounding English response that fully addresses the "
+             "user's original question. Preserve all factual content; remove "
+             "redundancy, awkward phrasing, and translation artifacts. Do "
+             "not add new information — refine wording only."
+         )
+     else:
+         polish_instruction = (
+             "Refine the draft above for clarity, naturalness, and "
+             "consistency. Preserve all facts; remove redundancy. Do not "
+             "introduce new information."
+         )
+
+     refine_messages = list(messages) + [
+         {"role": "assistant", "content": draft},
+         {"role": "user", "content": polish_instruction},
+     ]
+     refined_outputs = await refiner.chat(
+         refine_messages,
+         temperature=max(0.0, temperature - 0.2),  # cooler for polish
+         max_tokens=max_tokens,
+     )
+     return refined_outputs[0]
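The language-adaptive branch in `sequential_refine` can be checked in isolation. The following is a minimal sketch, assuming the same 0.3 and 0.05 thresholds as the code above; `polish_mode` is a hypothetical helper name used only for illustration and is not part of the repository.

```python
import re


def korean_ratio(text: str) -> float:
    """Fraction of characters that are precomposed Hangul syllables."""
    if not text:
        return 0.0
    return len(re.findall(r"[가-힣]", text)) / len(text)


def polish_mode(user_text: str) -> str:
    """Hypothetical helper mirroring the if/elif/else in sequential_refine."""
    kr = korean_ratio(user_text)
    if kr > 0.3:
        return "korean"    # Korean polish instruction
    if kr < 0.05 and len(user_text) > 0:
        return "english"   # English polish instruction
    return "general"       # mixed-language or empty input
```

Under these thresholds a prompt that mixes a few Hangul words into mostly Latin text falls into the general branch, as does an empty user message.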
gateway/requirements.txt ADDED
@@ -0,0 +1,4 @@
+ fastapi>=0.110
+ uvicorn[standard]>=0.27
+ httpx>=0.27
+ pydantic>=2.6
gateway/router.py ADDED
@@ -0,0 +1,186 @@
+ # -*- coding: utf-8 -*-
+ """
+ Darwin-60B-DUO Router — language + domain + complexity classification.
+
+ Returns a RouteDecision indicating which Hybrid-A strategy to invoke:
+   - "route_darwin"         : English-only single backend
+   - "route_awaxis"         : Korean-only single backend
+   - "split_refine"         : Darwin reasons → AWAXIS polishes (Korean output, English reasoning)
+   - "split_refine_reverse" : AWAXIS retrieves → Darwin polishes (English output, Korean context)
+   - "ensemble_v1"          : MCQ / short answer requiring cross-verification
+ """
+ import re
+ from dataclasses import dataclass
+ from typing import Optional
+
+
+ # ---------------------------------------------------------------------------
+ # Heuristic keyword lists
+ # ---------------------------------------------------------------------------
+ ENGLISH_REASONING_KEYWORDS = {
+     # Math
+     "prove", "theorem", "derivative", "integral", "equation", "matrix",
+     "vector", "topology", "manifold",
+     # Code
+     "def ", "function ", "import ", "class ", "return ", "lambda ",
+     "javascript", "python", "rust", "golang", "typescript", "regex",
+     # Sci-tech
+     "gradient", "tensor", "embedding", "transformer", "attention",
+     "rlhf", "rlvr", "quantization", "kernel",
+     # Markers (LaTeX commands are matched literally: one backslash each)
+     r"\boxed", r"\frac", r"\sum", r"\int", "<eqn>", "$$",
+ }
+
+ KOREAN_CULTURAL_KEYWORDS = {
+     "추석", "설날", "한국", "조선", "고려", "신라", "백제",
+     "k-pop", "케이팝", "한복", "김치", "한국어",
+     "공무원", "정부", "과기부", "교육부", "외교부",
+     "국회", "정책", "법안", "조례",
+ }
+
+ MCQ_PATTERNS = [
+     r"\(A\)[\s\S]*\(B\)[\s\S]*\(C\)[\s\S]*\(D\)",  # options may span several lines
+     r"^\s*A\..*\n\s*B\..*\n\s*C\.",
+     r"answer.*[A-D]",  # broad: under IGNORECASE, [A-D] also matches a-d
+     r"정답.*[ABCD가나다라]",
+     r"\bANSWER:",
+ ]
+
+
+ @dataclass
+ class RouteDecision:
+     strategy: str
+     reason: str
+     korean_ratio: float = 0.0
+     english_ratio: float = 0.0
+     has_reasoning_marker: bool = False
+     has_korean_cultural_marker: bool = False
+     is_mcq: bool = False
+
+
+ # ---------------------------------------------------------------------------
+ # Detection primitives
+ # ---------------------------------------------------------------------------
+ def korean_ratio(text: str) -> float:
+     """Fraction of characters that are precomposed Hangul syllables (가-힣)."""
+     if not text:
+         return 0.0
+     total = len(text)
+     hangul = len(re.findall(r"[가-힣]", text))
+     return hangul / total if total > 0 else 0.0
+
+
+ def english_ratio(text: str) -> float:
+     """Fraction of ASCII alphabetic characters."""
+     if not text:
+         return 0.0
+     total = len(text)
+     alpha = len(re.findall(r"[a-zA-Z]", text))
+     return alpha / total if total > 0 else 0.0
+
+
+ def has_reasoning_marker(text: str) -> bool:
+     """English STEM / coding keywords or math markers."""
+     lower = text.lower()
+     for kw in ENGLISH_REASONING_KEYWORDS:
+         # LaTeX markers (leading backslash) are matched literally and case-sensitively
+         if kw.startswith("\\"):
+             if re.search(re.escape(kw), text):
+                 return True
+         elif kw in lower:
+             return True
+     return False
+
+
+ def has_korean_cultural_marker(text: str) -> bool:
+     lower = text.lower()
+     return any(kw in lower for kw in KOREAN_CULTURAL_KEYWORDS)
+
+
+ def is_mcq(text: str) -> bool:
+     for pat in MCQ_PATTERNS:
+         if re.search(pat, text, re.IGNORECASE | re.MULTILINE):
+             return True
+     return False
+
+
+ # ---------------------------------------------------------------------------
+ # Strategy selector — Hybrid-A
+ # ---------------------------------------------------------------------------
+ def select_strategy(text: str) -> RouteDecision:
+     """
+     Hybrid-A strategy decision:
+       1) MCQ-style short answer → ensemble_v1
+       2) Korean output + English/STEM reasoning needed → split_refine
+       3) English output + Korean cultural context needed → split_refine_reverse
+       4) Korean-dominant → route_awaxis
+       5) English-dominant → route_darwin
+       6) Mixed default → route_awaxis (Korean-first preference)
+     """
+     kr = korean_ratio(text)
+     en = english_ratio(text)
+     reasoning = has_reasoning_marker(text)
+     cultural = has_korean_cultural_marker(text)
+     mcq = is_mcq(text)
+
+     decision = RouteDecision(
+         strategy="route_awaxis",  # default
+         reason="default",
+         korean_ratio=round(kr, 3),
+         english_ratio=round(en, 3),
+         has_reasoning_marker=reasoning,
+         has_korean_cultural_marker=cultural,
+         is_mcq=mcq,
+     )
+
+     # 1. MCQ under 4000 characters → ensemble (10% case)
+     if mcq and len(text) < 4000:
+         decision.strategy = "ensemble_v1"
+         decision.reason = "mcq_short_answer"
+         return decision
+
+     # 2. Korean output + reasoning required (15% case)
+     if kr > 0.3 and reasoning:
+         decision.strategy = "split_refine"
+         decision.reason = "korean_output_with_english_reasoning"
+         return decision
+
+     # 3. English output + Korean cultural context (5% case)
+     if en > 0.5 and kr < 0.05 and cultural:
+         decision.strategy = "split_refine_reverse"
+         decision.reason = "english_output_with_korean_context"
+         return decision
+
+     # 4. Korean-dominant (50% case)
+     if kr >= 0.3:
+         decision.strategy = "route_awaxis"
+         decision.reason = "korean_dominant"
+         return decision
+
+     # 5. English-dominant (20% case)
+     if en >= 0.5 and kr < 0.05:
+         decision.strategy = "route_darwin"
+         decision.reason = "english_dominant"
+         return decision
+
+     # 6. Mixed / ambiguous → AWAXIS (Korean-first default)
+     decision.strategy = "route_awaxis"
+     decision.reason = "mixed_fallback_korean"
+     return decision
+
+
+ # ---------------------------------------------------------------------------
+ # Smoke test
+ # ---------------------------------------------------------------------------
+ if __name__ == "__main__":
+     samples = [
+         ("Pure Korean chat", "안녕하세요. 오늘 날씨가 어떤가요?"),
+         ("Pure English code", "def fib(n):\n return n if n < 2 else fib(n-1) + fib(n-2)"),
+         ("Korean + English reasoning", "Transformer attention의 작동 원리를 한국어로 설명해줘"),
+         ("English + Korean culture", "Explain the Korean holiday Chuseok (추석) in simple English."),
+         ("MCQ", "Which is correct?\n(A) foo\n(B) bar\n(C) baz\n(D) qux"),
+         ("Korean MCQ", "정답은 무엇인가요? A. 1 B. 2 C. 3 D. 4"),
+     ]
+     for name, txt in samples:
+         d = select_strategy(txt)
+         print(f"[{name}] -> {d.strategy} ({d.reason}) kr={d.korean_ratio} en={d.english_ratio}")
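The LaTeX markers in `ENGLISH_REASONING_KEYWORDS` are matched literally through `re.escape`, so the number of backslashes in each raw string decides whether ordinary LaTeX input is detected. The following is a minimal sketch of that difference; `literal_hit` is a hypothetical helper that mirrors the search performed in `has_reasoning_marker`, and the sample sentence is made up for illustration.

```python
import re

doubled = r"\\boxed"  # raw string: two backslashes followed by "boxed"
single = r"\boxed"    # raw string: one backslash followed by "boxed"

# A typical model or user string containing a LaTeX answer box.
latex = r"The final answer is \boxed{42}."


def literal_hit(marker: str, text: str) -> bool:
    """Literal, case-sensitive search, as done via re.escape in the router."""
    return re.search(re.escape(marker), text) is not None


# The single-backslash marker is found in normal LaTeX text;
# the doubled one would only match text that itself contains two backslashes.
found_single = literal_hit(single, latex)
found_doubled = literal_hit(doubled, latex)
```

Because `re.escape` turns every backslash in the marker into a literal backslash, a raw string written with a doubled backslash only matches input that already contains an escaped sequence, which is rarely what a user types.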
gateway/server.py ADDED
@@ -0,0 +1,286 @@
+ # -*- coding: utf-8 -*-
+ """
+ Darwin-60B-DUO Gateway — FastAPI OpenAI-compatible orchestrator.
+
+ Exposes a single OpenAI-compatible endpoint ("darwin-60b-duo") that
+ internally routes to two backends:
+   - Darwin-28B-REASON (English reasoning specialist, HF GPQA Diamond #3)
+   - AWAXIS-Think-31B (Korean specialist, K-AI Leaderboard #1)
+
+ Hybrid-A strategy (config.json):
+   - 70% Route (single backend)
+   - 20% Split / Refine (sequential two-model collaboration)
+   - 10% Ensemble V_1 (cross-verification tournament for MCQ / short answers)
+
+ Run:
+     pip install -r requirements.txt
+     python server.py --port 8000 \\
+         --darwin-url http://127.0.0.1:8021/v1 \\
+         --awaxis-url http://127.0.0.1:8022/v1
+
+ License: Gemma (combined-license inheritance — see README).
+ """
+ import argparse
+ import asyncio
+ import json
+ import time
+ import uuid
+ from typing import Any, Dict, List, Optional
+
+ import httpx
+ from fastapi import FastAPI, HTTPException
+ from fastapi.responses import JSONResponse, StreamingResponse
+ from pydantic import BaseModel, Field
+
+ from router import select_strategy, RouteDecision
+ from refine import sequential_refine
+ from ensemble import ensemble_v1
+
+
+ # ---------------------------------------------------------------------------
+ # Pydantic models — OpenAI Chat Completions API subset
+ # ---------------------------------------------------------------------------
+ class ChatMessage(BaseModel):
+     role: str
+     content: str
+
+
+ class ChatCompletionRequest(BaseModel):
+     model: str = "darwin-60b-duo"
+     messages: List[ChatMessage]
+     temperature: float = 0.7
+     top_p: float = 0.95
+     max_tokens: int = 4096
+     n: int = 1  # accepted for compatibility; the gateway returns a single choice
+     stream: bool = False  # accepted for compatibility; responses are not streamed
+     # Optional: force a strategy ("route_darwin", "route_awaxis", "split_refine",
+     # "split_refine_reverse", "ensemble_v1"). Default "auto" = Hybrid-A router.
+     duo_strategy: Optional[str] = "auto"
+
+
+ # ---------------------------------------------------------------------------
+ # Backend HTTP client
+ # ---------------------------------------------------------------------------
+ class Backend:
+     def __init__(self, name: str, base_url: str, served_name: str):
+         self.name = name
+         self.base_url = base_url.rstrip("/")
+         self.served_name = served_name
+         self.client = httpx.AsyncClient(timeout=httpx.Timeout(900.0))
+
+     async def chat(
+         self,
+         messages: List[Dict[str, str]],
+         temperature: float = 0.7,
+         max_tokens: int = 4096,
+         n: int = 1,
+         top_p: float = 0.95,
+     ) -> List[str]:
+         payload = {
+             "model": self.served_name,
+             "messages": messages,
+             "temperature": temperature,
+             "top_p": top_p,
+             "max_tokens": max_tokens,
+             "n": n,
+         }
+         r = await self.client.post(
+             f"{self.base_url}/chat/completions", json=payload
+         )
+         r.raise_for_status()
+         data = r.json()
+         return [c["message"]["content"] for c in data["choices"]]
+
+     async def health(self) -> bool:
+         try:
+             r = await self.client.get(f"{self.base_url}/models", timeout=5)
+             return r.status_code == 200
+         except Exception:
+             return False
+
+
+ # ---------------------------------------------------------------------------
+ # FastAPI app
+ # ---------------------------------------------------------------------------
+ app = FastAPI(
+     title="Darwin-60B-DUO Gateway",
+     version="1.0.0",
+     description=(
+         "Single OpenAI-compatible endpoint for the Darwin-60B-DUO "
+         "(Darwin-28B-REASON + AWAXIS-Think-31B). Hybrid-A routing."
+     ),
+ )
+
+ # Initialized via CLI args at startup
+ DARWIN: Optional[Backend] = None
+ AWAXIS: Optional[Backend] = None
+
+
+ @app.get("/v1/models")
+ async def list_models():
+     """Expose only the aggregate model to external callers."""
+     return {
+         "object": "list",
+         "data": [
+             {
+                 "id": "darwin-60b-duo",
+                 "object": "model",
+                 "owned_by": "FINAL-Bench",
+                 "created": int(time.time()),
+             }
+         ],
+     }
+
+
+ @app.get("/health")
+ async def health():
+     d_ok = await DARWIN.health() if DARWIN else False
+     a_ok = await AWAXIS.health() if AWAXIS else False
+     status = "ok" if (d_ok and a_ok) else "degraded"
+     return {
+         "status": status,
+         "backends": {
+             "darwin-28r": d_ok,
+             "awaxis-31b": a_ok,
+         },
+         "gateway_version": "1.0.0",
+     }
+
+
+ def _build_response(content: str, route_meta: Dict[str, Any]) -> Dict[str, Any]:
+     """Build an OpenAI-compatible Chat Completion response with route metadata."""
+     return {
+         "id": f"chatcmpl-{uuid.uuid4().hex[:24]}",
+         "object": "chat.completion",
+         "created": int(time.time()),
+         "model": "darwin-60b-duo",
+         "choices": [
+             {
+                 "index": 0,
+                 "message": {
+                     "role": "assistant",
+                     "content": content,
+                 },
+                 "finish_reason": "stop",
+             }
+         ],
+         "usage": {
+             "prompt_tokens": -1,  # Aggregate gateway does not track tokens
+             "completion_tokens": -1,
+             "total_tokens": -1,
+         },
+         # Non-standard metadata for transparency / debugging
+         "_duo_route": route_meta,
+     }
+
+
+ @app.post("/v1/chat/completions")
+ async def chat_completions(req: ChatCompletionRequest):
+     if not req.messages:
+         raise HTTPException(400, "messages must not be empty")
+
+     user_text = req.messages[-1].content
+     messages_dict = [m.model_dump() for m in req.messages]
+
+     # ----- Strategy selection -----
+     if req.duo_strategy and req.duo_strategy != "auto":
+         decision = RouteDecision(strategy=req.duo_strategy, reason="user_forced")
+     else:
+         decision = select_strategy(user_text)
+
+     t0 = time.time()
+
+     # ----- Execute -----
+     try:
+         if decision.strategy == "route_darwin":
+             outputs = await DARWIN.chat(
+                 messages_dict,
+                 temperature=req.temperature,
+                 max_tokens=req.max_tokens,
+                 top_p=req.top_p,
+             )
+             content = outputs[0]
+
+         elif decision.strategy == "route_awaxis":
+             outputs = await AWAXIS.chat(
+                 messages_dict,
+                 temperature=req.temperature,
+                 max_tokens=req.max_tokens,
+                 top_p=req.top_p,
+             )
+             content = outputs[0]
+
+         elif decision.strategy == "split_refine":
+             # Darwin reasons in English → AWAXIS polishes in Korean
+             content = await sequential_refine(
+                 DARWIN, AWAXIS, messages_dict,
+                 temperature=req.temperature, max_tokens=req.max_tokens
+             )
+
+         elif decision.strategy == "split_refine_reverse":
+             # AWAXIS retrieves Korean context → Darwin polishes in English
+             content = await sequential_refine(
+                 AWAXIS, DARWIN, messages_dict,
+                 temperature=req.temperature, max_tokens=req.max_tokens
+             )
+
+         elif decision.strategy == "ensemble_v1":
+             # MCQ / short answer: MAJ@N per model + cross-verify if mismatched
+             content = await ensemble_v1(
+                 DARWIN, AWAXIS, messages_dict,
+                 temperature=req.temperature, max_tokens=req.max_tokens,
+                 n_rsa=8,
+             )
+
+         else:
+             # Fallback: AWAXIS (default for ambiguous / mixed)
+             outputs = await AWAXIS.chat(
+                 messages_dict, temperature=req.temperature,
+                 max_tokens=req.max_tokens, top_p=req.top_p,
+             )
+             content = outputs[0]
+             decision.strategy = "fallback_awaxis"
+
+     except httpx.HTTPError as e:
+         raise HTTPException(503, f"backend error: {type(e).__name__}: {e}")
+
+     elapsed = time.time() - t0
+     route_meta = {
+         "strategy": decision.strategy,
+         "reason": decision.reason,
+         "elapsed_s": round(elapsed, 2),
+         "language_ratio": decision.korean_ratio,
+     }
+
+     return JSONResponse(_build_response(content, route_meta))
+
+
+ # ---------------------------------------------------------------------------
+ # CLI
+ # ---------------------------------------------------------------------------
+ def main():
+     p = argparse.ArgumentParser()
+     p.add_argument("--host", default="0.0.0.0")
+     p.add_argument("--port", type=int, default=8000)
+     p.add_argument(
+         "--darwin-url", default="http://127.0.0.1:8021/v1",
+         help="Darwin-28B-REASON vLLM endpoint",
+     )
+     p.add_argument(
+         "--awaxis-url", default="http://127.0.0.1:8022/v1",
+         help="AWAXIS-Think-31B vLLM endpoint",
+     )
+     p.add_argument("--darwin-served-name", default="darwin-28r")
+     p.add_argument("--awaxis-served-name", default="awaxis-31b")
+     args = p.parse_args()
+
+     global DARWIN, AWAXIS
+     DARWIN = Backend("darwin-28r", args.darwin_url, args.darwin_served_name)
+     AWAXIS = Backend("awaxis-31b", args.awaxis_url, args.awaxis_served_name)
+
+     import uvicorn
+     uvicorn.run(app, host=args.host, port=args.port, log_level="info")
+
+
+ if __name__ == "__main__":
+     main()
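A client can talk to the gateway like any OpenAI-compatible server, with the extra `duo_strategy` request field and the `_duo_route` response field defined in `server.py`. The following is a minimal standard-library sketch, assuming the gateway listens on the default port 8000 on localhost; `build_payload`, `ask`, and `summarize` are hypothetical helper names, and the `max_tokens` value is an arbitrary choice for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: gateway started with the default --port 8000 on localhost.
GATEWAY_URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_payload(prompt: str, strategy: str = "auto") -> Dict[str, Any]:
    """Request body using fields accepted by ChatCompletionRequest."""
    return {
        "model": "darwin-60b-duo",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 512,
        "duo_strategy": strategy,  # "auto" lets the Hybrid-A router decide
    }


def ask(prompt: str, strategy: str = "auto", url: str = GATEWAY_URL) -> Dict[str, Any]:
    """POST the payload to the gateway and return the decoded JSON response."""
    body = json.dumps(build_payload(prompt, strategy)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=900) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(response: Dict[str, Any]) -> str:
    """Combine the answer with the non-standard _duo_route metadata."""
    route = response.get("_duo_route", {})
    answer = response["choices"][0]["message"]["content"]
    return f"[{route.get('strategy')}/{route.get('reason')}] {answer}"
```

With both backends running, `summarize(ask("Explain attention in Korean.", strategy="split_refine"))` would show which strategy handled the request alongside the answer text.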
tokenizer_info.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "_note": "Darwin-60B-DUO uses constituent tokenizers via gateway, not a unified one.",
+   "constituent_tokenizers": {
+     "darwin-28r": {
+       "source_model": "FINAL-Bench/Darwin-28B-REASON",
+       "tokenizer_family": "qwen3_5",
+       "vocab_size_estimate": 151936
+     },
+     "awaxis-31b": {
+       "source_model": "Anserwise/AWAXIS-Think-31B",
+       "tokenizer_family": "gemma4",
+       "vocab_size_estimate": 262144
+     }
+   },
+   "routing_decision_layer": "language detection + domain classification (gateway/router.py) performs tokenization-free routing on the raw text before backend selection",
+   "downstream_token_handling": "Each backend (vLLM serving the respective base model) handles its own tokenization. The gateway operates on text strings, not token IDs."
+ }