Initial release — Darwin-60B-DUO (Hybrid-A: Route 70% / Split-Refine 20% / Ensemble V_1 10%)
- LICENSE +54 -0
- README.md +319 -0
- benchmarks/README.md +25 -0
- config.json +68 -0
- docker/docker-compose.yml +115 -0
- gateway/ensemble.py +141 -0
- gateway/refine.py +90 -0
- gateway/requirements.txt +4 -0
- gateway/router.py +186 -0
- gateway/server.py +286 -0
- tokenizer_info.json +17 -0
LICENSE
ADDED
@@ -0,0 +1,54 @@
Darwin-60B-DUO: Combined License Notice
========================================

This repository aggregates two constituent base models, each governed by its
own license. The combined repository inherits the more restrictive of the
two, the Gemma license, as the effective deployment license.

────────────────────────────────────────────────────────────────────────────
1. Constituent base model licenses
────────────────────────────────────────────────────────────────────────────

- Darwin-28B-REASON (FINAL-Bench/Darwin-28B-REASON)
  License: Apache License 2.0
  Source : https://www.apache.org/licenses/LICENSE-2.0

- AWAXIS-Think-31B (Anserwise/AWAXIS-Think-31B)
  License: Gemma Terms of Use (inherited from Google Gemma-4 base)
  Source : https://ai.google.dev/gemma/terms

────────────────────────────────────────────────────────────────────────────
2. Effective combined license for Darwin-60B-DUO
────────────────────────────────────────────────────────────────────────────

Because the Gemma Terms of Use impose more specific restrictions than
Apache-2.0 (notably the Gemma Prohibited Use Policy), the combined
Darwin-60B-DUO release is distributed under the **Gemma Terms of Use**.

Users intending commercial deployment must:
- Comply with the Gemma Terms of Use in full
  https://ai.google.dev/gemma/terms
- Comply with the Gemma Prohibited Use Policy
  https://ai.google.dev/gemma/prohibited_use_policy
- Retain all attribution and notices for both constituent models

────────────────────────────────────────────────────────────────────────────
3. Gateway code (this repository's `gateway/`, `docker/`, etc.)
────────────────────────────────────────────────────────────────────────────

The orchestration code authored for Darwin-60B-DUO (FastAPI gateway,
router, refine, ensemble, Docker compose) is released under
Apache License 2.0 to maximize developer flexibility. The combined
license inheritance applies only to the served model behaviour, not the
code that orchestrates it.

────────────────────────────────────────────────────────────────────────────
4. Disclaimer
────────────────────────────────────────────────────────────────────────────

This document is a license summary for end-user convenience. In case of
any conflict, the original license texts of the constituent models
(Apache-2.0 and Gemma Terms of Use) govern. Users should consult those
authoritative sources for binding obligations.

Copyright (c) 2026 FINAL-Bench, VIDRAFT, Anserwise.
README.md
ADDED
@@ -0,0 +1,319 @@
---
license: gemma
language:
- ko
- en
- multilingual
library_name: transformers
pipeline_tag: text-generation
tags:
- darwin
- darwin-family
- darwin-duo
- duo
- ensemble
- mixture-of-models
- router
- korean
- reasoning
- finalbench
- vidraft
base_model:
- FINAL-Bench/Darwin-28B-REASON
- Anserwise/AWAXIS-Think-31B
---

<div align="center">

# 🌳 Darwin-60B-DUO

### The first DUO of the Darwin family: two SOTA models unified into one

</div>

---

## ✨ TL;DR

> Darwin-60B-DUO combines **Darwin-28B-REASON** (ranked **#3 on the Hugging Face-verified GPQA Diamond** leaderboard) with **AWAXIS-Think-31B** (ranked **#1 on the K-AI Leaderboard** run by Korea's Ministry of Science and ICT) behind a **single OpenAI-compatible endpoint**. It is the Darwin family's first DUO release.

---

## 🏆 Two SOTA Constituents

| Constituent model | Verified rank | Strengths | Parameters |
|-----------|-----------------------|-----------------|--------|
| **Darwin-28B-REASON** | 🥉 **#3 on the Hugging Face-verified GPQA Diamond benchmark** | English graduate-level reasoning · STEM · math · code | 26.9 B |
| **AWAXIS-Think-31B** | 🥇 **#1 on the national K-AI Leaderboard operated by Korea's Ministry of Science and ICT (MSIT)** | Korean understanding and generation · Korean culture · natural tone | 31.27 B |
| **Darwin-60B-DUO** (this) | *Aggregate brand* | Both domain SOTAs combined, with automatic hybrid routing | 58.17 B (≈ 60 B) |

> 💡 **AWAXIS-Think-31B is also part of the Darwin family.**
> It is a Korean specialist branch distilled by the Darwin team on top of Google's Gemma-4 base, complementing the original Qwen3.5-line Darwin lineage as the family's second axis.

---

## 🎯 What Makes It Unique

### 1️⃣ Two SOTA Domains in One Model
Few single LLMs reach SOTA-level quality in both English reasoning and natural Korean at the same time. Darwin-60B-DUO puts two domain-verified SOTA models behind **one API endpoint**, so users get both strengths without handling any orchestration themselves.

### 2️⃣ Auto Hybrid Routing ("Hybrid-A")
The gateway analyzes each input and **automatically selects the best strategy for the scenario**.

| Scenario | Strategy | Models called | Cost | Share |
|---------------------|----------------------|----------|------------|------------|
| Pure Korean: email, Korean information, chat | **Route → AWAXIS** | 1 model | 1× | ~50 % |
| Pure English: code, math, English reasoning | **Route → Darwin** | 1 model | 1× | ~20 % |
| Korean output needing English/STEM reasoning | **Split → Darwin reasons → AWAXIS polishes** | 2 models, sequential | 2× | ~15 % |
| English output needing Korean context | **Split → AWAXIS retrieves → Darwin polishes** | 2 models, sequential | 2× | ~5 % |
| MCQ / short answer | **Ensemble V₁ tournament** | 2 models + cross-verify | 2× | ~10 % |

**Average effective cost is roughly 1.3× that of a single 30B model**: about 70 % of requests cost 1× and the remaining 30 % cost 2× (0.7 × 1 + 0.3 × 2 = 1.3).
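
The 1.3× figure follows from the Share and Cost columns of the routing table. The sketch below reproduces that arithmetic; the dictionary layout and function name are illustrative and not part of the gateway code, and it takes the table's 2× cost for Ensemble mode at face value.

```python
# Expected cost multiplier of Hybrid-A relative to one single-backend call.
# (share, cost) pairs are the approximate values from the routing table.
SCENARIOS = {
    "route_korean": (0.50, 1),
    "route_english": (0.20, 1),
    "split_korean_with_reasoning": (0.15, 2),
    "split_english_with_korean_context": (0.05, 2),
    "ensemble_v1_mcq": (0.10, 2),
}


def average_cost(scenarios):
    """Return the share-weighted mean cost, after checking the shares sum to 1."""
    total_share = sum(share for share, _ in scenarios.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares sum to {total_share}, expected 1.0")
    return sum(share * cost for share, cost in scenarios.values())


print(round(average_cost(SCENARIOS), 2))  # prints 1.3
```

Changing any share or cost in `SCENARIOS` shows how sensitive the average is to the traffic mix, for example if Ensemble requests turn out to cost more than 2× in practice.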

### 3️⃣ Single-Model Façade
**A single OpenAI API-compatible endpoint.** Existing tools (LangChain · LlamaIndex · OpenAI SDK · Continue · Cursor, and others) work without code changes.

```python
from openai import OpenAI

client = OpenAI(base_url="http://your-server:8000/v1", api_key="anything")
resp = client.chat.completions.create(
    model="darwin-60b-duo",  # single model name
    messages=[{"role": "user",
               # Korean prompt: "Summarize in Korean how GPT-5 and Claude differ in reasoning"
               "content": "GPT-5와 Claude의 reasoning 차이를 한국어로 정리해줘"}],
)
print(resp.choices[0].message.content)
# Internally: Darwin reasons in English → AWAXIS polishes in Korean
```

### 4️⃣ Efficient GPU Footprint
- **FP8 quantization**: about **30 GB of weights per model** (roughly 58 GB combined at 1 byte per parameter), which fits on a **single GPU with 80 GB or more** (B200 / H100)
- BF16: two B200 GPUs (~60 GB per model)
- vLLM-based high-throughput inference (tensor parallelism and prefix caching supported)
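
As a rough sanity check on the footprint figures, weight memory can be estimated as parameter count times bytes per parameter. This is a weights-only sketch using the rounded parameter counts from the constituent table; it ignores KV cache, activations, and runtime overhead, and the names are illustrative.

```python
# Weights-only memory estimate: parameters x bytes per parameter.
PARAMS_BILLION = {"Darwin-28B-REASON": 26.9, "AWAXIS-Think-31B": 31.27}
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_gb(params_billion, dtype):
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


for dtype in ("fp8", "bf16"):
    per_model = {name: round(weight_gb(p, dtype), 1) for name, p in PARAMS_BILLION.items()}
    total = round(sum(weight_gb(p, dtype) for p in PARAMS_BILLION.values()), 1)
    print(dtype, per_model, "total:", total)
```

Whatever headroom remains on the GPU after the weights is what the servers have for KV cache, which is why the per-server `--gpu-memory-utilization` setting matters when both models share one device.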

---

## 🌳 Darwin Family Tree

```
🌳 Darwin Family
│
├─ 👴 GRANDPARENTS (Foundation lineage)
│   ├─ Cohere Command A+ ── English reasoning lineage (218 B)
│   └─ Google Gemma-4-31B-it ── Korean/multilingual base
│
├─ 👨 PARENTS (Family bases)
│   ├─ Darwin-9B ── omni-modal, ko-en compact
│   ├─ Darwin-28B-Opus ── English reasoning base
│   ├─ Darwin-31B-Opus ── Korean multimodal base
│   └─ Darwin-218B-Delphi ── cascade flagship (GPQA Diamond 90.91 %)
│
├─ 🧒 SPECIALISTS (Children: domain SOTAs)
│   ├─ Darwin-28B-REASON 🥉 ── HF GPQA Diamond #3 (English reasoning specialist)
│   └─ AWAXIS-Think-31B 🥇 ── K-AI Leaderboard #1 (Korean specialist, Gemma-4 branch)
│
└─ ⭐ Darwin-60B-DUO ⭐ (you are here)
    └─ Two specialists unified into a single model
```

---

## 🚀 Usage

### Option A: Docker Compose (Recommended)

```bash
git clone https://huggingface.co/FINAL-Bench/Darwin-60B-DUO
cd Darwin-60B-DUO
docker compose -f docker/docker-compose.yml up -d

# Verify
curl http://localhost:8000/v1/models
# → {"data":[{"id":"darwin-60b-duo","object":"model"}]}

# Korean prompt: "Hello. Please introduce yourself."
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"darwin-60b-duo",
       "messages":[{"role":"user","content":"안녕하세요. 자기 소개 부탁드립니다."}]}'
```

### Option B: Manual launch (B200 / H100 × 2)

```bash
# 1) Darwin-28B-REASON (port 8021, GPU 0)
CUDA_VISIBLE_DEVICES=0 VLLM_DP_MASTER_PORT=45011 \
vllm serve FINAL-Bench/Darwin-28B-REASON \
  --port 8021 --served-model-name darwin-28r \
  --quantization fp8 --enforce-eager \
  --limit-mm-per-prompt '{"image":0,"video":0}' &

# 2) AWAXIS-Think-31B (port 8022, GPU 1)
CUDA_VISIBLE_DEVICES=1 VLLM_DP_MASTER_PORT=45012 \
vllm serve Anserwise/AWAXIS-Think-31B \
  --port 8022 --served-model-name awaxis-31b \
  --quantization fp8 --enforce-eager \
  --limit-mm-per-prompt '{"image":0,"video":0}' &

# 3) Gateway (port 8000), from this repo
pip install -r gateway/requirements.txt
python gateway/server.py --port 8000 \
  --darwin-url http://127.0.0.1:8021/v1 \
  --awaxis-url http://127.0.0.1:8022/v1
```

> 💡 **Single GPU**: with FP8 quantization each model needs about 30 GB of weights (roughly 58 GB combined), so both can be collocated on one 80 GB GPU. Set `CUDA_VISIBLE_DEVICES=0` for both servers and give each `--gpu-memory-utilization 0.45`.

---

## ⚙️ Operation Modes

### 🟢 Mode 1 · Route (single-backend routing, ~70 % of requests)
Language and domain detection selects a single backend. **Fastest and cheapest.**

Detection signals:
- `korean_ratio(prompt) > 0.3` → AWAXIS
- Code keywords (`def`, `function`, `import`, `class`) → Darwin
- Math markers (`\boxed`, `equation`, `prove`) → Darwin
- Otherwise → weighted by the dominant language and domain
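
The detection signals can be sketched as a small rule-based router. This is an illustration only and not the contents of `gateway/router.py`: the Hangul character ranges, the order in which the rules are checked, and the fallback to Darwin are all assumptions made for the example.

```python
import re

# Hangul syllables, Jamo, and compatibility Jamo (assumed definition of "Korean").
_HANGUL = re.compile(r"[\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f]")
_CODE_KEYWORDS = re.compile(r"\b(def|function|import|class)\b")
_MATH_MARKERS = re.compile(r"\\boxed|\bequation\b|\bprove\b", re.IGNORECASE)


def korean_ratio(prompt: str) -> float:
    """Fraction of non-whitespace characters that are Hangul."""
    chars = [c for c in prompt if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if _HANGUL.match(c)) / len(chars)


def route(prompt: str) -> str:
    """Pick a backend name by checking the signals in the order listed."""
    if korean_ratio(prompt) > 0.3:
        return "awaxis"
    if _CODE_KEYWORDS.search(prompt) or _MATH_MARKERS.search(prompt):
        return "darwin"
    # Fallback (assumption): the weighted language/domain rule is not
    # specified here, so this sketch defaults to the English backend.
    return "darwin"


print(route("안녕하세요, 오늘 날씨 어때요?"))
print(route("def add(a, b): return a + b"))
```

A router like this only covers Mode 1; deciding when a Korean prompt also needs English or STEM reasoning (Mode 2) or is a multiple-choice question (Mode 3) would need further signals.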

### 🟡 Mode 2 · Split / Refine (division of labor, ~20 % of requests)
One model drafts and the other polishes, **combining the strengths of both**.

```
Example: "엔트로피를 한국어로 쉽게 풀어줘"  (Korean: "Explain entropy in simple Korean")

Step 1  Darwin (precise English reasoning) →
        "Entropy quantifies the number of microstates compatible with a
         given macrostate, representing disorder ..."

Step 2  AWAXIS (natural Korean polish) →
        "엔트로피는 쉽게 말하면 '무질서함의 정도' 입니다.
         같은 모습으로 보이지만 사실 그 안에 ..."
        (Korean: "Put simply, entropy is 'the degree of disorder'.
         Things may look the same, but inside ...")
```
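
The two-step flow can be sketched as a function that takes the two backends as plain callables. This is not the repository's `gateway/refine.py`; the `Backend` type, the wording of the polish prompt, and the stand-in backends are assumptions chosen so that the example runs without a server.

```python
from typing import Callable

# A backend is any callable mapping a prompt string to a completion string,
# for example a thin wrapper around an OpenAI-compatible chat endpoint.
Backend = Callable[[str], str]


def split_refine(question: str, reasoner: Backend, polisher: Backend) -> str:
    """Draft with one backend, then have the other rewrite the draft."""
    draft = reasoner(question)
    polish_prompt = (
        "Rewrite the draft below as a natural answer in the language of the "
        "user's question. Keep every fact and step of the reasoning.\n\n"
        f"Question:\n{question}\n\nDraft:\n{draft}"
    )
    return polisher(polish_prompt)


# Stand-in backends; real ones would call the two vLLM servers.
def fake_darwin(prompt: str) -> str:
    return "Entropy counts the microstates compatible with a macrostate."


def fake_awaxis(prompt: str) -> str:
    return "[polished] " + prompt.split("Draft:\n", 1)[1]


print(split_refine("엔트로피를 한국어로 쉽게 풀어줘", fake_darwin, fake_awaxis))
```

Because the second call cannot start until the first has finished, the two generations run back to back, which is where the roughly 2× cost and latency of this mode comes from.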

### 🔴 Mode 3 · Ensemble V₁ Tournament (~10 % of requests: MCQ and short answers)
Each model runs **N=8 self-consistency** sampling and takes a majority vote.
- If the two answers agree → return that answer (strong signal)
- If they disagree → the two models **cross-verify each other's answers** → tournament winner

```
Question: "Which of A/B/C/D is correct?"
Darwin (8-sample MAJ) → "C"
AWAXIS (8-sample MAJ) → "B"
→ disagreement → ask Darwin "C or B, which is correct?" and ask AWAXIS the same question
→ verdicts agree → final answer
```
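
The tournament logic can be sketched with the sampling and verification steps passed in as data and callables. This is a simplified illustration, not the code in `gateway/ensemble.py`: the verifier signature and the rule that an exact vote tie goes to Darwin are assumptions, while the fallback to the larger majority-vote count follows the tiebreaker described in that module's docstring.

```python
from collections import Counter
from typing import Callable, List, Tuple

Verifier = Callable[[str, str], str]  # shown two candidates, returns the preferred one


def majority(samples: List[str]) -> Tuple[str, int]:
    """Return the most common answer and the number of samples that chose it."""
    answer, votes = Counter(samples).most_common(1)[0]
    return answer, votes


def tournament(darwin_samples: List[str], awaxis_samples: List[str],
               darwin_verify: Verifier, awaxis_verify: Verifier) -> str:
    """Combine two self-consistency votes, cross-verifying on disagreement."""
    d_ans, d_votes = majority(darwin_samples)
    a_ans, a_votes = majority(awaxis_samples)
    if d_ans == a_ans:
        return d_ans  # both majorities agree: strong signal
    verdicts = [darwin_verify(d_ans, a_ans), awaxis_verify(d_ans, a_ans)]
    if verdicts[0] == verdicts[1]:
        return verdicts[0]
    # Split verdict: prefer the candidate backed by more votes
    # (an exact tie goes to Darwin's answer; that tie rule is an assumption).
    return d_ans if d_votes >= a_votes else a_ans


darwin = list("CCCCCABD")  # majority "C" with 5 of 8 votes
awaxis = list("BBBBACDD")  # majority "B" with 4 of 8 votes
# Each stand-in verifier simply defends its own model's answer.
print(tournament(darwin, awaxis, lambda d, a: d, lambda d, a: a))  # prints C
```

With N=8 samples per backend, this mode issues 16 generations plus up to two verification calls, so the sample count is the main lever on its cost.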

---

## 📦 Repository Layout

```
Darwin-60B-DUO/
├── README.md               ← this model card
├── config.json             ← DUO config (base_models reference)
├── tokenizer_info.json     ← base tokenizer reference info
├── gateway/
│   ├── server.py           ← FastAPI orchestrator
│   ├── router.py           ← Korean/English, domain, and complexity detection
│   ├── refine.py           ← Sequential refine logic
│   ├── ensemble.py         ← V₁ cross-verification + MAJ@N
│   └── requirements.txt
├── docker/
│   └── docker-compose.yml  ← combined launcher: vLLM ×2 + gateway
├── benchmarks/
│   └── README.md           ← evaluation assets (TBA, coming soon)
└── LICENSE                 ← Gemma + Apache-2.0 dual notice
```

---

## 📊 Evaluation

### Verified Constituent Scores
- **Darwin-28B-REASON**: **#3 on the Hugging Face-verified GPQA Diamond benchmark**
- **AWAXIS-Think-31B**: **#1 on the national K-AI Leaderboard** operated by Korea's Ministry of Science and ICT

### Darwin-60B-DUO Aggregate Bench
- **GPQA Diamond (full 198Q)**: TBA (formal evaluation scheduled)
- **KMMLU**: TBA
- **CLIcK (Korean cultural)**: TBA
- **Helmet · Ruler (long context)**: TBA

> Full 198-question GPQA and K-AI leaderboard DUO scores will be published in `benchmarks/` after formal evaluation.

---

## 📜 License

**Combined license: Gemma** (the more restrictive of the constituent base models).

| Constituent model | License |
|-----------|---------|
| Darwin-28B-REASON | Apache-2.0 |
| AWAXIS-Think-31B | Gemma (inherited from Gemma-4) |
| **Darwin-60B-DUO** | **Gemma** (combined-license inheritance rule) |

Before any commercial use, review the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) and the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).

---

## ⚠️ Limitations

- **No combined weights in this repo**: the gateway calls the vLLM endpoints of the two base models (Darwin-28B-REASON · AWAXIS-Think-31B). Each base model's weights are fetched separately from its own repo.
- **2× GPU baseline**: BF16 requires two GPUs. With FP8, one GPU is enough (assuming an 80 GB-class B200/H100).
- **Extra latency**: Split and Ensemble modes add roughly 2× latency compared with a single model.
- **Training data cut-off**: Darwin-28B-REASON: ~2026-Q1, AWAXIS-Think-31B: ~2026-Q1.
- **Hallucination**: the usual limitations of LLMs apply.

---

## 🙏 Acknowledgments

- **민식 (FINAL-Bench team lead)**: Darwin family architecture & DUO concept
- **Anserwise Korean specialist team**: AWAXIS-Think-31B development
- **VIDRAFT**: orchestration framework & Hybrid-A routing strategy
- **Google DeepMind**: Gemma-4 foundation
- **Cohere & Qwen team**: Command A+ / Qwen3.5 foundation lineage

---

## 📞 Contact

- HF org: [FINAL-Bench](https://huggingface.co/FINAL-Bench)
- Sister orgs: [Anserwise](https://huggingface.co/Anserwise) · [VIDraft](https://huggingface.co/VIDraft)
- Issues / discussions: the **Community** tab of this repo

---

<div align="center">

> ⭐ **Darwin-60B-DUO is the Darwin family's first DUO model. One model, two SOTAs.** ⭐

</div>
benchmarks/README.md
ADDED
@@ -0,0 +1,25 @@
# Darwin-60B-DUO Benchmarks

> 📌 Formal benchmark results will be posted in this directory after evaluation.

## Scheduled Evaluations

| Benchmark | Scope | Constituent score (verified) | DUO aggregate |
|-----------|-------|-----------------------------|---------------|
| **GPQA Diamond (full 198Q)** | English graduate reasoning | Darwin-28B-REASON: HF #3 | TBA |
| **K-AI Leaderboard** | Korean | AWAXIS-Think-31B: MSIT #1 | TBA |
| **KMMLU** | Korean MMLU | TBA | TBA |
| **CLIcK** | Korean cultural | TBA | TBA |
| **Helmet · Ruler** | Long context retrieval | TBA | TBA |
| **NIAH 32K · 128K** | Needle-in-haystack | NIAH 32K: 5/5 each (sanity) | TBA |

## Hybrid-A Routing Distribution Validation
Production traffic is sampled regularly to check that the router distribution (50/20/15/5/10 %) holds in real workloads.
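
One way such a check could look is sketched below: count the mode chosen for each sampled request and compare the observed shares with the targets. The mode labels are borrowed from the `orchestration.distribution` keys in `config.json`; the function names and the 5-point tolerance are assumptions for illustration.

```python
from collections import Counter

# Target shares for the Hybrid-A distribution (50/20/15/5/10 %).
EXPECTED = {
    "route_korean": 0.50,
    "route_english": 0.20,
    "split_korean_with_reasoning": 0.15,
    "split_english_with_korean_context": 0.05,
    "ensemble_v1_mcq": 0.10,
}


def distribution_drift(observed_modes, expected=EXPECTED):
    """Return {mode: observed_share - expected_share} for a sample of routed requests."""
    counts = Counter(observed_modes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no requests in sample")
    return {mode: counts.get(mode, 0) / total - share for mode, share in expected.items()}


def within_tolerance(drift, tolerance=0.05):
    """True when every mode is within `tolerance` (absolute share) of its target."""
    return all(abs(delta) <= tolerance for delta in drift.values())


sample = (["route_korean"] * 52 + ["route_english"] * 18
          + ["split_korean_with_reasoning"] * 15
          + ["split_english_with_korean_context"] * 5
          + ["ensemble_v1_mcq"] * 10)
print(within_tolerance(distribution_drift(sample)))
```

An absolute tolerance is the simplest choice; for small samples a statistical test such as chi-squared would separate real drift from sampling noise more reliably.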

## Evaluation Method
- Per-backend isolation: each base model scored on its own
- DUO aggregate: score of the final output after it passes through the gateway
- Latency overhead: added delay from the gateway (route mode ≈ 0, split mode ≈ 1x, ensemble mode ≈ 1.5–2x)
config.json
ADDED
@@ -0,0 +1,68 @@
{
  "_model_type_friendly": "duo",
  "_aggregate_brand": "Darwin-60B-DUO",
  "architectures": [
    "DarwinDuoOrchestrator"
  ],
  "description": "Darwin family DUO: two SOTA constituents (English reasoning + Korean) served via a single OpenAI-compatible gateway. This repo contains the orchestrator gateway code; backend weights are fetched from the constituent repos at runtime.",
  "constituents": [
    {
      "role": "english_reasoning_specialist",
      "model_id": "FINAL-Bench/Darwin-28B-REASON",
      "served_name": "darwin-28r",
      "architecture": "qwen3_5",
      "params_total": 26895998464,
      "params_billion": 26.9,
      "verified_rank": "Hugging Face GPQA Diamond #3",
      "default_port": 8021,
      "default_dp_master_port": 45011,
      "quantization_recommended": "fp8",
      "vllm_extra_args": [
        "--enforce-eager",
        "--limit-mm-per-prompt", "{\"image\":0,\"video\":0}"
      ]
    },
    {
      "role": "korean_specialist",
      "model_id": "Anserwise/AWAXIS-Think-31B",
      "served_name": "awaxis-31b",
      "architecture": "gemma4",
      "params_total": 31273086512,
      "params_billion": 31.27,
      "verified_rank": "National K-AI Leaderboard (MSIT, Korea) #1",
      "darwin_family_branch": "korean_specialist (Gemma-4 base)",
      "default_port": 8022,
      "default_dp_master_port": 45012,
      "quantization_recommended": "fp8",
      "vllm_extra_args": [
        "--enforce-eager",
        "--limit-mm-per-prompt", "{\"image\":0,\"video\":0}"
      ]
    }
  ],
  "aggregate_params_total": 58169084976,
  "aggregate_params_billion": 58.17,
  "active_params_router_mode_billion": 30,
  "active_params_ensemble_mode_billion": 60,
  "orchestration": {
    "strategy_name": "Hybrid-A",
    "version": "1.0",
    "distribution": {
      "route_korean": 0.50,
      "route_english": 0.20,
      "split_korean_with_reasoning": 0.15,
      "split_english_with_korean_context": 0.05,
      "ensemble_v1_mcq": 0.10
    },
    "average_cost_multiplier": 1.3,
    "modes": ["route", "split_refine", "ensemble_v1"]
  },
  "gateway": {
    "port": 8000,
    "served_model_name": "darwin-60b-duo",
    "openai_compatible": true,
    "endpoints": ["/v1/models", "/v1/chat/completions", "/v1/completions"]
  },
  "transformers_compatible": false,
  "_note": "This is NOT a direct transformers AutoModel.from_pretrained() target. Use the gateway (gateway/server.py) or Docker Compose (docker/docker-compose.yml). See README for full usage."
}
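
Two of the derived fields in this config can be cross-checked mechanically: the routing shares should sum to 1, and `aggregate_params_total` should equal the sum of the constituents' `params_total`. The sketch below runs both checks on an inline excerpt of the file; the `check_config` helper is hypothetical and not part of the repository.

```python
import json

# Excerpt of config.json containing only the fields checked here.
CONFIG = json.loads("""
{
  "constituents": [
    {"model_id": "FINAL-Bench/Darwin-28B-REASON", "params_total": 26895998464},
    {"model_id": "Anserwise/AWAXIS-Think-31B", "params_total": 31273086512}
  ],
  "orchestration": {
    "distribution": {
      "route_korean": 0.50,
      "route_english": 0.20,
      "split_korean_with_reasoning": 0.15,
      "split_english_with_korean_context": 0.05,
      "ensemble_v1_mcq": 0.10
    }
  }
}
""")


def check_config(cfg):
    """Return the summed parameter count after checking that routing shares total 1."""
    shares = cfg["orchestration"]["distribution"].values()
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("routing distribution does not sum to 1")
    return sum(c["params_total"] for c in cfg["constituents"])


print(check_config(CONFIG))  # prints 58169084976
```

Running a check like this when the gateway starts would catch hand-edited totals that no longer match the constituent entries.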
docker/docker-compose.yml
ADDED
@@ -0,0 +1,115 @@
version: "3.9"

# Darwin-60B-DUO: full-stack launcher
# Spins up:
#   - vllm-darwin  (Darwin-28B-REASON, GPU 0, port 8021 internal)
#   - vllm-awaxis  (AWAXIS-Think-31B,  GPU 1, port 8022 internal)
#   - gateway      (FastAPI orchestrator, port 8000 exposed)
#
# Single-GPU collocation:
#   Set CUDA_VISIBLE_DEVICES=0 for both vllm-* and lower
#   --gpu-memory-utilization to 0.45 each (FP8: ~30 GB per model, ~58 GB total, on an 80 GB GPU).

services:

  vllm-darwin:
    image: vllm/vllm-openai:latest
    container_name: darwin-60b-duo-vllm-darwin
    runtime: nvidia
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - VLLM_DP_MASTER_PORT=45011
      - HF_HOME=/root/.cache/huggingface
      - HF_TOKEN=${HF_TOKEN:-}
    command: >
      --model FINAL-Bench/Darwin-28B-REASON
      --served-model-name darwin-28r
      --host 0.0.0.0
      --port 8021
      --tensor-parallel-size 1
      --max-model-len 16384
      --dtype bfloat16
      --quantization fp8
      --trust-remote-code
      --enforce-eager
      --limit-mm-per-prompt '{"image":0,"video":0}'
      --gpu-memory-utilization 0.85
    volumes:
      - hf_cache:/root/.cache/huggingface
    ports:
      - "8021:8021"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8021/v1/models"]
      interval: 20s
      timeout: 5s
      retries: 60

  vllm-awaxis:
    image: vllm/vllm-openai:latest
    container_name: darwin-60b-duo-vllm-awaxis
    runtime: nvidia
    environment:
      - CUDA_VISIBLE_DEVICES=1
      - VLLM_DP_MASTER_PORT=45012
      - HF_HOME=/root/.cache/huggingface
      - HF_TOKEN=${HF_TOKEN:-}
    command: >
      --model Anserwise/AWAXIS-Think-31B
      --served-model-name awaxis-31b
      --host 0.0.0.0
      --port 8022
      --tensor-parallel-size 1
      --max-model-len 16384
      --dtype bfloat16
      --quantization fp8
      --trust-remote-code
      --enforce-eager
      --limit-mm-per-prompt '{"image":0,"video":0}'
      --gpu-memory-utilization 0.85
    volumes:
      - hf_cache:/root/.cache/huggingface
    ports:
      - "8022:8022"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8022/v1/models"]
      interval: 20s
      timeout: 5s
      retries: 60

  gateway:
    image: python:3.11-slim
    container_name: darwin-60b-duo-gateway
    working_dir: /app
    command: >
      bash -c "pip install -q -r requirements.txt &&
      python server.py --host 0.0.0.0 --port 8000
      --darwin-url http://vllm-darwin:8021/v1
      --awaxis-url http://vllm-awaxis:8022/v1"
    volumes:
      - ../gateway:/app
    ports:
      - "8000:8000"
    depends_on:
      vllm-darwin:
        condition: service_healthy
      vllm-awaxis:
        condition: service_healthy
    restart: unless-stopped

volumes:
  hf_cache:
    driver: local
gateway/ensemble.py
ADDED
@@ -0,0 +1,141 @@
+# -*- coding: utf-8 -*-
+"""
+Darwin-60B-DUO Ensemble V_1: MAJ@N self-consistency + cross-verification.
+
+For MCQ / short-answer queries:
+  1) Each backend produces N samples at temperature τ (default 0.7)
+  2) Each backend's answer = its own majority vote (RSA / self-consistency)
+  3) If both majorities agree → return that answer
+  4) If they disagree → each backend verifies the pair (cross-verification)
+     and the gateway picks the tournament winner
+  5) Tiebreaker on split verdicts: majority-vote-count confidence
+"""
+import asyncio
+import re
+from collections import Counter
+from typing import Any, Dict, List, Optional, Tuple
+
+
+_LETTERS = "ABCD"
+
+
+def _extract_letter(text: str) -> str:
+    """Extract A/B/C/D letter answer from a free-form response."""
+    if not text:
+        return ""
+    # Strip CoT / thinking tags
+    cleaned = re.sub(r"<\|START_THINKING\|>.*?<\|END_THINKING\|>", "", text, flags=re.S)
+    cleaned = re.sub(r"<think>.*?</think>", "", cleaned, flags=re.S)
+    for tag in ["<|END_THINKING|>", "</think>", "<|START_RESPONSE|>", "<|END_RESPONSE|>"]:
+        if tag in cleaned:
+            cleaned = cleaned.split(tag)[-1]
+    # Common answer patterns (\b keeps "answer is definitely" from matching "d")
+    patterns = [
+        r"ANSWER:\s*\(?([A-D])\b\)?",
+        r"\\boxed\{\s*\(?([A-D])\)?\s*\}",
+        r"final answer\s*(?:is|:)?\s*\(?([A-D])\b\)?",
+        r"answer\s+is\s*\(?([A-D])\b\)?",
+        r"\(([A-D])\)\s*$",
+    ]
+    for p in patterns:
+        m = re.search(p, cleaned, re.I | re.M)
+        if m:
+            return m.group(1).upper()
+    # Fallback: last A-D token
+    candidates = re.findall(r"\b([A-D])\b", cleaned)
+    return candidates[-1].upper() if candidates else ""
+
+
+def _majority(letters: List[str]) -> Tuple[Optional[str], Dict[str, int]]:
+    valid = [l for l in letters if l and l in _LETTERS]  # "" in "ABCD" is True, so test l first
+    if not valid:
+        return None, {}
+    counter = Counter(valid)
+    top, _ = counter.most_common(1)[0]
+    return top, dict(counter)
+
+
+_VERIFY_TEMPLATE = (
+    "You are a graduate-level expert verifier. Given the following multiple-"
+    "choice question and two candidate letter answers, decide which is more "
+    "likely correct.\n\n"
+    "QUESTION:\n{question}\n\n"
+    "CANDIDATE 1 says answer = {a1}\n"
+    "CANDIDATE 2 says answer = {a2}\n\n"
+    "Think briefly, then respond with exactly one line:\n"
+    "VERDICT: 1   (if candidate 1's letter is correct)\n"
+    "VERDICT: 2   (if candidate 2's letter is correct)"
+)
+
+
+def _parse_verdict(text: str) -> Optional[int]:
+    m = re.search(r"VERDICT:\s*([12])", text)
+    return int(m.group(1)) if m else None
+
+
+def _last_user_text(messages: List[Dict[str, str]]) -> str:
+    for m in reversed(messages):
+        if m.get("role") == "user":
+            return m.get("content", "")
+    return ""
+
+
+async def ensemble_v1(
+    darwin,
+    awaxis,
+    messages: List[Dict[str, str]],
+    temperature: float = 0.7,
+    max_tokens: int = 4096,
+    n_rsa: int = 8,
+) -> str:
+    """
+    Run V_1 ensemble. Returns the final answer string formatted as
+    "ANSWER: X" so downstream tooling can parse uniformly.
+    """
+    # --- Phase 1: parallel RSA (each backend N samples) ---
+    d_task = darwin.chat(messages, temperature=temperature, max_tokens=max_tokens, n=n_rsa)
+    a_task = awaxis.chat(messages, temperature=temperature, max_tokens=max_tokens, n=n_rsa)
+    d_outs, a_outs = await asyncio.gather(d_task, a_task)
+
+    d_letters = [_extract_letter(o) for o in d_outs]
+    a_letters = [_extract_letter(o) for o in a_outs]
+    d_maj, d_votes = _majority(d_letters)
+    a_maj, a_votes = _majority(a_letters)
+
+    # --- Phase 2: agreement check ---
+    if d_maj is None and a_maj is None:
+        return "ANSWER: (no valid answer extracted)"
+    if d_maj is None:
+        return f"ANSWER: {a_maj}"
+    if a_maj is None:
+        return f"ANSWER: {d_maj}"
+    if d_maj == a_maj:
+        return f"ANSWER: {d_maj}"
+
+    # --- Phase 3: cross-verification on mismatch ---
+    question = _last_user_text(messages)
+    verify_prompt = _VERIFY_TEMPLATE.format(question=question, a1=d_maj, a2=a_maj)
+    verify_msgs = [{"role": "user", "content": verify_prompt}]
+
+    d_verify_task = darwin.chat(verify_msgs, temperature=0.0, max_tokens=2048, n=1)
+    a_verify_task = awaxis.chat(verify_msgs, temperature=0.0, max_tokens=2048, n=1)
+    d_verify_outs, a_verify_outs = await asyncio.gather(d_verify_task, a_verify_task)
+    d_verdict = _parse_verdict(d_verify_outs[0])
+    a_verdict = _parse_verdict(a_verify_outs[0])
+
+    # --- Phase 4: combine verdicts ---
+    if d_verdict == a_verdict and d_verdict is not None:
+        return f"ANSWER: {d_maj if d_verdict == 1 else a_maj}"
+    if d_verdict is None and a_verdict is None:
+        # Fall back to confidence (higher own-vote count wins)
+        d_conf = d_votes.get(d_maj, 0)
+        a_conf = a_votes.get(a_maj, 0)
+        return f"ANSWER: {d_maj if d_conf >= a_conf else a_maj}"
+    if d_verdict is None:
+        return f"ANSWER: {d_maj if a_verdict == 1 else a_maj}"
+    if a_verdict is None:
+        return f"ANSWER: {d_maj if d_verdict == 1 else a_maj}"
+    # Split verdicts: confidence tiebreaker
+    d_conf = d_votes.get(d_maj, 0)
+    a_conf = a_votes.get(a_maj, 0)
+    return f"ANSWER: {d_maj if d_conf >= a_conf else a_maj}"
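The decision rule in `ensemble_v1` (per-backend majority vote, agreement check, verifier verdicts, then vote-count tiebreak) can be exercised without any backend. The sketch below is a condensed restatement for illustration only: `majority` and `decide` are hypothetical helper names, and the verifier verdicts are passed in as plain integers rather than parsed from model output.

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority(letters: List[str]) -> Tuple[Optional[str], int]:
    """Return (most common A-D letter, its vote count); (None, 0) if no valid votes."""
    valid = [x for x in letters if x in ("A", "B", "C", "D")]
    if not valid:
        return None, 0
    top, count = Counter(valid).most_common(1)[0]
    return top, count


def decide(
    d_letters: List[str],
    a_letters: List[str],
    d_verdict: Optional[int] = None,
    a_verdict: Optional[int] = None,
) -> Optional[str]:
    """Combine two backends' samples with optional verifier verdicts (1 or 2)."""
    d_maj, d_conf = majority(d_letters)
    a_maj, a_conf = majority(a_letters)
    # One side produced nothing usable: take the other side (possibly None).
    if d_maj is None or a_maj is None:
        return a_maj if d_maj is None else d_maj
    if d_maj == a_maj:
        return d_maj
    # Verdict 1 backs the first backend's letter, verdict 2 the second's.
    verdicts = {v for v in (d_verdict, a_verdict) if v is not None}
    if len(verdicts) == 1:
        return d_maj if verdicts.pop() == 1 else a_maj
    # No verdicts, or a 1-vs-2 split: the larger own-vote count wins,
    # with ties going to the first backend.
    return d_maj if d_conf >= a_conf else a_maj


agree = decide(["A", "A", "B"], ["A", "C", "A"])
split = decide(["A"] * 5 + ["B"] * 3, ["B"] * 6 + ["", ""], d_verdict=1, a_verdict=2)
one_sided = decide(["A"] * 5, ["B"] * 5, d_verdict=1, a_verdict=None)
```

Here `split` resolves to the second backend's letter because its majority carried six votes against five, while `one_sided` follows the single available verdict.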
gateway/refine.py
ADDED
@@ -0,0 +1,90 @@
+# -*- coding: utf-8 -*-
+"""
+Darwin-60B-DUO Sequential Refine: two-model collaboration.
+
+drafter_backend produces the initial draft, then refiner_backend polishes it.
+The polish prompt is built dynamically based on the language combination so
+that:
+  - Darwin (English reasoning) → AWAXIS (Korean polish) for Korean output
+    requiring rigorous English/STEM reasoning
+  - AWAXIS (Korean cultural context) → Darwin (English polish) for English
+    output requiring Korean cultural / linguistic context
+"""
+import re
+from typing import Any, Dict, List
+
+
+def _last_user_text(messages: List[Dict[str, str]]) -> str:
+    for m in reversed(messages):
+        if m.get("role") == "user":
+            return m.get("content", "")
+    return ""
+
+
+def _korean_ratio(text: str) -> float:
+    if not text:
+        return 0.0
+    return len(re.findall(r"[가-힣]", text)) / len(text)
+
+
+async def sequential_refine(
+    drafter,
+    refiner,
+    messages: List[Dict[str, str]],
+    temperature: float = 0.5,
+    max_tokens: int = 4096,
+) -> str:
+    """
+    Step 1: drafter produces the initial answer using the user's messages.
+    Step 2: refiner is given the original messages + the drafter's response +
+            a polish instruction, then produces the final output.
+
+    The polish instruction is language-adaptive:
+      - If user asked in Korean (kr_ratio > 0.3) → polish to natural Korean
+      - If user asked in English → polish to clearer English
+      - Otherwise → general clarity polish
+    """
+    user_text = _last_user_text(messages)
+    kr = _korean_ratio(user_text)
+
+    # ---- Step 1: drafter ----
+    draft_outputs = await drafter.chat(
+        messages,
+        temperature=temperature,
+        max_tokens=max_tokens,
+    )
+    draft = draft_outputs[0]
+
+    # ---- Step 2: refiner polish ----
+    if kr > 0.3:
+        polish_instruction = (
+            "위 초안을 사용자의 원래 질문 의도에 맞게 한국어로 자연스럽고 "
+            "정확하게 다듬어 최종 답변을 작성하세요. 사실관계는 보존하되, "
+            "어색한 표현·번역체·중복은 제거하고, 한국어 독자에게 매끄러운 "
+            "흐름이 되도록 재작성하세요. 새로운 정보 추가 금지 — 표현만 정련하세요."
+        )
+    elif kr < 0.05 and len(user_text) > 0:
+        polish_instruction = (
+            "Polish the draft above into a clearer, more concise, and "
+            "natural-sounding English response that fully addresses the "
+            "user's original question. Preserve all factual content; remove "
+            "redundancy, awkward phrasing, and translation artifacts. Do "
+            "not add new information — refine wording only."
+        )
+    else:
+        polish_instruction = (
+            "Refine the draft above for clarity, naturalness, and "
+            "consistency. Preserve all facts; remove redundancy. Do not "
+            "introduce new information."
+        )
+
+    refine_messages = list(messages) + [
+        {"role": "assistant", "content": draft},
+        {"role": "user", "content": polish_instruction},
+    ]
+    refined_outputs = await refiner.chat(
+        refine_messages,
+        temperature=max(0.0, temperature - 0.2),  # cooler for polish
+        max_tokens=max_tokens,
+    )
+    return refined_outputs[0]
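The language-adaptive branch in `sequential_refine` depends only on the Hangul fraction of the last user message. As a rough illustration of where the 0.3 and 0.05 thresholds put typical inputs, here is a small sketch; `polish_mode` is a hypothetical helper that returns a label instead of the actual prompt text.

```python
import re


def polish_mode(user_text: str) -> str:
    """Return 'korean', 'english', or 'generic' using the 0.3 / 0.05 thresholds."""
    if not user_text:
        return "generic"
    # Fraction of precomposed Hangul syllables among all characters.
    ratio = len(re.findall(r"[가-힣]", user_text)) / len(user_text)
    if ratio > 0.3:
        return "korean"
    if ratio < 0.05:
        return "english"
    return "generic"


korean_case = polish_mode("안녕하세요. 오늘 날씨가 어떤가요?")
english_case = polish_mode("What is 2+2?")
mixed_case = polish_mode("Python 설명 please explain it")
```

The mixed input contains two Hangul characters out of 27, a ratio of about 0.07, so it lands between the two thresholds and receives the generic polish instruction.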
gateway/requirements.txt
ADDED
@@ -0,0 +1,4 @@
+fastapi>=0.110
+uvicorn[standard]>=0.27
+httpx>=0.27
+pydantic>=2.6
gateway/router.py
ADDED
@@ -0,0 +1,186 @@
+# -*- coding: utf-8 -*-
+"""
+Darwin-60B-DUO Router: language + domain + complexity classification.
+
+Returns a RouteDecision indicating which Hybrid-A strategy to invoke:
+  - "route_darwin"         : English-only single backend
+  - "route_awaxis"         : Korean-only single backend
+  - "split_refine"         : Darwin reasons → AWAXIS polishes (Korean output, English reasoning)
+  - "split_refine_reverse" : AWAXIS retrieves → Darwin polishes (English output, Korean context)
+  - "ensemble_v1"          : MCQ / short answer requiring cross-verification
+"""
+import re
+from dataclasses import dataclass
+from typing import Optional
+
+
+# ---------------------------------------------------------------------------
+# Heuristic keyword lists
+# ---------------------------------------------------------------------------
+ENGLISH_REASONING_KEYWORDS = {
+    # Math
+    "prove", "theorem", "derivative", "integral", "equation", "matrix",
+    "vector", "topology", "manifold",
+    # Code
+    "def ", "function ", "import ", "class ", "return ", "lambda ",
+    "javascript", "python", "rust", "golang", "typescript", "regex",
+    # Sci-tech
+    "gradient", "tensor", "embedding", "transformer", "attention",
+    "rlhf", "rlvr", "quantization", "kernel",
+    # Markers (a single literal backslash, as written in LaTeX source)
+    "\\boxed", "\\frac", "\\sum", "\\int", "<eqn>", "$$",
+}
+
+KOREAN_CULTURAL_KEYWORDS = {
+    "추석", "설날", "한국", "조선", "고려", "신라", "백제",
+    "k-pop", "케이팝", "한복", "김치", "한국어",
+    "공무원", "정부", "과기부", "교육부", "외교부",
+    "국회", "정책", "법안", "조례",
+}
+
+MCQ_PATTERNS = [
+    r"\(A\).*\(B\).*\(C\).*\(D\)",
+    r"^\s*A\..*\n\s*B\..*\n\s*C\.",
+    r"answer.*[A-D]",
+    r"정답.*[ABCD가나다라]",
+    r"\bANSWER:",
+]
+
+
+@dataclass
+class RouteDecision:
+    strategy: str
+    reason: str
+    korean_ratio: float = 0.0
+    english_ratio: float = 0.0
+    has_reasoning_marker: bool = False
+    has_korean_cultural_marker: bool = False
+    is_mcq: bool = False
+
+
+# ---------------------------------------------------------------------------
+# Detection primitives
+# ---------------------------------------------------------------------------
+def korean_ratio(text: str) -> float:
+    """Fraction of Hangul characters."""
+    if not text:
+        return 0.0
+    total = len(text)
+    hangul = len(re.findall(r"[가-힣]", text))
+    return hangul / total if total > 0 else 0.0
+
+
+def english_ratio(text: str) -> float:
+    """Fraction of ASCII alphabetic characters."""
+    if not text:
+        return 0.0
+    total = len(text)
+    alpha = len(re.findall(r"[a-zA-Z]", text))
+    return alpha / total if total > 0 else 0.0
+
+
+def has_reasoning_marker(text: str) -> bool:
+    """English STEM / coding keywords or math markers."""
+    lower = text.lower()
+    for kw in ENGLISH_REASONING_KEYWORDS:
+        # LaTeX markers start with a backslash; match them literally and case-sensitively
+        if kw.startswith("\\"):
+            if re.search(re.escape(kw), text):
+                return True
+        elif kw in lower:
+            return True
+    return False
+
+
+def has_korean_cultural_marker(text: str) -> bool:
+    lower = text.lower()
+    return any(kw in lower for kw in KOREAN_CULTURAL_KEYWORDS)
+
+
+def is_mcq(text: str) -> bool:
+    for pat in MCQ_PATTERNS:
+        if re.search(pat, text, re.IGNORECASE | re.MULTILINE):
+            return True
+    return False
+
+
+# ---------------------------------------------------------------------------
+# Strategy selector (Hybrid-A)
+# ---------------------------------------------------------------------------
+def select_strategy(text: str) -> RouteDecision:
+    """
+    Hybrid-A strategy decision:
+      1) MCQ-style short answer → ensemble_v1
+      2) Korean output + English/STEM reasoning needed → split_refine
+      3) English output + Korean cultural context needed → split_refine_reverse
+      4) Korean-dominant → route_awaxis
+      5) English-dominant → route_darwin
+      6) Mixed default → route_awaxis (Korean-first preference)
+    """
+    kr = korean_ratio(text)
+    en = english_ratio(text)
+    reasoning = has_reasoning_marker(text)
+    cultural = has_korean_cultural_marker(text)
+    mcq = is_mcq(text)
+
+    decision = RouteDecision(
+        strategy="route_awaxis",  # default
+        reason="default",
+        korean_ratio=round(kr, 3),
+        english_ratio=round(en, 3),
+        has_reasoning_marker=reasoning,
+        has_korean_cultural_marker=cultural,
+        is_mcq=mcq,
+    )
+
+    # 1. MCQ: always ensemble (10% case)
+    if mcq and len(text) < 4000:
+        decision.strategy = "ensemble_v1"
+        decision.reason = "mcq_short_answer"
+        return decision
+
+    # 2. Korean output + reasoning required (15% case)
+    if kr > 0.3 and reasoning:
+        decision.strategy = "split_refine"
+        decision.reason = "korean_output_with_english_reasoning"
+        return decision
+
+    # 3. English output + Korean cultural context (5% case)
+    if en > 0.5 and kr < 0.05 and cultural:
+        decision.strategy = "split_refine_reverse"
+        decision.reason = "english_output_with_korean_context"
+        return decision
+
+    # 4. Korean-dominant (50% case)
+    if kr >= 0.3:
+        decision.strategy = "route_awaxis"
+        decision.reason = "korean_dominant"
+        return decision
+
+    # 5. English-dominant (20% case)
+    if en >= 0.5 and kr < 0.05:
+        decision.strategy = "route_darwin"
+        decision.reason = "english_dominant"
+        return decision
+
+    # 6. Mixed / ambiguous → AWAXIS (Korean-first default)
+    decision.strategy = "route_awaxis"
+    decision.reason = "mixed_fallback_korean"
+    return decision
+
+
+# ---------------------------------------------------------------------------
+# Smoke test
+# ---------------------------------------------------------------------------
+if __name__ == "__main__":
+    samples = [
+        ("Pure Korean chat", "안녕하세요. 오늘 날씨가 어떤가요?"),
+        ("Pure English code", "def fib(n):\n    return n if n < 2 else fib(n-1) + fib(n-2)"),
+        ("Korean + English reasoning", "Transformer attention의 작동 원리를 한국어로 설명해줘"),
+        ("English + Korean culture", "Explain the Korean Chuseok holiday in simple English."),
+        ("MCQ", "Which is correct?\n(A) foo\n(B) bar\n(C) baz\n(D) qux"),
+        ("Korean MCQ", "정답은 무엇인가요? A. 1 B. 2 C. 3 D. 4"),
+    ]
+    for name, txt in samples:
+        d = select_strategy(txt)
+        print(f"[{name}] -> {d.strategy} ({d.reason}) kr={d.korean_ratio} en={d.english_ratio}")
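To make the ratio thresholds concrete, the following worked example computes both character ratios for the mixed-language smoke-test prompt. It is a standalone sketch: `char_ratio` is a hypothetical helper that mirrors `korean_ratio` and `english_ratio`, and the keyword check is reduced to the single marker `"transformer"`.

```python
import re

sample = "Transformer attention의 작동 원리를 한국어로 설명해줘"


def char_ratio(pattern: str, text: str) -> float:
    """Fraction of characters in `text` that match the character class `pattern`."""
    return len(re.findall(pattern, text)) / len(text) if text else 0.0


kr = char_ratio(r"[가-힣]", sample)    # 14 Hangul syllables out of 39 characters
en = char_ratio(r"[a-zA-Z]", sample)   # 20 ASCII letters out of 39 characters

# Rule 2 of the selector: Korean-heavy text that also carries a reasoning keyword.
takes_split_refine = kr > 0.3 and "transformer" in sample.lower()
```

With a Korean ratio of roughly 0.36 and a reasoning keyword present, this prompt satisfies the `split_refine` condition before the Korean-dominant rule is reached.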
gateway/server.py
ADDED
|
@@ -0,0 +1,286 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# -*- coding: utf-8 -*-
|
| 2 |
+
"""
|
| 3 |
+
Darwin-60B-DUO Gateway — FastAPI OpenAI-compatible orchestrator.
|
| 4 |
+
|
| 5 |
+
Exposes a single OpenAI-compatible endpoint ("darwin-60b-duo") that
|
| 6 |
+
internally routes to two backends:
|
| 7 |
+
- Darwin-28B-REASON (English reasoning specialist, HF GPQA Diamond #3)
|
| 8 |
+
- AWAXIS-Think-31B (Korean specialist, K-AI Leaderboard #1)
|
| 9 |
+
|
| 10 |
+
Hybrid-A strategy (config.json):
|
| 11 |
+
- 70% Route (single backend)
|
| 12 |
+
- 20% Split / Refine (sequential two-model collaboration)
|
| 13 |
+
- 10% Ensemble V_1 (cross-verification tournament for MCQ / short answers)
|
| 14 |
+
|
| 15 |
+
Run:
|
| 16 |
+
pip install -r requirements.txt
|
| 17 |
+
python server.py --port 8000 \\
|
| 18 |
+
--darwin-url http://127.0.0.1:8021/v1 \\
|
| 19 |
+
--awaxis-url http://127.0.0.1:8022/v1
|
| 20 |
+
|
| 21 |
+
License: Gemma (combined-license inheritance — see README).
|
| 22 |
+
"""
|
| 23 |
+
import argparse
|
| 24 |
+
import asyncio
|
| 25 |
+
import json
|
| 26 |
+
import time
|
| 27 |
+
import uuid
|
| 28 |
+
from typing import Any, Dict, List, Optional
|
| 29 |
+
|
| 30 |
+
import httpx
|
| 31 |
+
from fastapi import FastAPI, HTTPException
|
| 32 |
+
from fastapi.responses import JSONResponse, StreamingResponse
|
| 33 |
+
from pydantic import BaseModel, Field
|
| 34 |
+
|
| 35 |
+
from router import select_strategy, RouteDecision
|
| 36 |
+
from refine import sequential_refine
|
| 37 |
+
from ensemble import ensemble_v1
|
| 38 |
+
|
| 39 |
+
|
| 40 |
+
# ---------------------------------------------------------------------------
|
| 41 |
+
# Pydantic models — OpenAI Chat Completions API subset
|
| 42 |
+
# ---------------------------------------------------------------------------
|
| 43 |
+
class ChatMessage(BaseModel):
|
| 44 |
+
role: str
|
| 45 |
+
content: str
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
class ChatCompletionRequest(BaseModel):
|
| 49 |
+
model: str = "darwin-60b-duo"
|
| 50 |
+
messages: List[ChatMessage]
|
| 51 |
+
temperature: float = 0.7
|
| 52 |
+
top_p: float = 0.95
|
| 53 |
+
max_tokens: int = 4096
|
| 54 |
+
n: int = 1
|
| 55 |
+
stream: bool = False
|
| 56 |
+
# Optional: force a specific strategy ("route_darwin", "route_awaxis",
|
| 57 |
+
# "split_refine", "ensemble_v1", "auto"). Default "auto" = Hybrid-A router.
|
| 58 |
+
duo_strategy: Optional[str] = "auto"
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
# ---------------------------------------------------------------------------
|
| 62 |
+
# Backend HTTP client
|
| 63 |
+
# ---------------------------------------------------------------------------
|
| 64 |
+
class Backend:
|
| 65 |
+
def __init__(self, name: str, base_url: str, served_name: str):
|
| 66 |
+
self.name = name
|
| 67 |
+
self.base_url = base_url.rstrip("/")
|
| 68 |
+
self.served_name = served_name
|
| 69 |
+
self.client = httpx.AsyncClient(timeout=httpx.Timeout(900.0))
|
| 70 |
+
|
| 71 |
+
async def chat(
|
| 72 |
+
self,
|
| 73 |
+
messages: List[Dict[str, str]],
|
| 74 |
+
temperature: float = 0.7,
|
| 75 |
+
max_tokens: int = 4096,
|
| 76 |
+
n: int = 1,
|
| 77 |
+
top_p: float = 0.95,
|
| 78 |
+
) -> List[str]:
|
| 79 |
+
payload = {
|
| 80 |
+
"model": self.served_name,
|
| 81 |
+
"messages": messages,
|
| 82 |
+
"temperature": temperature,
|
| 83 |
+
"top_p": top_p,
|
| 84 |
+
"max_tokens": max_tokens,
|
| 85 |
+
"n": n,
|
| 86 |
+
}
|
| 87 |
+
r = await self.client.post(
|
| 88 |
+
f"{self.base_url}/chat/completions", json=payload
|
| 89 |
+
)
|
| 90 |
+
r.raise_for_status()
|
| 91 |
+
data = r.json()
|
| 92 |
+
return [c["message"]["content"] for c in data["choices"]]
|
| 93 |
+
|
| 94 |
+
async def health(self) -> bool:
|
| 95 |
+
try:
|
| 96 |
+
r = await self.client.get(f"{self.base_url}/models", timeout=5)
|
| 97 |
+
return r.status_code == 200
|
| 98 |
+
except Exception:
|
| 99 |
+
return False
|
| 100 |
+
|
| 101 |
+
|
| 102 |
+
# ---------------------------------------------------------------------------
|
| 103 |
+
# FastAPI app
|
| 104 |
+
# ---------------------------------------------------------------------------
|
| 105 |
+
app = FastAPI(
|
| 106 |
+
title="Darwin-60B-DUO Gateway",
|
| 107 |
+
version="1.0.0",
|
| 108 |
+
description=(
|
| 109 |
+
"Single OpenAI-compatible endpoint for the Darwin-60B-DUO "
|
| 110 |
+
"(Darwin-28B-REASON + AWAXIS-Think-31B). Hybrid-A routing."
|
| 111 |
+
),
|
| 112 |
+
)
|
| 113 |
+
|
| 114 |
+
# Initialized via CLI args at startup
|
| 115 |
+
DARWIN: Optional[Backend] = None
|
| 116 |
+
AWAXIS: Optional[Backend] = None
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
@app.get("/v1/models")
|
| 120 |
+
async def list_models():
|
| 121 |
+
"""Expose only the aggregate model to external callers."""
|
| 122 |
+
return {
|
| 123 |
+
"object": "list",
|
| 124 |
+
"data": [
|
| 125 |
+
{
|
| 126 |
+
"id": "darwin-60b-duo",
|
| 127 |
+
"object": "model",
|
| 128 |
+
"owned_by": "FINAL-Bench",
|
| 129 |
+
"created": int(time.time()),
|
| 130 |
+
}
|
| 131 |
+
],
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
|
| 135 |
+
@app.get("/health")
|
| 136 |
+
async def health():
|
| 137 |
+
d_ok = await DARWIN.health() if DARWIN else False
|
| 138 |
+
a_ok = await AWAXIS.health() if AWAXIS else False
|
| 139 |
+
status = "ok" if (d_ok and a_ok) else "degraded"
|
| 140 |
+
return {
|
| 141 |
+
"status": status,
|
| 142 |
+
"backends": {
|
| 143 |
+
"darwin-28r": d_ok,
|
| 144 |
+
"awaxis-31b": a_ok,
|
| 145 |
+
},
|
| 146 |
+
"gateway_version": "1.0.0",
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
|
| 150 |
+
def _build_response(content: str, route_meta: Dict[str, Any]) -> Dict[str, Any]:
|
| 151 |
+
"""Build an OpenAI-compatible Chat Completion response with route metadata."""
|
| 152 |
+
return {
|
| 153 |
+
"id": f"chatcmpl-{uuid.uuid4().hex[:24]}",
|
| 154 |
+
"object": "chat.completion",
|
| 155 |
+
"created": int(time.time()),
|
| 156 |
+
"model": "darwin-60b-duo",
|
| 157 |
+
"choices": [
|
| 158 |
+
{
|
| 159 |
+
"index": 0,
|
| 160 |
+
"message": {
|
| 161 |
+
"role": "assistant",
|
| 162 |
+
"content": content,
|
| 163 |
+
},
|
| 164 |
+
"finish_reason": "stop",
|
| 165 |
+
}
|
| 166 |
+
],
|
| 167 |
+
"usage": {
|
| 168 |
+
"prompt_tokens": -1, # Aggregate gateway does not track tokens
|
| 169 |
+
"completion_tokens": -1,
|
| 170 |
+
"total_tokens": -1,
|
| 171 |
+
},
|
| 172 |
+
# Non-standard metadata for transparency / debugging
|
| 173 |
+
"_duo_route": route_meta,
|
| 174 |
+
}
|
| 175 |
+
|
| 176 |
+
|
| 177 |
+
@app.post("/v1/chat/completions")
|
| 178 |
+
async def chat_completions(req: ChatCompletionRequest):
|
| 179 |
+
if not req.messages:
|
| 180 |
+
raise HTTPException(400, "messages must not be empty")
|
| 181 |
+
|
| 182 |
+
user_text = req.messages[-1].content
|
| 183 |
+
messages_dict = [m.dict() for m in req.messages]
|
| 184 |
+
|
| 185 |
+
# ----- Strategy selection -----
|
| 186 |
+
if req.duo_strategy and req.duo_strategy != "auto":
|
| 187 |
+
decision = RouteDecision(strategy=req.duo_strategy, reason="user_forced")
|
| 188 |
+
else:
|
| 189 |
+
decision = select_strategy(user_text)
|
| 190 |
+
|
| 191 |
+
t0 = time.time()
|
| 192 |
+
|
| 193 |
+
# ----- Execute -----
|
| 194 |
+
try:
|
| 195 |
+
if decision.strategy == "route_darwin":
|
| 196 |
+
outputs = await DARWIN.chat(
|
| 197 |
+
messages_dict,
|
| 198 |
+
temperature=req.temperature,
|
| 199 |
+
max_tokens=req.max_tokens,
|
| 200 |
+
top_p=req.top_p,
|
| 201 |
+
)
|
| 202 |
+
content = outputs[0]
|
| 203 |
+
|
| 204 |
+
elif decision.strategy == "route_awaxis":
|
| 205 |
+
outputs = await AWAXIS.chat(
|
| 206 |
+
messages_dict,
|
| 207 |
+
temperature=req.temperature,
|
| 208 |
+
max_tokens=req.max_tokens,
|
| 209 |
+
top_p=req.top_p,
|
| 210 |
+
)
|
| 211 |
+
content = outputs[0]
|
+
+        elif decision.strategy == "split_refine":
+            # Darwin reasons in English → AWAXIS polishes in Korean
+            content = await sequential_refine(
+                DARWIN, AWAXIS, messages_dict,
+                temperature=req.temperature, max_tokens=req.max_tokens
+            )
+
+        elif decision.strategy == "split_refine_reverse":
+            # AWAXIS retrieves Korean context → Darwin polishes in English
+            content = await sequential_refine(
+                AWAXIS, DARWIN, messages_dict,
+                temperature=req.temperature, max_tokens=req.max_tokens
+            )
+
+        elif decision.strategy == "ensemble_v1":
+            # MCQ / short answer: MAJ@N per model + cross-verify if mismatched
+            content = await ensemble_v1(
+                DARWIN, AWAXIS, messages_dict,
+                temperature=req.temperature, max_tokens=req.max_tokens,
+                n_rsa=8,
+            )
+
+        else:
+            # Fallback: AWAXIS (default for ambiguous / mixed)
+            outputs = await AWAXIS.chat(
+                messages_dict, temperature=req.temperature,
+                max_tokens=req.max_tokens, top_p=req.top_p,
+            )
+            content = outputs[0]
+            decision.strategy = "fallback_awaxis"
+
+    except httpx.HTTPError as e:
+        raise HTTPException(503, f"backend error: {type(e).__name__}: {e}")
+
+    elapsed = time.time() - t0
+    route_meta = {
+        "strategy": decision.strategy,
+        "reason": decision.reason,
+        "elapsed_s": round(elapsed, 2),
+        "language_ratio": decision.korean_ratio,
+    }
+
+    return JSONResponse(_build_response(content, route_meta))
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+def main():
+    p = argparse.ArgumentParser()
+    p.add_argument("--host", default="0.0.0.0")
+    p.add_argument("--port", type=int, default=8000)
+    p.add_argument(
+        "--darwin-url", default="http://127.0.0.1:8021/v1",
+        help="Darwin-28B-REASON vLLM endpoint",
+    )
+    p.add_argument(
+        "--awaxis-url", default="http://127.0.0.1:8022/v1",
+        help="AWAXIS-Think-31B vLLM endpoint",
+    )
+    p.add_argument("--darwin-served-name", default="darwin-28r")
+    p.add_argument("--awaxis-served-name", default="awaxis-31b")
+    args = p.parse_args()
+
+    global DARWIN, AWAXIS
+    DARWIN = Backend("darwin-28r", args.darwin_url, args.darwin_served_name)
+    AWAXIS = Backend("awaxis-31b", args.awaxis_url, args.awaxis_served_name)
+
+    import uvicorn
+    uvicorn.run(app, host=args.host, port=args.port, log_level="info")
+
+
+if __name__ == "__main__":
+    main()
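The `ensemble_v1` branch above is described only by its comment ("MAJ@N per model + cross-verify if mismatched"); its body is not part of this hunk. As a rough illustration of that idea, the sketch below takes a majority vote over each model's sampled answers and flags a disagreement for a follow-up verification pass. The function names `majority_answer` and `combine_votes`, the upper-case normalization, and the choice to prefer the AWAXIS answer on a mismatch are all assumptions made for this example, not the gateway's actual logic.

```python
from collections import Counter
from typing import Optional


def majority_answer(samples: list[str]) -> Optional[str]:
    """Return the most frequent non-empty answer among the samples (MAJ@N)."""
    # Hypothetical normalization: trim whitespace and upper-case, so "a" == "A".
    cleaned = [s.strip().upper() for s in samples if s and s.strip()]
    if not cleaned:
        return None
    # most_common(1) yields [(answer, count)]; ties go to the first one seen.
    return Counter(cleaned).most_common(1)[0][0]


def combine_votes(
    darwin_samples: list[str], awaxis_samples: list[str]
) -> tuple[Optional[str], bool]:
    """Vote per model, then report whether a cross-verification pass is needed."""
    darwin_vote = majority_answer(darwin_samples)
    awaxis_vote = majority_answer(awaxis_samples)
    if darwin_vote is not None and darwin_vote == awaxis_vote:
        return darwin_vote, False
    # Mismatch (or one side empty): assumed tie-break prefers the AWAXIS vote,
    # mirroring the gateway's AWAXIS fallback; the caller should cross-verify.
    provisional = awaxis_vote if awaxis_vote is not None else darwin_vote
    return provisional, True
```

With `n_rsa=8`, each list would presumably hold eight sampled answers per backend. When the second return value is `True`, a real implementation would issue an extra verification request rather than trust the provisional answer.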
tokenizer_info.json
ADDED
@@ -0,0 +1,17 @@
+{
+  "_note": "Darwin-60B-DUO uses constituent tokenizers via gateway, not a unified one.",
+  "constituent_tokenizers": {
+    "darwin-28r": {
+      "source_model": "FINAL-Bench/Darwin-28B-REASON",
+      "tokenizer_family": "qwen3_5",
+      "vocab_size_estimate": 151936
+    },
+    "awaxis-31b": {
+      "source_model": "Anserwise/AWAXIS-Think-31B",
+      "tokenizer_family": "gemma4",
+      "vocab_size_estimate": 262144
+    }
+  },
+  "routing_decision_layer": "language detection + domain classification (gateway/router.py) performs tokenization-free routing on the raw text before backend selection",
+  "downstream_token_handling": "Each backend (vLLM serving the respective base model) handles its own tokenization. The gateway operates on text strings, not token IDs."
+}
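The `routing_decision_layer` note says routing happens on raw text, before any tokenizer is involved. A minimal sketch of what tokenization-free language detection can look like is given below: it measures the share of Hangul syllables among alphabetic characters and picks a backend name from that ratio. The helper names `korean_ratio` and `route_by_language` and the `0.3` threshold are hypothetical; the real logic lives in `gateway/router.py`, which is not shown here and also applies domain classification.

```python
def korean_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are precomposed Hangul syllables."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # U+AC00..U+D7A3 is the Unicode Hangul Syllables block; standalone Jamo
    # are ignored in this simplified sketch.
    hangul = sum(1 for ch in letters if "\uac00" <= ch <= "\ud7a3")
    return hangul / len(letters)


def route_by_language(text: str, threshold: float = 0.3) -> str:
    """Pick a backend name from the raw string alone (threshold is an assumption)."""
    return "awaxis-31b" if korean_ratio(text) >= threshold else "darwin-28r"
```

Because both functions work on Python strings, the differing vocabularies of the two constituent tokenizers never enter the routing decision; each backend tokenizes the forwarded text itself.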