1.25 TB
140 files
Updated 27 days ago
Darwin-60B-DUO Benchmarks
Updated 2026-05-28 KST
🏆 GPQA Diamond (full 198 questions) — FINAL: 88.38%
| Stage | Score |
|---|---|
| Darwin single (MAJ@5) | 69.70% |
| AWAXIS single (MAJ@5) | 77.27% |
| Cascade (+ MTI) — first cascade | 83.84% |
| 🏆 Darwin-DELPHI DUO FINAL | 88.38% |
The final score is produced by the Darwin-DELPHI test-time engine operating over the two-specialist DUO. The engine is proprietary.
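The single-model rows in the table are reported as MAJ@5. As a rough illustration only, the sketch below assumes MAJ@5 means majority voting over five sampled answers per question, which is the usual reading of that notation. It does not reproduce the cascade or the proprietary Darwin-DELPHI engine. The function names and the toy answers are hypothetical. The helper `implied_correct` converts a reported percentage into the number of correct answers it implies out of the 198 GPQA Diamond questions.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def maj_at_k_accuracy(sampled, gold):
    """Fraction of questions whose majority-voted answer matches the gold label.

    sampled: one list of k sampled answers per question (k = 5 for MAJ@5).
    gold:    one reference answer per question.
    """
    correct = sum(majority_vote(s) == g for s, g in zip(sampled, gold))
    return correct / len(gold)


def implied_correct(score_pct, total=198):
    """Number of correct answers implied by a percentage score over `total` questions."""
    return round(score_pct / 100 * total)


# Toy data (hypothetical, not taken from the benchmark run).
toy_samples = [
    ["A", "A", "B", "A", "C"],  # majority: A
    ["B", "C", "C", "D", "C"],  # majority: C
    ["D", "A", "D", "B", "B"],  # tie between D and B, D was seen first
]
toy_gold = ["A", "C", "B"]
toy_accuracy = maj_at_k_accuracy(toy_samples, toy_gold)

# Correct-answer counts implied by the scores reported in the table.
reported = {
    "Darwin single (MAJ@5)": 69.70,
    "AWAXIS single (MAJ@5)": 77.27,
    "Cascade (+ MTI)": 83.84,
    "Darwin-DELPHI DUO FINAL": 88.38,
}
implied = {stage: implied_correct(pct) for stage, pct in reported.items()}
```

Under these assumptions the reported scores correspond to 138, 153, 166, and 175 correct answers out of 198, so the step from the first cascade to the final result amounts to 9 additional questions.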
📌 Constituent verified scores
| Constituent | Benchmark | Rank |
|---|---|---|
| Darwin-28B-REASON | Hugging Face GPQA Diamond | #3 |
| AWAXIS-Think-31B | National K-AI Leaderboard (Republic of Korea, MSIT) | #1 |
Future evaluations (scheduled)
| Benchmark | Status |
|---|---|
| KMMLU | TBA |
| CLIcK | TBA |
| HELMET / RULER (long context) | TBA |
| Needle-in-Haystack (NIAH) 128K | Scheduled; NIAH 32K verified per backend |
- Total size: 1.25 TB
- Files: 140
- Last updated: Jun 6
- Pre-warmed CDN: US, EU