Darwin-36B-Opus — VKAE Accelerated

Ready-to-run, VKAE-accelerated serving of Darwin-36B-Opus, VIDRAFT's house 36B Mixture-of-Experts model. Model weights and an optimized serving runtime in a single self-contained container.

VKAE (VIDRAFT Kernel Acceleration Engine) is VIDRAFT's proprietary inference-serving optimization. The acceleration recipe is withheld; only the reproducible results are published here.

Measured performance

NVIDIA B200, single GPU, bf16, same-harness before/after.

Metric	Baseline	VKAE	Gain
Single-stream throughput	25.0 tok/s	280.8 tok/s	11.2×
Output quality	reference	preserved	no degradation

Quick start

docker pull vidraft/darwin36-vkae:281
docker run --gpus all -p 8000:8000 vidraft/darwin36-vkae:281

The container serves an OpenAI-compatible API on port 8000 — point any OpenAI client at http://localhost:8000/v1. A Blackwell (B200) or Hopper (H100/H200) class GPU is recommended.

About

Darwin-36B-Opus is a VIDRAFT house model (36B Mixture-of-Experts). This card documents VIDRAFT's accelerated serving of the model; the acceleration method is proprietary and not distributed in source form.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for FINAL-Bench/Darwin-36B-Opus-VKAE

Base model

FINAL-Bench/Darwin-36B-Opus

Finetuned

(2)

this model

Space using FINAL-Bench/Darwin-36B-Opus-VKAE 1

Collection including FINAL-Bench/Darwin-36B-Opus-VKAE

VKAE Accelerated

Collection

Fastest single-GPU serving of open models via VKAE. Live board: hf.co/spaces/VIDraft/vkae. Each = card + Docker. • 2 items • Updated about 12 hours ago • 11

FINAL-Bench
/

Darwin-36B-Opus-VKAE