metadata
title: Darwin-35B-A3B-Opus API
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: OpenAI-compatible FastAPI for Darwin-35B-A3B-Opus (INT4)
Darwin-35B-A3B-Opus API
Self-hosted OpenAI-compatible FastAPI server for FINAL-Bench/Darwin-35B-A3B-Opus.
- 35B-parameter MoE with 3B active parameters, based on Qwen3.5-MoE
- INT4 quantized (~18 GB), fits on L4/A10G/L40S
- OpenAI-compatible endpoints + SSE streaming
- Bearer auth (configurable via the `API_KEYS` secret)
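As a rough illustration of the SSE streaming mentioned above, the sketch below parses a streamed chat completion by hand. It assumes the server emits the same chunk shape as the OpenAI API (`data: {json}` lines with `choices[].delta.content`, ending in `data: [DONE]`); this README only states that the endpoints are OpenAI-compatible, so treat the chunk layout as an assumption. The helper name `iter_sse_text` and the sample lines are made up for the example.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The OpenAI streaming format ends with a literal [DONE] sentinel.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines standing in for a real response body.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))
```

With the `openai` client, passing `stream=True` to `client.chat.completions.create(...)` does this parsing for you; the manual version is mainly useful when calling the endpoint from a plain HTTP client.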
Endpoints
- `GET /`: Landing page with examples
- `GET /health`: Health and load status
- `GET /v1/models`: List models
- `POST /v1/chat/completions`: Chat (OpenAI-compatible)
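The endpoints above can be probed with nothing but the standard library. The following is a minimal sketch that only builds the requests; the base URL is taken from the example in this README, while the helper name `build_request` is hypothetical. Whether `/health` requires a key is not stated here, so the sketch sends it without one as an assumption.

```python
import urllib.request
from typing import Optional

# Base URL taken from the client example in this README (without the /v1 suffix).
BASE_URL = "https://final-bench-darwin-35b-a3b-opus-api.hf.space"


def build_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding a bearer token only when a key is given."""
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(BASE_URL + path, headers=headers, method="GET")


health = build_request("/health")
models = build_request("/v1/models", api_key="YOUR_KEY")

print(health.full_url)
print(models.get_header("Authorization"))

# To actually send a request (needs network access and a running Space):
# with urllib.request.urlopen(health, timeout=30) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```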
Configuration (HF Space secrets)
| Secret | Required | Description |
|---|---|---|
| `HF_TOKEN` | optional | HF token for private/gated models |
| `API_KEYS` | optional | Comma-separated bearer keys (empty = public) |
| `QUANT_MODE` | optional | `int4` (default), `int8`, `bf16` |
| `MODEL_ID` | optional | HF model id (default: `FINAL-Bench/Darwin-35B-A3B-Opus`) |
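To make the semantics of these secrets concrete, here is one way a server could read them. This is a sketch under stated assumptions, not the Space's actual code: the names `Settings`, `load_settings`, and `is_authorized` are hypothetical, and the only behavior taken from the table is the defaults, the comma-separated key list, and "empty = public".

```python
from dataclasses import dataclass
from typing import Mapping, Optional

ALLOWED_QUANT_MODES = ("int4", "int8", "bf16")
DEFAULT_MODEL_ID = "FINAL-Bench/Darwin-35B-A3B-Opus"


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]
    api_keys: frozenset
    quant_mode: str
    model_id: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Read the four secrets from an environment-like mapping."""
    # Comma-separated keys; blank entries and surrounding whitespace are dropped.
    keys = frozenset(k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip())
    quant = env.get("QUANT_MODE", "").strip().lower() or "int4"
    if quant not in ALLOWED_QUANT_MODES:
        raise ValueError(f"QUANT_MODE must be one of {ALLOWED_QUANT_MODES}, got {quant!r}")
    return Settings(
        hf_token=env.get("HF_TOKEN") or None,
        api_keys=keys,
        quant_mode=quant,
        model_id=env.get("MODEL_ID") or DEFAULT_MODEL_ID,
    )


def is_authorized(settings: Settings, authorization_header: Optional[str]) -> bool:
    """Apply the 'empty = public' rule, otherwise require a known bearer key."""
    if not settings.api_keys:
        return True
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return False
    return authorization_header[len("Bearer "):].strip() in settings.api_keys


public = load_settings({})
private = load_settings({"API_KEYS": "key-a, key-b", "QUANT_MODE": "INT8"})
print(public.quant_mode, public.model_id)
print(is_authorized(private, "Bearer key-b"))
```

In a real server, `load_settings(os.environ)` would be called once at startup.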
Hardware
Recommended:
- L4 (24 GB): INT4
- A10G-small (24 GB): INT4
- L40S (48 GB): INT4 or INT8
- A100 (80 GB): any mode, including BF16
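The recommendations above are consistent with a back-of-the-envelope estimate of weight memory. The sketch below treats "35B" as the nominal total parameter count and counts weights only; KV cache, activations, and quantization metadata are not modeled, and the 4 GB headroom figure is an arbitrary assumption chosen for illustration.

```python
# Bytes per parameter for each QUANT_MODE value.
BYTES_PER_PARAM = {"int4": 0.5, "int8": 1.0, "bf16": 2.0}

TOTAL_PARAMS_BILLION = 35.0  # nominal size from the model name


def weight_gb(mode: str, params_billion: float = TOTAL_PARAMS_BILLION) -> float:
    """Approximate weight size in GB: 1e9 params * bytes per param / 1e9."""
    return params_billion * BYTES_PER_PARAM[mode]


def fits(mode: str, vram_gb: float, headroom_gb: float = 4.0) -> bool:
    """Check whether weights plus an assumed headroom fit in the given VRAM."""
    return weight_gb(mode) + headroom_gb <= vram_gb


for mode in ("int4", "int8", "bf16"):
    print(mode, weight_gb(mode), "GB")
```

The INT4 figure of 17.5 GB lines up with the "~18 GB" quoted earlier, and the INT8 and BF16 estimates (35 GB and 70 GB) explain why those modes are listed only for the 48 GB and 80 GB cards.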
Example
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://final-bench-darwin-35b-a3b-opus-api.hf.space/v1",
)
resp = client.chat.completions.create(
    model="Darwin-35B-A3B-Opus",
    messages=[{"role": "user", "content": "Explain GPQA"}],
    max_tokens=300,
)
print(resp.choices[0].message.content)