metadata
title: Darwin-35B-A3B-Opus API
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: OpenAI-compatible FastAPI for Darwin-35B-A3B-Opus (INT4)

Darwin-35B-A3B-Opus API

Self-hosted OpenAI-compatible FastAPI server for FINAL-Bench/Darwin-35B-A3B-Opus.

  • 35B MoE / 3B active - Qwen3.5-MoE based
  • INT4 quantized (~18 GB) - fits on L4/A10G/L40S
  • OpenAI-compatible endpoints + SSE streaming
  • Bearer auth (configurable via API_KEYS secret)

Endpoints

  • GET / - Landing page with examples
  • GET /health - Health + load status
  • GET /v1/models - List models
  • POST /v1/chat/completions - Chat (OpenAI compat)
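As a rough illustration of how the endpoints above map to HTTP requests, here is a minimal standard-library sketch. The helper name `build_request` is hypothetical, the base URL is taken from the Example section, and whether `/health` requires a bearer key is an assumption (it is built without one here). The requests are only constructed, not sent.

```python
import json
import urllib.request

# Base URL copied from the Example section of this README.
BASE_URL = "https://final-bench-darwin-35b-a3b-opus-api.hf.space"


def build_request(path, api_key=None, payload=None):
    """Build (but do not send) a request for one of the listed endpoints.

    Hypothetical helper for illustration; not part of the server code.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method="POST" if payload is not None else "GET",
    )
    if api_key:
        # Bearer auth, as described in the feature list.
        req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


health = build_request("/health")  # assumption: no key needed for health checks
models = build_request("/v1/models", api_key="YOUR_KEY")
chat = build_request(
    "/v1/chat/completions",
    api_key="YOUR_KEY",
    payload={
        "model": "Darwin-35B-A3B-Opus",
        "messages": [{"role": "user", "content": "Explain GPQA"}],
        "max_tokens": 300,
    },
)

for req in (health, models, chat):
    print(req.get_method(), req.full_url)
```

To send one of these, pass it to `urllib.request.urlopen(req, timeout=...)` and read the response body. The response schema of `/health` is not documented here, so inspect it before relying on specific fields.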

Configuration (HF Space secrets)

| Secret | Required | Description |
|---|---|---|
| HF_TOKEN | optional | HF token for private/gated models |
| API_KEYS | optional | Comma-separated bearer keys (empty = public) |
| QUANT_MODE | optional | int4 (default), int8, bf16 |
| MODEL_ID | optional | HF model id (default: FINAL-Bench/Darwin-35B-A3B-Opus) |
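The `API_KEYS` behaviour described above (comma-separated keys, empty meaning public access) could be implemented roughly as follows. This is a sketch under assumptions, not the Space's actual code: the function names `parse_api_keys` and `is_authorized` are hypothetical, and the whitespace trimming and constant-time comparison are design choices made for this example.

```python
import hmac
import os


def parse_api_keys(raw):
    """Split a comma-separated secret into a set of non-empty keys."""
    return {key.strip() for key in (raw or "").split(",") if key.strip()}


def is_authorized(authorization_header, allowed_keys):
    """Check an 'Authorization: Bearer <key>' header against the allowed keys.

    Assumption: an empty key set means the API is public, matching the
    "(empty = public)" note in the configuration table.
    """
    if not allowed_keys:
        return True
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    token_bytes = token.strip().encode("utf-8")
    # compare_digest avoids leaking key contents through timing differences.
    return any(
        hmac.compare_digest(token_bytes, key.encode("utf-8"))
        for key in allowed_keys
    )


# Read the secret the same way a Space would expose it: as an environment variable.
ALLOWED_KEYS = parse_api_keys(os.environ.get("API_KEYS"))
```

In a FastAPI app, a check like `is_authorized` would typically run in a dependency that reads the `Authorization` header and raises a 401 error when it returns `False`.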

Hardware

Recommended:

  • L4 (24 GB) - INT4 ✅
  • A10G-small (24 GB) - INT4 ✅
  • L40S (48 GB) - INT4 ✅ or INT8 ✅
  • A100 (80 GB) - any mode including BF16
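The recommendations above follow from a back-of-the-envelope estimate of weight memory per quantization mode. The sketch below assumes exactly 35 billion parameters and counts weights only; KV cache, activations, and quantization metadata add overhead on top, so treat the numbers as lower bounds. With a MoE model, all experts are normally kept in memory even though only about 3B parameters are active per token.

```python
# Approximate storage cost per parameter for each QUANT_MODE value.
BYTES_PER_PARAM = {"int4": 0.5, "int8": 1.0, "bf16": 2.0}

# Assumption: the "35B" in the model name means 35e9 total parameters.
TOTAL_PARAMS = 35e9


def weight_gb(mode, params=TOTAL_PARAMS):
    """Estimate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params * BYTES_PER_PARAM[mode] / 1e9


for mode in ("int4", "int8", "bf16"):
    print(f"{mode}: ~{weight_gb(mode):.1f} GB")
# int4: ~17.5 GB
# int8: ~35.0 GB
# bf16: ~70.0 GB
```

These estimates line up with the list: INT4 (~18 GB) leaves some headroom on 24 GB cards, INT8 needs a 48 GB card, and BF16 needs an 80 GB card.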

Example

from openai import OpenAI
client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://final-bench-darwin-35b-a3b-opus-api.hf.space/v1",
)
resp = client.chat.completions.create(
    model="Darwin-35B-A3B-Opus",
    messages=[{"role":"user","content":"Explain GPQA"}],
    max_tokens=300,
)
print(resp.choices[0].message.content)
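For SSE streaming, the usual route with the `openai` client is to pass `stream=True` to the same call and iterate over the returned chunks. To show what happens on the wire, the sketch below parses a stream by hand. It assumes the server follows the OpenAI chunk format (`data: {...}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`); the function name `iter_sse_content` and the sample lines are illustrative, not captured server output.

```python
import json


def iter_sse_content(lines):
    """Yield text fragments from OpenAI-style SSE lines.

    Assumption: each event is a single 'data: <json>' line and the stream
    ends with 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hand-written sample in the assumed format, not real server output.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_sse_content(sample)))  # Hello, world
```

The same generator can consume lines from any HTTP client that exposes the response body line by line, once each line is decoded to `str`.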