---
title: Darwin-35B-A3B-Opus API
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: OpenAI-compatible FastAPI for Darwin-35B-A3B-Opus (INT4)
---

# Darwin-35B-A3B-Opus API

Self-hosted OpenAI-compatible FastAPI server for [FINAL-Bench/Darwin-35B-A3B-Opus](https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus).

- **35B MoE / 3B active** — Qwen3.5-MoE based
- **INT4 quantized** (~18 GB) — fits on L4/A10G/L40S
- **OpenAI-compatible** endpoints + SSE streaming
- **Bearer auth** (configurable via `API_KEYS` secret)

## Endpoints

- `GET /` — Landing page with examples
- `GET /health` — Health + load status
- `GET /v1/models` — List models
- `POST /v1/chat/completions` — Chat (OpenAI compat)

## Configuration (HF Space secrets)

| Secret | Required | Description |
|--------|----------|-------------|
| `HF_TOKEN` | optional | HF token for private/gated models |
| `API_KEYS` | optional | Comma-separated bearer keys (empty = public) |
| `QUANT_MODE` | optional | `int4` (default), `int8`, `bf16` |
| `MODEL_ID` | optional | HF model id (default: `FINAL-Bench/Darwin-35B-A3B-Opus`) |

## Hardware

Recommended:

- **L4 (24GB)** — INT4 ✅
- **A10G-small (24GB)** — INT4 ✅
- **L40S (48GB)** — INT4 ✅ or INT8 ✅
- **A100 (80GB)** — any mode including BF16

## Example

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://final-bench-darwin-35b-a3b-opus-api.hf.space/v1",
)

resp = client.chat.completions.create(
    model="Darwin-35B-A3B-Opus",
    messages=[{"role": "user", "content": "Explain GPQA"}],
    max_tokens=300,
)
print(resp.choices[0].message.content)
```
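
The feature list mentions SSE streaming, but the example above only shows a non-streaming call. The sketch below shows one way a client could reassemble streamed text. It assumes the server emits the usual OpenAI-style chunk format (`data: {...}` lines carrying `choices[].delta.content`, ended by `data: [DONE]`); this README does not spell out the wire format, so treat that as an assumption. The helper name `iter_sse_deltas` and the `sample` lines are hypothetical and exist only for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hypothetical sample of what a streamed response might look like.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # prints "Hello world"
```

Against a live deployment, the same helper could be fed the line iterator of a streaming HTTP response (for example `requests.post(..., stream=True).iter_lines(decode_unicode=True)` with `"stream": True` in the JSON body), provided the server follows the assumed format.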