| --- |
| title: Darwin-35B-A3B-Opus API |
| emoji: 🧬 |
| colorFrom: indigo |
| colorTo: purple |
| sdk: docker |
| app_port: 7860 |
| pinned: false |
| license: apache-2.0 |
| short_description: OpenAI-compatible FastAPI for Darwin-35B-A3B-Opus (INT4) |
| --- |
| |
| # Darwin-35B-A3B-Opus API |
|
|
| Self-hosted OpenAI-compatible FastAPI server for [FINAL-Bench/Darwin-35B-A3B-Opus](https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus). |
|
|
| - **35B MoE / 3B active**: Qwen3.5-MoE based |
| - **INT4 quantized** (~18 GB): fits on L4/A10G/L40S |
| - **OpenAI-compatible** endpoints + SSE streaming |
| - **Bearer auth** (configurable via `API_KEYS` secret) |
|
|
| ## Endpoints |
|
|
| - `GET /`: Landing page with examples |
| - `GET /health`: Health and model load status |
| - `GET /v1/models`: List models |
| - `POST /v1/chat/completions`: Chat completions (OpenAI-compatible) |
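
The endpoints above can also be reached without the OpenAI client. The following is a minimal sketch using only the Python standard library. The helper names `build_request` and `call` are hypothetical, the base URL is taken from the example in this README, and the shape of the `/health` response is not documented here, so the sketch simply returns whatever JSON the server sends.

```python
import json
import urllib.request

# Base URL taken from the client example in this README (without the /v1 suffix).
BASE_URL = "https://final-bench-darwin-35b-a3b-opus-api.hf.space"


def build_request(path, api_key=None, payload=None):
    """Build a GET request (no payload) or a JSON POST request (with payload)."""
    url = BASE_URL.rstrip("/") + path
    headers = {"Accept": "application/json"}
    if api_key:
        # Bearer auth is only needed when the Space has API_KEYS configured.
        headers["Authorization"] = f"Bearer {api_key}"
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers)


def call(path, api_key=None, payload=None, timeout=30):
    """Send the request and decode the JSON body (performs network I/O)."""
    req = build_request(path, api_key, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For instance, `call("/health")` would probe the load status and `call("/v1/models", api_key="YOUR_KEY")` would list models, assuming the Space is running and reachable.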
|
|
| ## Configuration (HF Space secrets) |
|
|
| | Secret | Required | Description | |
| |--------|----------|-------------| |
| | `HF_TOKEN` | optional | HF token for private/gated models | |
| | `API_KEYS` | optional | Comma-separated bearer keys (empty = public) | |
| | `QUANT_MODE` | optional | `int4` (default), `int8`, `bf16` | |
| | `MODEL_ID` | optional | HF model id (default: `FINAL-Bench/Darwin-35B-A3B-Opus`) | |
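
The server code that reads these secrets is not shown in this README. As an illustration only, the sketch below shows one way the four variables could be parsed and validated. The `Settings` class and `load_settings` function are hypothetical names, and the whitespace trimming and lowercase normalization are assumptions rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional

VALID_QUANT_MODES = ("int4", "int8", "bf16")          # modes listed in the table
DEFAULT_MODEL_ID = "FINAL-Bench/Darwin-35B-A3B-Opus"  # default from the table


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]
    api_keys: FrozenSet[str]
    quant_mode: str
    model_id: str

    @property
    def is_public(self) -> bool:
        # An empty API_KEYS value means no bearer key is required.
        return not self.api_keys


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the Space secrets from an environment-like mapping."""
    raw_keys = env.get("API_KEYS", "")
    keys = frozenset(k.strip() for k in raw_keys.split(",") if k.strip())
    quant = (env.get("QUANT_MODE") or "int4").strip().lower()
    if quant not in VALID_QUANT_MODES:
        raise ValueError(f"QUANT_MODE must be one of {VALID_QUANT_MODES}, got {quant!r}")
    return Settings(
        hf_token=env.get("HF_TOKEN") or None,
        api_keys=keys,
        quant_mode=quant,
        model_id=env.get("MODEL_ID") or DEFAULT_MODEL_ID,
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check locally before setting the secrets on the Space.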
|
|
| ## Hardware |
|
|
| Recommended: |
| - **L4 (24GB)**: INT4 ✅ |
| - **A10G-small (24GB)**: INT4 ✅ |
| - **L40S (48GB)**: INT4 ✅ or INT8 ✅ |
| - **A100 (80GB)**: any mode including BF16 |
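
These recommendations are consistent with a rough estimate of weight memory per quantization mode. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes exactly 35 billion parameters, counts weights only, and ignores the KV cache, activations, and quantization metadata, so real usage will be higher. The function names are hypothetical.

```python
# Approximate storage cost per parameter for each QUANT_MODE value.
BYTES_PER_PARAM = {"int4": 0.5, "int8": 1.0, "bf16": 2.0}


def weight_gb(params_billion: float, mode: str) -> float:
    """Weights-only footprint in GB (1e9 params * bytes/param / 1e9 bytes per GB)."""
    return params_billion * BYTES_PER_PARAM[mode]


def fits(params_billion: float, mode: str, vram_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM (no runtime overhead counted)."""
    return weight_gb(params_billion, mode) <= vram_gb


for mode in ("int4", "int8", "bf16"):
    print(mode, weight_gb(35, mode), "GB")
```

Under these assumptions INT4 needs about 17.5 GB (matching the ~18 GB figure above), INT8 about 35 GB, and BF16 about 70 GB, which is why 24 GB cards are listed for INT4 only.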
|
|
| ## Example |
|
|
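| ```python |
| from openai import OpenAI |
| |
| # Point the official OpenAI client at the Space's /v1 base URL. |
| client = OpenAI( |
|     api_key="YOUR_KEY",  # one of the bearer keys configured in API_KEYS |
|     base_url="https://final-bench-darwin-35b-a3b-opus-api.hf.space/v1", |
| ) |
| |
| # Send a single-turn chat request and print the reply text. |
| resp = client.chat.completions.create( |
|     model="Darwin-35B-A3B-Opus", |
|     messages=[{"role": "user", "content": "Explain GPQA"}], |
|     max_tokens=300, |
| ) |
| print(resp.choices[0].message.content) |
| ``` |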
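
The server also advertises SSE streaming. With the OpenAI client, passing `stream=True` to `client.chat.completions.create` yields chunks whose text is in `chunk.choices[0].delta.content`. For clients that read the raw HTTP stream, the sketch below shows how the text could be reassembled. It assumes the server emits standard OpenAI-style `data: {...}` lines terminated by `data: [DONE]`, and `iter_sse_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Sample lines in the assumed wire format, for local illustration only.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))
```

The generator works on any iterable of decoded lines, so it can be fed from a streaming HTTP response read line by line.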
|
|