---
title: Darwin-35B-A3B-Opus API
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: OpenAI-compatible FastAPI for Darwin-35B-A3B-Opus (INT4)
---

# Darwin-35B-A3B-Opus API

Self-hosted OpenAI-compatible FastAPI server for [FINAL-Bench/Darwin-35B-A3B-Opus](https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus).

- **35B MoE / 3B active** — Qwen3.5-MoE based
- **INT4 quantized** (~18 GB) — fits on L4/A10G/L40S
- **OpenAI-compatible** endpoints + SSE streaming
- **Bearer auth** (configurable via `API_KEYS` secret)

## Endpoints

- `GET /` — Landing page with examples
- `GET /health` — Health + load status
- `GET /v1/models` — List models
- `POST /v1/chat/completions` — Chat (OpenAI compat)
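
The two `GET` endpoints can be exercised without the `openai` package. The following is a minimal sketch using only Python's standard library. `build_request` and `get_json` are hypothetical helper names, the base URL is the one from the Example section, and the shape of the `/health` response is not documented here, so the code simply decodes whatever JSON comes back.

```python
import json
import urllib.request

# Base URL taken from the Example section of this README.
BASE_URL = "https://final-bench-darwin-35b-a3b-opus-api.hf.space"


def build_request(path, api_key=None, base_url=BASE_URL):
    """Build a GET request, adding a Bearer header only when a key is given."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


def get_json(path, api_key=None, timeout=30):
    """Send the request and decode the body (assumes the server returns JSON)."""
    with urllib.request.urlopen(build_request(path, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs network calls, so it is left commented out):
# print(get_json("/health"))
# print(get_json("/v1/models", api_key="YOUR_KEY"))
```

Separating request construction from sending makes the header logic easy to check offline.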

## Configuration (HF Space secrets)

| Secret | Required | Description |
|--------|----------|-------------|
| `HF_TOKEN` | optional | Hugging Face access token for private or gated models |
| `API_KEYS` | optional | Comma-separated bearer keys; leave empty to make the API public (no authentication) |
| `QUANT_MODE` | optional | `int4` (default), `int8`, `bf16` |
| `MODEL_ID` | optional | HF model id (default: `FINAL-Bench/Darwin-35B-A3B-Opus`) |
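
The `API_KEYS` semantics described in the table can be illustrated with a small sketch. This is not the server's actual code: `parse_api_keys` and `is_authorized` are hypothetical helpers, and trimming whitespace around each key is an assumption.

```python
import hmac
import os


def parse_api_keys(raw):
    """Split a comma-separated secret into a set of non-empty keys."""
    return {key.strip() for key in (raw or "").split(",") if key.strip()}


def is_authorized(auth_header, keys):
    """Allow everything when no keys are configured; otherwise require a matching Bearer token."""
    if not keys:
        return True  # empty API_KEYS means the API is public
    scheme, _, token = (auth_header or "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    token_bytes = token.strip().encode("utf-8")
    # compare_digest avoids leaking key contents through timing differences.
    return any(hmac.compare_digest(token_bytes, key.encode("utf-8")) for key in keys)


# Read the secret the same way a Space would expose it: as an environment variable.
keys = parse_api_keys(os.environ.get("API_KEYS", ""))
```

For example, `API_KEYS="alpha, beta"` would yield two accepted keys, and a request carrying `Authorization: Bearer alpha` would pass the check.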

## Hardware

Recommended:
- **L4 (24GB)** — INT4 ✅
- **A10G-small (24GB)** — INT4 ✅
- **L40S (48GB)** — INT4 ✅ or INT8 ✅
- **A100 (80GB)** — any mode including BF16
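
These pairings follow from a rough weight-memory estimate: total parameters times bytes per parameter. The sketch below uses the 35B total parameter count, since all experts of a MoE model are typically kept resident even though only about 3B parameters are active per token. It ignores the KV cache, activations, and quantization metadata, so real usage will be higher. `weight_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter in each QUANT_MODE.
BYTES_PER_PARAM = {"int4": 0.5, "int8": 1.0, "bf16": 2.0}
TOTAL_PARAMS = 35e9  # 35B total parameters (weights only)


def weight_gb(mode, params=TOTAL_PARAMS):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[mode] / 1e9


for mode in ("int4", "int8", "bf16"):
    print(f"{mode}: ~{weight_gb(mode):.1f} GB")
# int4: ~17.5 GB
# int8: ~35.0 GB
# bf16: ~70.0 GB
```

The INT4 figure matches the ~18 GB quoted above and leaves some headroom on a 24GB card. INT8 at roughly 35 GB exceeds 24GB but fits in 48GB, and BF16 at roughly 70 GB needs the 80GB A100.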

## Example

```python
from openai import OpenAI

# Replace YOUR_KEY with one of the bearer keys configured in API_KEYS.
client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://final-bench-darwin-35b-a3b-opus-api.hf.space/v1",
)

# Send a single-turn chat request and print the assistant's reply.
resp = client.chat.completions.create(
    model="Darwin-35B-A3B-Opus",
    messages=[{"role": "user", "content": "Explain GPQA"}],
    max_tokens=300,
)
print(resp.choices[0].message.content)
```
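
The feature list mentions SSE streaming. Assuming the server follows the usual OpenAI streaming format (`data: {json}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`), which this README does not spell out, a client could reassemble the reply roughly as follows. `iter_deltas` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json


def iter_deltas(lines):
    """Yield text fragments from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative stream: a role-only chunk, two content chunks, then the sentinel.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Hello"
```

With the `openai` client, passing `stream=True` to `client.chat.completions.create` returns an iterator of chunks that expose the same field as `chunk.choices[0].delta.content`, so manual parsing is only needed when calling the endpoint over raw HTTP.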