---
title: Darwin-35B-A3B-Opus API
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: OpenAI-compatible FastAPI for Darwin-35B-A3B-Opus (INT4)
---
# Darwin-35B-A3B-Opus API
Self-hosted, OpenAI-compatible FastAPI server for [FINAL-Bench/Darwin-35B-A3B-Opus](https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus).
- **35B MoE / 3B active**: Qwen3.5-MoE based
- **INT4 quantized** (~18 GB): fits on L4/A10G/L40S
- **OpenAI-compatible** endpoints + SSE streaming
- **Bearer auth** (configurable via `API_KEYS` secret)
## Endpoints
- `GET /`: Landing page with examples
- `GET /health`: Health + load status
- `GET /v1/models`: List models
- `POST /v1/chat/completions`: Chat (OpenAI compat)
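
As a minimal sketch, the endpoints above could be reached with only the Python standard library. The helper name `build_request` and the placeholder key are illustrative, the base URL is the one used in the example in this README, and the JSON body assumes the server accepts the standard OpenAI chat schema. Whether `/health` requires a key is not stated here, so the sketch sends none.

```python
import json
import urllib.request

# Base URL taken from the example in this README; change it for your own deployment.
BASE_URL = "https://final-bench-darwin-35b-a3b-opus-api.hf.space"


def build_request(path, api_key=None, payload=None):
    """Build a GET (no payload) or POST (JSON payload) request for one endpoint."""
    headers = {}
    if api_key:
        # Bearer auth; only needed when the server has API_KEYS configured.
        headers["Authorization"] = f"Bearer {api_key}"
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


health = build_request("/health")
models = build_request("/v1/models", api_key="YOUR_KEY")
chat = build_request(
    "/v1/chat/completions",
    api_key="YOUR_KEY",
    payload={
        "model": "Darwin-35B-A3B-Opus",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)

# Nothing is sent until a request is opened, for example:
#   with urllib.request.urlopen(health, timeout=30) as r:
#       print(json.load(r))
```

Building the requests separately from sending them makes it easy to inspect the method, URL, and headers before any traffic reaches the Space.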
## Configuration (HF Space secrets)
| Secret | Required | Description |
|--------|----------|-------------|
| `HF_TOKEN` | optional | HF token for private/gated models |
| `API_KEYS` | optional | Comma-separated bearer keys (empty = public) |
| `QUANT_MODE` | optional | `int4` (default), `int8`, `bf16` |
| `MODEL_ID` | optional | HF model id (default: `FINAL-Bench/Darwin-35B-A3B-Opus`) |
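
The sketch below shows one way a server could read these secrets and apply the bearer check. It is an assumption about the behavior described in the table, not the Space's actual code; `load_settings` and `is_authorized` are hypothetical names.

```python
import hmac
import os

DEFAULT_MODEL_ID = "FINAL-Bench/Darwin-35B-A3B-Opus"
VALID_QUANT_MODES = ("int4", "int8", "bf16")


def load_settings(env=os.environ):
    """Read the four secrets from the table, applying the documented defaults."""
    quant = env.get("QUANT_MODE", "int4").strip().lower() or "int4"
    if quant not in VALID_QUANT_MODES:
        raise ValueError(f"QUANT_MODE must be one of {VALID_QUANT_MODES}, got {quant!r}")
    # Comma-separated keys; surrounding whitespace and empty entries are dropped.
    keys = [k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()]
    return {
        "hf_token": env.get("HF_TOKEN") or None,
        "api_keys": keys,
        "quant_mode": quant,
        "model_id": env.get("MODEL_ID") or DEFAULT_MODEL_ID,
    }


def is_authorized(authorization_header, api_keys):
    """Return True if the header carries a configured key, or if no keys are set."""
    if not api_keys:
        return True  # empty API_KEYS means the API is public
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    token_bytes = token.strip().encode("utf-8")
    # compare_digest avoids leaking key contents through timing differences.
    return any(hmac.compare_digest(token_bytes, key.encode("utf-8")) for key in api_keys)
```

For example, with `API_KEYS="a, b"` the header `Authorization: Bearer b` passes and `Bearer c` is rejected, while an unset `API_KEYS` lets every request through.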
## Hardware
Recommended:
- **L4 (24GB)**: INT4 ✅
- **A10G-small (24GB)**: INT4 ✅
- **L40S (48GB)**: INT4 ✅ or INT8 ✅
- **A100 (80GB)**: any mode including BF16
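
The recommendations above follow from a back-of-the-envelope estimate of weight memory (parameters times bits per weight, divided by 8). The sketch below reproduces that arithmetic; the 4 GB headroom for KV cache and runtime overhead is an assumed figure, and quantization metadata such as scales is ignored.

```python
# Total parameters in billions. With MoE, all experts are typically kept
# resident in memory, not only the ~3B that are active per token.
PARAMS_BILLION = 35
BITS_PER_WEIGHT = {"int4": 4, "int8": 8, "bf16": 16}
HEADROOM_GB = 4.0  # assumption: allowance for KV cache, activations, CUDA context


def weights_gb(mode):
    """Approximate weight size in GB: parameters x bits / 8."""
    return PARAMS_BILLION * BITS_PER_WEIGHT[mode] / 8


def modes_that_fit(gpu_gb):
    """List the quantization modes whose weights plus headroom fit on the GPU."""
    return [m for m in BITS_PER_WEIGHT if weights_gb(m) + HEADROOM_GB <= gpu_gb]


for gpu, mem in [("L4", 24), ("A10G", 24), ("L40S", 48), ("A100", 80)]:
    print(f"{gpu:5s} {mem} GB -> {', '.join(modes_that_fit(mem))}")
```

Under these assumptions INT4 needs about 17.5 GB for weights (the ~18 GB quoted above), INT8 about 35 GB, and BF16 about 70 GB, which matches the table of recommended GPUs.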
## Example
```python
from openai import OpenAI

# Point the official OpenAI client at the Space instead of api.openai.com.
client = OpenAI(
    api_key="YOUR_KEY",  # a key listed in the API_KEYS secret
    base_url="https://final-bench-darwin-35b-a3b-opus-api.hf.space/v1",
)

resp = client.chat.completions.create(
    model="Darwin-35B-A3B-Opus",
    messages=[{"role": "user", "content": "Explain GPQA"}],
    max_tokens=300,
)
print(resp.choices[0].message.content)
```
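
For SSE streaming, a client reads `data:` lines and concatenates the text deltas. The sketch below assumes the server emits the usual OpenAI chunk format (`choices[0].delta.content`, terminated by `data: [DONE]`); `iter_content` is a hypothetical helper and the sample stream is hand-written, not captured from the Space.

```python
import json


def iter_content(sse_lines):
    """Yield text fragments from OpenAI-style `data: ...` SSE lines."""
    for raw in sse_lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Hand-written sample stream (assumed format, not real server output).
sample = [
    b'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    b"",
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    b"",
    b'data: {"choices":[{"delta":{"content":"lo"}}]}',
    b"",
    b"data: [DONE]",
]
print("".join(iter_content(sample)))  # prints "Hello"
```

With the `openai` client from the example above, the same effect comes from passing `stream=True` to `client.chat.completions.create(...)` and reading `chunk.choices[0].delta.content` from each yielded chunk.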