Darwin-35B-A3B-Opus-API / README.md

SeaWolf-AI

v1: FastAPI OpenAI-compatible Darwin-35B-A3B-Opus API (INT4, Docker)

2893ee9 verified 8 days ago

	---
	title: Darwin-35B-A3B-Opus API
	emoji: 🧬
	colorFrom: indigo
	colorTo: purple
	sdk: docker
	app_port: 7860
	pinned: false
	license: apache-2.0
	short_description: OpenAI-compatible FastAPI for Darwin-35B-A3B-Opus (INT4)
	---

	# Darwin-35B-A3B-Opus API

	Self-hosted OpenAI-compatible FastAPI server for [FINAL-Bench/Darwin-35B-A3B-Opus](https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus).

	- 35B MoE / 3B active — Qwen3.5-MoE based
	- INT4 quantized (~18 GB) — fits on L4/A10G/L40S
	- OpenAI-compatible endpoints + SSE streaming
	- Bearer auth (configurable via `API_KEYS` secret)

	## Endpoints

	- `GET /` — Landing page with examples
	- `GET /health` — Health + load status
	- `GET /v1/models` — List models
	- `POST /v1/chat/completions` — Chat (OpenAI compat)

	## Configuration (HF Space secrets)

	| Secret | Required | Description |
	|--------|----------|-------------|
	| `HF_TOKEN` | optional | HF token for private/gated models |
	| `API_KEYS` | optional | Comma-separated bearer keys (empty = public) |
	| `QUANT_MODE` | optional | `int4` (default), `int8`, `bf16` |
	| `MODEL_ID` | optional | HF model id (default: `FINAL-Bench/Darwin-35B-A3B-Opus`) |
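As an illustration of the semantics in the table, here is one way a server could read these secrets. This is a sketch under assumptions: the Space's actual parsing code is not shown in this README, `load_settings` and `is_authorized` are hypothetical names, and rejecting unknown `QUANT_MODE` values is a design choice made here rather than documented behavior.

```python
import os

DEFAULT_MODEL_ID = "FINAL-Bench/Darwin-35B-A3B-Opus"
VALID_QUANT_MODES = ("int4", "int8", "bf16")


def load_settings(env=None):
    """Read the four secrets from the environment, applying the table's defaults."""
    env = os.environ if env is None else env
    # Comma-separated keys; blanks and stray whitespace are ignored.
    api_keys = {k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()}
    quant_mode = (env.get("QUANT_MODE") or "int4").strip().lower()
    if quant_mode not in VALID_QUANT_MODES:
        # Assumption: fail fast on typos instead of silently falling back.
        raise ValueError(f"QUANT_MODE must be one of {VALID_QUANT_MODES}")
    return {
        "hf_token": env.get("HF_TOKEN") or None,
        "api_keys": api_keys,
        "public": not api_keys,  # empty API_KEYS means no auth required
        "quant_mode": quant_mode,
        "model_id": env.get("MODEL_ID") or DEFAULT_MODEL_ID,
    }


def is_authorized(authorization_header, api_keys):
    """Check an `Authorization: Bearer <key>` header against the configured keys."""
    if not api_keys:
        return True  # public deployment
    scheme, _, token = (authorization_header or "").partition(" ")
    return scheme.lower() == "bearer" and token.strip() in api_keys
```

With no secrets set, this yields a public INT4 deployment of the default model, which matches the defaults listed in the table.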

	## Hardware

	Recommended:
	- L4 (24GB) — INT4 ✅
	- A10G-small (24GB) — INT4 ✅
	- L40S (48GB) — INT4 ✅ or INT8 ✅
	- A100 (80GB) — any mode including BF16
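The pairing of GPUs and quantization modes follows from a rough weight-size estimate. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes 35e9 total parameters, counts weights only (no KV cache, activations, or quantization metadata), and treats 1 GB as 1e9 bytes, so real usage will be higher. Note that in an MoE model all experts normally stay resident in memory, so the 35B total matters here rather than the 3B active parameters.

```python
# Bytes needed to store one parameter in each mode.
BYTES_PER_PARAM = {"int4": 0.5, "int8": 1.0, "bf16": 2.0}
# GPU memory sizes as listed in the Hardware section.
GPU_MEMORY_GB = {"L4": 24, "A10G-small": 24, "L40S": 48, "A100": 80}
TOTAL_PARAMS = 35e9  # assumption: nominal 35B total parameters


def weight_gb(mode, params=TOTAL_PARAMS):
    """Approximate size of the weights alone, in GB (1e9 bytes)."""
    return params * BYTES_PER_PARAM[mode] / 1e9


def fits(mode, gpu):
    """True if the weights alone are smaller than the GPU's memory."""
    return weight_gb(mode) < GPU_MEMORY_GB[gpu]


for mode in BYTES_PER_PARAM:
    ok = [gpu for gpu in GPU_MEMORY_GB if fits(mode, gpu)]
    print(f"{mode}: ~{weight_gb(mode):.1f} GB of weights, fits on {', '.join(ok)}")
```

The estimate gives about 17.5 GB for INT4, 35 GB for INT8, and 70 GB for BF16, which is consistent with the recommendations above: INT4 on 24 GB cards, INT8 from 48 GB, and BF16 only on 80 GB.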

	## Example

	```python
	from openai import OpenAI

	# Point the official OpenAI client at this Space instead of api.openai.com.
	client = OpenAI(
	    api_key="YOUR_KEY",  # a key listed in the API_KEYS secret
	    base_url="https://final-bench-darwin-35b-a3b-opus-api.hf.space/v1",
	)
	resp = client.chat.completions.create(
	    model="Darwin-35B-A3B-Opus",
	    messages=[{"role": "user", "content": "Explain GPQA"}],
	    max_tokens=300,
	)
	print(resp.choices[0].message.content)
	```
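For the SSE streaming mode, a client has to reassemble the reply from incremental chunks. The sketch below parses such a stream with the standard library only. It assumes the server follows the OpenAI streaming wire format (`data: {json}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`); this README states only that the endpoints are OpenAI-compatible, and `iter_sse_deltas` is a hypothetical helper name.

```python
import json


def iter_sse_deltas(lines):
    """Yield text fragments from an OpenAI-style SSE stream.

    `lines` is any iterable of str or bytes lines, e.g. an HTTP response body.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events, used here in place of a live response.
sample = [
    b'data: {"choices":[{"delta":{"role":"assistant"}}]}\n',
    b"\n",
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}\n',
    b'data: {"choices":[{"delta":{"content":"lo"}}]}\n',
    b"data: [DONE]\n",
]
print("".join(iter_sse_deltas(sample)))  # prints "Hello"
```

With the `openai` client the parsing is handled for you: pass `stream=True` to `client.chat.completions.create(...)` and read `chunk.choices[0].delta.content` from each chunk as it arrives.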