metadata

pinned: false
title: sphinx-ai-assistant proxy
emoji: 🔁
colorFrom: blue
colorTo: green
license: bsd-3-clause
short_description: ai
hf_oauth: true
hf_oauth_scopes:
  - inference-api
sdk: docker
app_port: 7860

sphinx-ai-assistant proxy

Thin OpenAI-compatible reverse proxy for the sphinx-ai-assistant Sphinx extension. Runs as a free CPU Docker Space on HuggingFace.

What it does

Accepts POST /v1/chat/completions from the browser (no auth header).
Resolves the upstream backend via env vars (see Configuration below).
Injects Authorization: Bearer $HF_TOKEN when required.
Forwards the request body verbatim to the model backend.
Returns the response (JSON or SSE stream) with CORS headers.
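The header injection and verbatim forwarding steps can be pictured with a small pure function. This is a minimal sketch, not the code in app.py: the name `build_upstream_request` is hypothetical, and it assumes the token is "required" exactly when BACKEND_URL is empty, which this README implies but does not state outright.

```python
from typing import Mapping


def build_upstream_request(
    body: bytes, env: Mapping[str, str]
) -> tuple[dict[str, str], bytes]:
    """Return (headers, body) for the upstream call.

    Hypothetical helper for illustration; the real logic lives in app.py.
    """
    headers = {"Content-Type": "application/json"}
    token = env.get("HF_TOKEN", "")
    # Assumption: the bearer token is only needed for the HF Serverless API,
    # i.e. when no custom BACKEND_URL is configured.
    if not env.get("BACKEND_URL", "") and token:
        headers["Authorization"] = f"Bearer {token}"
    # The request body is passed through untouched (forwarded verbatim).
    return headers, body
```

Keeping the token on the server side is what lets the browser call the proxy without ever seeing HF_TOKEN.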

Files committed to this Space

File	Purpose
`app.py`	FastAPI proxy application
`_shared_logic.py`	Pure helpers imported by `app.py` (no pip deps)
`Dockerfile`	Container build — copies both Python files
`requirements.txt`	FastAPI, uvicorn, httpx
`README.md`	This file (HF Space metadata + documentation)

Important: _shared_logic.py must be committed alongside app.py. The Dockerfile copies it into the container; if it is absent, the container crashes on startup with ModuleNotFoundError: No module named '_shared_logic'.

Endpoints

Method	Path	Purpose	Link
`GET`	`/`	Status page / HF health-check	https://scikit-plots-ai.hf.space
`GET`	`/health`	Liveness probe	https://scikit-plots-ai.hf.space/health
`POST`	`/`	Backward-compat alias	https://scikit-plots-ai.hf.space
`POST`	`/v1/chat/completions`	Primary proxy endpoint	https://scikit-plots-ai.hf.space/v1/chat/completions

Configuration

Set in Space → Settings → Repository secrets:

Variable	Required?	Default	Description
`HF_TOKEN`	Yes (unless `BACKEND_URL` set)	—	HuggingFace API token. Requires "Make calls to Inference API" permission.
`BACKEND_URL`	No	`""`	Custom backend URL. When set, bypasses the HF Serverless API. Use for a local Docker Model Runner (Path A) or a ZeroGPU Space (Path C).
`DEFAULT_MODEL`	No	`openai/gpt-oss-20b`	Fallback model when the request body omits `model`. The model must be available through an HF Inference Provider when `BACKEND_URL` is unset.
`HF_BASE`	No	`https://api-inference.huggingface.co/models`	HF API base URL. Only used when `BACKEND_URL` is empty.
`PROXY_TIMEOUT`	No	`120`	Upstream read timeout (seconds). Set `600` for ZeroGPU cold start. Non-integer values fall back to `120`.
`ALLOWED_ORIGINS`	No	`*`	Comma-separated CORS origins. Production: `https://scikit-plots.github.io`
`MAX_BODY_BYTES`	No	`10485760`	Maximum request body size (bytes). Non-integer values fall back to `10485760`.
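The fallback behaviour described for PROXY_TIMEOUT and MAX_BODY_BYTES, and the comma-separated format of ALLOWED_ORIGINS, could be parsed along these lines. This is a sketch under assumptions: `int_setting` and `allowed_origins` are hypothetical names, not functions taken from app.py or _shared_logic.py.

```python
import os
from typing import Mapping, Optional


def int_setting(name: str, default: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Read an integer setting, falling back to `default` on bad input."""
    source = os.environ if env is None else env
    try:
        return int(source.get(name, ""))
    except (TypeError, ValueError):
        # Missing, empty, or non-integer values fall back to the default.
        return default


def allowed_origins(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Split ALLOWED_ORIGINS on commas, dropping blanks and whitespace."""
    source = os.environ if env is None else env
    raw = source.get("ALLOWED_ORIGINS", "*")
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

For example, `int_setting("PROXY_TIMEOUT", 120, {"PROXY_TIMEOUT": "ten"})` yields the default rather than raising, so a typo in a Space secret does not stop the container from starting.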

Quick path reference

Path B — HF Serverless API (original provider models):
  HF_TOKEN      = hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
  DEFAULT_MODEL = openai/gpt-oss-20b
  PROXY_TIMEOUT = 120

Path C — ZeroGPU Space (any model weights, free GPU):
  BACKEND_URL   = https://scikit-plots-ai-model.hf.space/v1/chat/completions
  PROXY_TIMEOUT = 600   ← cold start for 20B model takes 2–10 minutes

Path A — Local Docker Model Runner (set in your shell, not Space secrets):
  export BACKEND_URL=http://localhost:12434/engines/llama.cpp/v1/chat/completions
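All three paths reduce to one decision: is BACKEND_URL set? The sketch below illustrates that decision under stated assumptions. `resolve_backend` is a hypothetical name, and because this README elides the path that follows the model ID on the HF Serverless API, the function stops at the `HF_BASE/model` prefix instead of guessing the rest.

```python
from typing import Mapping, Optional

DEFAULT_HF_BASE = "https://api-inference.huggingface.co/models"
DEFAULT_MODEL = "openai/gpt-oss-20b"


def resolve_backend(env: Mapping[str, str], requested_model: Optional[str] = None) -> str:
    """Pick the upstream URL (Paths A and C) or URL prefix (Path B)."""
    backend_url = env.get("BACKEND_URL", "").strip()
    if backend_url:
        # Paths A and C: a custom backend bypasses the HF Serverless API.
        return backend_url
    # Path B: fall back to DEFAULT_MODEL when the request names no model.
    model = requested_model or env.get("DEFAULT_MODEL", DEFAULT_MODEL)
    hf_base = env.get("HF_BASE", DEFAULT_HF_BASE).rstrip("/")
    # Only the prefix is built here; the remaining path is not specified
    # in this README, so it is deliberately left out.
    return f"{hf_base}/{model}"
```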

Verify the deployment

# Liveness check
curl https://scikit-plots-ai.hf.space/health
# Expected: {"status":"ok","version":"3.1.0"}

# Status page (shows active backend)
curl https://scikit-plots-ai.hf.space/
# Expected: {"status":"ok","service":"sphinx-ai-assistant proxy v3.1.0",...}

# Test a real completion (Path B — replace with your Space URL)
curl https://scikit-plots-ai.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-oss-20b","messages":[{"role":"user","content":"hi"}]}'
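The same completion test can be scripted with the Python standard library. This is a sketch equivalent to the curl call above; `chat_request` and `send` are hypothetical helper names, and the 120-second timeout simply mirrors the PROXY_TIMEOUT default.

```python
import json
import urllib.request

SPACE_URL = "https://scikit-plots-ai.hf.space"  # replace with your Space URL


def chat_request(
    base_url: str, prompt: str, model: str = "openai/gpt-oss-20b"
) -> urllib.request.Request:
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send(chat_request(SPACE_URL, "hi"))` performs the real request. As with the curl version, no Authorization header is sent, because the proxy adds it.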

Why mirror repos fail with HF Serverless API

scikit-plots/gpt-oss-20b and scikit-plots/Qwen2.5-Coder-32B-Instruct are mirror repositories — they contain weights copied from the original repos but are not registered with any HF Inference Provider.

POST to api-inference.huggingface.co/models/scikit-plots/gpt-oss-20b/... → 404 / 503
POST to api-inference.huggingface.co/models/openai/gpt-oss-20b/... → ✓ works

Use DEFAULT_MODEL = openai/gpt-oss-20b for Path B. Use Path C (ZeroGPU) to run scikit-plots/* weights directly.
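A start-up check can catch this misconfiguration before the first request fails. The sketch below is a heuristic, not part of the proxy: `config_problem` is a hypothetical name, and it assumes every repo under the scikit-plots/ namespace is a mirror with no Inference Provider, as described above.

```python
from typing import Mapping, Optional

# Assumption: repos in this namespace are mirrors without an Inference Provider.
MIRROR_NAMESPACE = "scikit-plots/"


def config_problem(env: Mapping[str, str]) -> Optional[str]:
    """Return a warning if a mirror repo is paired with Path B, else None."""
    if env.get("BACKEND_URL", "").strip():
        # Paths A and C serve the weights themselves, so mirrors are fine.
        return None
    model = env.get("DEFAULT_MODEL", "openai/gpt-oss-20b")
    if model.startswith(MIRROR_NAMESPACE):
        return (
            f"{model} is a mirror repo; set DEFAULT_MODEL to the original "
            "repo or set BACKEND_URL (Path C)."
        )
    return None
```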

References

FREE_PROXY_SOLUTIONS.md — Full path decision tree
HuggingFace Inference API
ZeroGPU documentation