pinned: false
title: sphinx-ai-assistant proxy
emoji: 🔍
colorFrom: blue
colorTo: green
license: bsd-3-clause
short_description: ai
hf_oauth: true
hf_oauth_scopes:
  - inference-api
sdk: docker
app_port: 7860

sphinx-ai-assistant proxy

Thin OpenAI-compatible reverse proxy for the sphinx-ai-assistant Sphinx extension. Runs as a free CPU Docker Space on HuggingFace.

What it does

  1. Accepts POST /v1/chat/completions from the browser (no auth header).
  2. Resolves the upstream backend via env vars (see Configuration below).
  3. Injects Authorization: Bearer $HF_TOKEN when required.
  4. Forwards the request body verbatim to the model backend.
  5. Returns the response (JSON or SSE stream) with CORS headers.
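
Steps 2 and 3 can be sketched as a small pure function. This is a minimal illustration and not the code in app.py: the name `resolve_upstream` is hypothetical, and it assumes that no token is injected when `BACKEND_URL` is set. It returns only the base URL for the HF path, because how the model name is appended is not specified here.

```python
from typing import Dict, Tuple

# Default taken from the HF_BASE entry in the Configuration section.
DEFAULT_HF_BASE = "https://api-inference.huggingface.co/models"


def resolve_upstream(env: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (base_url, extra_headers) for the upstream request.

    Hypothetical helper: the real app.py may structure this differently.
    """
    backend_url = env.get("BACKEND_URL", "").strip()
    if backend_url:
        # Custom backend (Path A or Path C). Assumption: no HF token needed.
        return backend_url, {}

    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is required when BACKEND_URL is not set")

    # HF_BASE is only consulted when BACKEND_URL is empty.
    base = env.get("HF_BASE", "").strip() or DEFAULT_HF_BASE
    return base, {"Authorization": f"Bearer {token}"}
```

Passing the environment as a plain dict (for example `dict(os.environ)`) keeps the function easy to test without touching the real process environment.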

Files committed to this Space

| File | Purpose |
|------|---------|
| app.py | FastAPI proxy application |
| _shared_logic.py | Pure helpers imported by app.py (no pip deps) |
| Dockerfile | Container build (copies both Python files) |
| requirements.txt | FastAPI, uvicorn, httpx |
| README.md | This file (HF Space metadata + documentation) |

Important: _shared_logic.py must be committed alongside app.py. The Dockerfile copies it into the container; if it is absent, the container crashes on startup with ModuleNotFoundError: No module named '_shared_logic'.

Endpoints

| Method | Path | Purpose | Link |
|--------|------|---------|------|
| GET | / | Status page / HF health-check | https://scikit-plots-ai.hf.space |
| GET | /health | Liveness probe | https://scikit-plots-ai.hf.space/health |
| POST | / | Backward-compat alias | https://scikit-plots-ai.hf.space |
| POST | /v1/chat/completions | Primary proxy endpoint | https://scikit-plots-ai.hf.space/v1/chat/completions |

Configuration

Set in Space → Settings → Repository secrets:

| Variable | Required? | Default | Description |
|----------|-----------|---------|-------------|
| HF_TOKEN | Yes (unless BACKEND_URL set) | (none) | HuggingFace API token. Requires "Make calls to Inference API" permission. |
| BACKEND_URL | No | "" | Custom backend URL. Bypasses HF Serverless API. Use for DMR (Path A) or ZeroGPU Space (Path C). |
| DEFAULT_MODEL | No | openai/gpt-oss-20b | Fallback model when the request body omits model. Must have an Inference Provider if BACKEND_URL is unset. |
| HF_BASE | No | https://api-inference.huggingface.co/models | HF API base URL. Only used when BACKEND_URL is empty. |
| PROXY_TIMEOUT | No | 120 | Upstream read timeout (seconds). Set to 600 for ZeroGPU cold start. Non-integer values fall back to 120. |
| ALLOWED_ORIGINS | No | * | Comma-separated CORS origins. Production: https://scikit-plots.github.io |
| MAX_BODY_BYTES | No | 10485760 | Maximum request body size (bytes). Non-integer values fall back to 10485760. |
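
The fallback behaviour described for PROXY_TIMEOUT and MAX_BODY_BYTES, and the comma-separated format of ALLOWED_ORIGINS, could be implemented roughly as follows. This is a sketch only: `int_env` and `parse_origins` are hypothetical names, and the helpers in _shared_logic.py may differ.

```python
from typing import List, Mapping


def int_env(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer setting, falling back to `default` on bad input."""
    raw = env.get(name, "")
    try:
        return int(raw.strip())
    except ValueError:
        # Covers empty strings and non-integer values such as "ten" or "1.5".
        return default


def parse_origins(env: Mapping[str, str]) -> List[str]:
    """Split ALLOWED_ORIGINS on commas; a missing or empty value means '*'."""
    raw = env.get("ALLOWED_ORIGINS", "*")
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    return origins or ["*"]
```

In the application these would typically be called once at startup, for example `int_env(os.environ, "PROXY_TIMEOUT", 120)`.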

Quick path reference

Path B (HF Serverless API, original provider models):
  HF_TOKEN      = hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
  DEFAULT_MODEL = openai/gpt-oss-20b
  PROXY_TIMEOUT = 120

Path C (ZeroGPU Space, any model weights, free GPU):
  BACKEND_URL   = https://scikit-plots-ai-model.hf.space/v1/chat/completions
  PROXY_TIMEOUT = 600   ← cold start for 20B model takes 2–10 minutes

Path A (local Docker Model Runner; set in your shell, not Space secrets):
  export BACKEND_URL=http://localhost:12434/engines/llama.cpp/v1/chat/completions

Verify the deployment

# Liveness check
curl https://scikit-plots-ai.hf.space/health
# Expected: {"status":"ok","version":"3.1.0"}

# Status page (shows active backend)
curl https://scikit-plots-ai.hf.space/
# Expected: {"status":"ok","service":"sphinx-ai-assistant proxy v3.1.0",...}

# Test a real completion (Path B; replace with your Space URL)
curl https://scikit-plots-ai.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-oss-20b","messages":[{"role":"user","content":"hi"}]}'
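
The same completion check can be scripted from Python using only the standard library. The sketch below mirrors the curl call above; the function names are illustrative, `SPACE_URL` should be replaced with your own Space URL, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request

# Assumption: replace with your own Space URL.
SPACE_URL = "https://scikit-plots-ai.hf.space"


def build_completion_request(
    prompt: str, model: str = "openai/gpt-oss-20b"
) -> urllib.request.Request:
    """Build the POST request; no Authorization header is sent by the client."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{SPACE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion(prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response (non-streaming)."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_completion("hi")` performs the network request; for a ZeroGPU backend (Path C) a larger `timeout` is likely needed to cover a cold start.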

Why mirror repos fail with HF Serverless API

scikit-plots/gpt-oss-20b and scikit-plots/Qwen2.5-Coder-32B-Instruct are mirror repositories: they contain weights copied from the original repos but are not registered with any HF Inference Provider.

  • POST to api-inference.huggingface.co/models/scikit-plots/gpt-oss-20b/... → 404 / 503
  • POST to api-inference.huggingface.co/models/openai/gpt-oss-20b/... → ✓ works

Use DEFAULT_MODEL = openai/gpt-oss-20b for Path B. Use Path C (ZeroGPU) to run scikit-plots/* weights directly.

References