pinned: false
title: scikit-plots AI Model Endpoint
emoji: 🤖
colorFrom: blue
colorTo: green
license: bsd-3-clause
short_description: ai-model
hf_oauth: true
hf_oauth_scopes:
- inference-api
sdk: gradio
sdk_version: 6.15.2
python_version: '3.13'
app_file: app.py
scikit-plots AI Model Endpoint
A ZeroGPU Space that serves scikit-plots model weights through an
OpenAI-compatible REST endpoint. The proxy Space (scikit-plots/ai)
calls it using the URL in its BACKEND_URL environment variable.
Primary endpoint
POST /v1/chat/completions
Request body (OpenAI Chat Completions format):
{
  "model": "scikit-plots/Qwen2.5-Coder-7B-Instruct",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 512
}
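As a rough illustration of how a client might call this endpoint, the sketch below builds the same request with Python's standard library. The helper names `build_chat_request` and `send` are hypothetical, the URL is the BACKEND_URL value listed under "Wire the proxy Space", and the response is assumed to follow the OpenAI Chat Completions shape, which this README implies but does not spell out.

```python
import json
import urllib.request

# Assumption: the public URL listed under "Wire the proxy Space".
ENDPOINT = "https://scikit-plots-ai-model.hf.space/v1/chat/completions"


def build_chat_request(prompt,
                       model="scikit-plots/Qwen2.5-Coder-7B-Instruct",
                       max_tokens=512,
                       url=ENDPOINT):
    """Build (but do not send) a POST request in Chat Completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and return the decoded JSON body.

    The generous default timeout leaves room for a ZeroGPU cold start.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `reply = send(build_chat_request("Hello"))`. If the response does follow the OpenAI shape, the generated text would sit at `reply["choices"][0]["message"]["content"]`.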
Other endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | / | Status page |
| GET | /health | Liveness probe (model loaded check) |
| GET | /ui | Gradio test UI |
| POST | /v1/chat/completions | Primary inference endpoint |
⚠️ Cold start
The first request in a new ZeroGPU session triggers:
- GPU allocation from the shared pool (1–5 minutes)
- Model loading from storage to GPU VRAM (3–8 minutes for a model that needs about 14 GB of RAM, just under the 16 GiB hard limit)
Total cold start: 2–10 minutes, depending on model size.
Set PROXY_TIMEOUT=600 in the proxy Space (scikit-plots/ai) secrets.
Subsequent requests in the same active session complete in seconds.
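A caller that wants to avoid timing out during a cold start could poll until the Space reports ready. The sketch below is one way to do that, not part of this Space: `wait_until_ready` and `health_probe` are hypothetical helpers, and treating an HTTP 200 from /health as "model loaded" is an assumption, since the response format of that endpoint is not documented here.

```python
import time
import urllib.request

# Assumption: /health lives on the same host as the inference endpoint.
HEALTH_URL = "https://scikit-plots-ai-model.hf.space/health"


def health_probe(url=HEALTH_URL):
    """Return True if /health answers with HTTP 200 (assumed to mean ready)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.status == 200


def wait_until_ready(probe, timeout=600.0, interval=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or the timeout elapses.

    Network errors raised by probe() count as "not ready yet".
    Returns True once ready, False if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

The default `timeout=600.0` mirrors the PROXY_TIMEOUT value recommended above. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.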
Configuration
Set in Space → Settings → Repository secrets:
| Variable | Required? | Default | Description |
|---|---|---|---|
| MODEL_ID | No | scikit-plots/Qwen2.5-Coder-7B-Instruct | Model weights to load. Supports scikit-plots/* mirrors. |
| ALLOWED_ORIGINS | No | https://scikit-plots-ai.hf.space | Comma-separated CORS origins. Add http://localhost:7860 for local dev. Do not set to * in production. |
| MAX_BODY_BYTES | No | 10485760 | Maximum request body size (bytes). Non-integer values fall back to default. |
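The parsing rules in the table can be sketched as follows. This is an illustration of the documented behavior only, not the code in app.py: the function name `load_settings` and the returned dictionary keys are hypothetical.

```python
import os

DEFAULT_MODEL_ID = "scikit-plots/Qwen2.5-Coder-7B-Instruct"
DEFAULT_ORIGINS = "https://scikit-plots-ai.hf.space"
DEFAULT_MAX_BODY_BYTES = 10485760  # 10 MiB


def load_settings(env=None):
    """Read the three documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env

    # Comma-separated list; surrounding whitespace and empty items are dropped.
    raw_origins = env.get("ALLOWED_ORIGINS", DEFAULT_ORIGINS)
    origins = [item.strip() for item in raw_origins.split(",") if item.strip()]

    # Non-integer values fall back to the default, as the table states.
    try:
        max_body = int(env.get("MAX_BODY_BYTES", DEFAULT_MAX_BODY_BYTES))
    except (TypeError, ValueError):
        max_body = DEFAULT_MAX_BODY_BYTES

    return {
        "model_id": env.get("MODEL_ID", DEFAULT_MODEL_ID),
        "allowed_origins": origins,
        "max_body_bytes": max_body,
    }
```

Passing a plain dictionary, for example `load_settings({"MAX_BODY_BYTES": "ten"})`, makes the fallback easy to check without touching the real environment.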
Why this works with scikit-plots/* mirrors
This ZeroGPU Space downloads raw model weights directly from Hugging Face
storage (Git LFS), bypassing the Hugging Face Serverless Inference API.
Mirror repos (scikit-plots/*) have weights but no registered Inference
Provider, so they work here but fail with the HF Serverless API.
Wire the proxy Space
Add these to scikit-plots/ai → Settings → Repository secrets:
BACKEND_URL = https://scikit-plots-ai-model.hf.space/v1/chat/completions
PROXY_TIMEOUT = 600