| pinned: false | |
| title: scikit-plots AI Model Endpoint | |
| emoji: π€ | |
| colorFrom: blue | |
| colorTo: green | |
| license: bsd-3-clause | |
| short_description: ai-model | |
| hf_oauth: true | |
| hf_oauth_scopes: | |
| - inference-api | |
| sdk: gradio | |
| sdk_version: 6.15.2 | |
| python_version: '3.13' | |
| app_file: app.py | |
| # scikit-plots AI Model Endpoint | |
| A ZeroGPU Space that serves scikit-plots model weights through an | |
| OpenAI-compatible REST endpoint. The proxy Space (`scikit-plots/ai`) | |
| calls it via its `BACKEND_URL` environment variable. | |
| ## Primary endpoint | |
| ``` | |
| POST /v1/chat/completions | |
| ``` | |
| Request body (OpenAI Chat Completions format): | |
| ```json | |
| { | |
| "model": "scikit-plots/Qwen2.5-Coder-7B-Instruct", | |
| "messages": [{"role": "user", "content": "Hello"}], | |
| "max_tokens": 512 | |
| } | |
| ``` | |
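A minimal sketch of sending this request body from Python with only the standard library. The helper names `build_chat_request` and `post_chat_completion` are hypothetical, the URL is the `BACKEND_URL` value given in this README, and the 600-second timeout mirrors the recommended `PROXY_TIMEOUT`. The response is assumed to be a JSON document; its exact fields are not described here, so the sketch returns it unparsed beyond JSON decoding.

```python
import json
import urllib.request

# Assumed URL: the BACKEND_URL value listed in this README.
ENDPOINT = "https://scikit-plots-ai-model.hf.space/v1/chat/completions"
DEFAULT_MODEL = "scikit-plots/Qwen2.5-Coder-7B-Instruct"


def build_chat_request(prompt, model=DEFAULT_MODEL, max_tokens=512):
    """Return a request body in the OpenAI Chat Completions format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def post_chat_completion(payload, url=ENDPOINT, timeout=600):
    """POST the payload as JSON and return the decoded JSON response.

    The long default timeout leaves room for a ZeroGPU cold start.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `post_chat_completion(build_chat_request("Hello"))` would send the example body shown above; the first call in a new session may block for several minutes.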
| ## Other endpoints | |
| | Method | Path | Purpose | | |
| |---|---|---| | |
| | `GET` | `/` | Status page | | |
| | `GET` | `/health` | Liveness probe (model loaded check) | | |
| | `GET` | `/ui` | Gradio test UI | | |
| | `POST` | `/v1/chat/completions` | Primary inference endpoint | | |
| ## ⚠️ Cold start | |
| The **first request** in a new ZeroGPU session triggers: | |
| 1. GPU allocation from the shared pool (1–5 minutes) | |
| 2. Model loading from storage into GPU VRAM (3–8 minutes for a model using about 14 GB of RAM, under the 16 GiB hard limit) | |
| **Total cold start: 2–10 minutes per model.** | |
| Set `PROXY_TIMEOUT=600` in the proxy Space (`scikit-plots/ai`) secrets. | |
| Subsequent requests in the same active session complete in seconds. | |
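One way a client could tolerate the cold start is to poll until the Space reports ready, bounded by the same 600-second budget. The sketch below is an assumption, not part of this Space: `probe_health` and `wait_until_ready` are hypothetical helpers, the host is taken from the `BACKEND_URL` value in this README, and it assumes `GET /health` answers HTTP 200 only once the model is loaded. Polling `/health` may not itself trigger GPU allocation, so a real client might still need to send an inference request first.

```python
import time
import urllib.request

# Assumed host, taken from the BACKEND_URL value in this README.
HEALTH_URL = "https://scikit-plots-ai-model.hf.space/health"


def probe_health(url=HEALTH_URL, timeout=10):
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(probe=probe_health, deadline_s=600, interval_s=15,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns True or the deadline would be exceeded."""
    start = clock()
    while True:
        if probe():
            return True
        # Stop if another sleep would push past the deadline.
        if clock() - start + interval_s > deadline_s:
            return False
        sleep(interval_s)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.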
| ## Configuration | |
| Set in **Space → Settings → Repository secrets**: | |
| | Variable | Required? | Default | Description | | |
| |---|---|---|---| | |
| | `MODEL_ID` | No | `scikit-plots/Qwen2.5-Coder-7B-Instruct` | Model weights to load. Supports `scikit-plots/*` mirrors. | | |
| | `ALLOWED_ORIGINS` | No | `https://scikit-plots-ai.hf.space` | Comma-separated CORS origins. Add `http://localhost:7860` for local dev. Do not set to `*` in production. | | |
| | `MAX_BODY_BYTES` | No | `10485760` | Maximum request body size (bytes). Non-integer values fall back to default. | | |
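The defaults and fallback rule in the table can be illustrated with a small parser. This is a sketch under assumptions: `load_settings` is a hypothetical helper, and the actual handling in `app.py` may differ (for example in how it treats whitespace or negative sizes).

```python
import os

DEFAULT_MODEL_ID = "scikit-plots/Qwen2.5-Coder-7B-Instruct"
DEFAULT_ALLOWED_ORIGINS = "https://scikit-plots-ai.hf.space"
DEFAULT_MAX_BODY_BYTES = 10485760  # 10 MiB


def load_settings(env=None):
    """Read the three variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Split the comma-separated origin list and drop empty entries.
    raw_origins = env.get("ALLOWED_ORIGINS", DEFAULT_ALLOWED_ORIGINS)
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]

    # Non-integer values fall back to the default, as the table states.
    try:
        max_body = int(env.get("MAX_BODY_BYTES", DEFAULT_MAX_BODY_BYTES))
    except (TypeError, ValueError):
        max_body = DEFAULT_MAX_BODY_BYTES

    return {
        "model_id": env.get("MODEL_ID", DEFAULT_MODEL_ID),
        "allowed_origins": origins,
        "max_body_bytes": max_body,
    }
```

Passing a plain dict instead of `os.environ` makes the fallback behaviour easy to check, for instance `load_settings({"MAX_BODY_BYTES": "ten"})` keeps the default size.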
| ## Why this works with scikit-plots/* mirrors | |
| This ZeroGPU Space downloads raw model weights directly from HuggingFace | |
| storage (Git LFS), bypassing the HuggingFace Serverless Inference API. | |
| Mirror repos (`scikit-plots/*`) have weights but no registered Inference | |
| Provider, so they work here but fail with the HF Serverless API. | |
| ## Wire the proxy Space | |
| Add these to `scikit-plots/ai` → Settings → Repository secrets: | |
| ``` | |
| BACKEND_URL = https://scikit-plots-ai-model.hf.space/v1/chat/completions | |
| PROXY_TIMEOUT = 600 | |
| ``` | |
| ## References | |
| - [FREE_PROXY_SOLUTIONS.md Path C](./FREE_PROXY_SOLUTIONS.md#path-c--new-zerogpu-space-completely-free-gpu) | |
| - [ZeroGPU documentation](https://huggingface.co/docs/hub/spaces-zerogpu) | |