---
pinned: false
title: scikit-plots AI Model Endpoint
emoji: 🚀
colorFrom: blue
colorTo: green
license: bsd-3-clause
short_description: ai-model
hf_oauth: true
hf_oauth_scopes:
- inference-api
sdk: gradio
sdk_version: 6.15.2
python_version: '3.13'
app_file: app.py
---
# scikit-plots AI Model Endpoint
ZeroGPU Space that serves scikit-plots model weights via an
OpenAI-compatible REST endpoint. Called by the proxy Space
(`scikit-plots/ai`) via its `BACKEND_URL` environment variable.
## Primary endpoint
```
POST /v1/chat/completions
```
Request body (OpenAI Chat Completions format):
```json
{
"model": "scikit-plots/Qwen2.5-Coder-7B-Instruct",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 512
}
```
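
As a minimal client-side sketch of the request above, the following builds and sends the same JSON body using only the Python standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the URL is assumed from the `BACKEND_URL` example in this README. Any authentication the Space may require is not shown.

```python
import json
import urllib.request

# Assumed endpoint URL, copied from the BACKEND_URL example in this README.
ENDPOINT = "https://scikit-plots-ai-model.hf.space/v1/chat/completions"


def build_chat_request(prompt, model="scikit-plots/Qwen2.5-Coder-7B-Instruct",
                       max_tokens=512, endpoint=ENDPOINT):
    """Build (but do not send) a POST request in OpenAI Chat Completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=600):
    """Send the request and return the decoded JSON response body."""
    # The generous default timeout mirrors PROXY_TIMEOUT=600 for cold starts.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send_chat_request(build_chat_request("Hello"))` should return a dictionary. If the response follows the OpenAI Chat Completions format, the generated text is found under `choices[0].message.content`.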
## Other endpoints
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/` | Status page |
| `GET` | `/health` | Liveness probe (model loaded check) |
| `GET` | `/ui` | Gradio test UI |
| `POST` | `/v1/chat/completions` | Primary inference endpoint |
## ⚠️ Cold start
The **first request** in a new ZeroGPU session triggers:
1. GPU allocation from the shared pool (1–5 minutes)
2. Model loading from storage into GPU VRAM (3–8 minutes for a model that needs about 14 GB, within the 16 GiB hard limit)
**Total cold start: 2–10 minutes for a model.**
Set `PROXY_TIMEOUT=600` in the proxy Space (`scikit-plots/ai`) secrets.
Subsequent requests in the same active session complete in seconds.
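
A caller that wants to wait out a cold start could poll within a fixed time budget, as in the sketch below. It assumes that `GET /health` answers with HTTP 200 once the model is loaded. The response format of `/health` is not documented here, and neither is whether it reports ready before a first inference request, so treat `probe_health` as a placeholder. All names are hypothetical, and the host is assumed from the `BACKEND_URL` example.

```python
import time
import urllib.request

# Assumed URL: the host from the BACKEND_URL example plus the /health path.
HEALTH_URL = "https://scikit-plots-ai-model.hf.space/health"


def probe_health(url=HEALTH_URL, timeout=10):
    """Return True if GET /health answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(probe=probe_health, budget_s=600, interval_s=15,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns True or the time budget would be exceeded."""
    deadline = clock() + budget_s
    while True:
        if probe():
            return True
        if clock() + interval_s > deadline:
            return False  # another wait would overrun the budget
        sleep(interval_s)
```

The default budget of 600 seconds matches the `PROXY_TIMEOUT` value recommended above. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.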
## Configuration
Set in **Space → Settings → Repository secrets**:
| Variable | Required? | Default | Description |
|---|---|---|---|
| `MODEL_ID` | No | `scikit-plots/Qwen2.5-Coder-7B-Instruct` | Model weights to load. Supports `scikit-plots/*` mirrors. |
| `ALLOWED_ORIGINS` | No | `https://scikit-plots-ai.hf.space` | Comma-separated CORS origins. Add `http://localhost:7860` for local dev. Do not set to `*` in production. |
| `MAX_BODY_BYTES` | No | `10485760` | Maximum request body size (bytes). Non-integer values fall back to default. |
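
The defaults and fallback rules in the table could be read from the environment roughly as follows. This is an illustrative sketch, not the actual code in `app.py`: the function `load_settings` and its return shape are hypothetical, and only the variable names, defaults, and the fallback for a non-integer `MAX_BODY_BYTES` come from the table.

```python
import os

DEFAULT_MODEL_ID = "scikit-plots/Qwen2.5-Coder-7B-Instruct"
DEFAULT_ALLOWED_ORIGINS = "https://scikit-plots-ai.hf.space"
DEFAULT_MAX_BODY_BYTES = 10485760  # 10 MiB


def load_settings(env=os.environ):
    """Read the three documented variables, applying the documented defaults."""
    model_id = env.get("MODEL_ID") or DEFAULT_MODEL_ID

    # ALLOWED_ORIGINS is comma-separated; strip whitespace and drop empty items.
    raw_origins = env.get("ALLOWED_ORIGINS") or DEFAULT_ALLOWED_ORIGINS
    origins = [item.strip() for item in raw_origins.split(",") if item.strip()]

    # Non-integer values fall back to the default, as the table states.
    try:
        max_body_bytes = int(env.get("MAX_BODY_BYTES", DEFAULT_MAX_BODY_BYTES))
    except (TypeError, ValueError):
        max_body_bytes = DEFAULT_MAX_BODY_BYTES

    return {
        "model_id": model_id,
        "allowed_origins": origins,
        "max_body_bytes": max_body_bytes,
    }
```

For local development, passing `{"ALLOWED_ORIGINS": "https://scikit-plots-ai.hf.space,http://localhost:7860"}` would yield both origins as separate list entries.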
## Why this works with scikit-plots/* mirrors
This ZeroGPU Space downloads raw model weights directly from HuggingFace
storage (Git LFS), bypassing the HuggingFace Serverless Inference API.
Mirror repos (`scikit-plots/*`) have weights but no registered Inference
Provider, so they work here but fail with the HF Serverless API.
## Wire the proxy Space
Add these to `scikit-plots/ai` → Settings → Repository secrets:
```
BACKEND_URL = https://scikit-plots-ai-model.hf.space/v1/chat/completions
PROXY_TIMEOUT = 600
```
## References
- [FREE_PROXY_SOLUTIONS.md Path C](./FREE_PROXY_SOLUTIONS.md#path-c--new-zerogpu-space-completely-free-gpu)
- [ZeroGPU documentation](https://huggingface.co/docs/hub/spaces-zerogpu)