---
pinned: false
title: scikit-plots AI Model Endpoint
emoji: 🤖
colorFrom: blue
colorTo: green
license: bsd-3-clause
short_description: ai-model
hf_oauth: true
hf_oauth_scopes:
- inference-api
sdk: gradio
sdk_version: 6.15.2
python_version: '3.13'
app_file: app.py
---
# scikit-plots AI Model Endpoint
ZeroGPU Space that serves scikit-plots model weights via an
OpenAI-compatible REST endpoint. Called by the proxy Space
(`scikit-plots/ai`) via its `BACKEND_URL` environment variable.
## Primary endpoint
```
POST /v1/chat/completions
```
Request body (OpenAI Chat Completions format):
```json
{
"model": "scikit-plots/Qwen2.5-Coder-7B-Instruct",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 512
}
```
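
As an illustration, the request above can be sent from Python using only the standard library. This is a minimal sketch, not the Space's own client: the helper names `build_request` and `chat` are hypothetical, the URL is the `BACKEND_URL` value given later in this README, and the response is assumed to follow the OpenAI schema (`choices[0].message.content`). Authentication, if the Space requires any, is not shown.

```python
import json
import urllib.request

# URL taken from the "Wire the proxy Space" section of this README.
ENDPOINT = "https://scikit-plots-ai-model.hf.space/v1/chat/completions"


def build_request(prompt, model="scikit-plots/Qwen2.5-Coder-7B-Instruct", max_tokens=512):
    """Build (but do not send) a POST request in OpenAI Chat Completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600):
    """Send the request and return the assistant's reply text.

    Assumption: the response follows the OpenAI schema,
    i.e. the text lives at choices[0].message.content.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("Hello")` performs the network request; the generous default timeout is there because the first call in a session may hit a cold start.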
## Other endpoints
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/` | Status page |
| `GET` | `/health` | Liveness probe (model loaded check) |
| `GET` | `/ui` | Gradio test UI |
| `POST` | `/v1/chat/completions` | Primary inference endpoint |
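
A client can poll `/health` before sending real traffic. The sketch below makes several assumptions that this README does not confirm: that HTTP 200 from `/health` means the model is loaded, that the base URL is the host part of `BACKEND_URL`, and that the probe turns healthy without a prior inference request. The function names are hypothetical.

```python
import time
import urllib.request

# Assumption: host part of BACKEND_URL from this README.
BASE_URL = "https://scikit-plots-ai-model.hf.space"


def probe_health(base_url=BASE_URL, timeout=10):
    """Return True if GET /health answers with HTTP 200 (assumed to mean 'ready')."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(probe=probe_health, max_wait=600, interval=15, sleep=time.sleep):
    """Poll `probe` until it returns True or `max_wait` seconds have been slept."""
    waited = 0
    while True:
        if probe():
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

`probe` and `sleep` are injectable so the loop can be exercised without network access or real delays.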
## ⚠️ Cold start
The **first request** in a new ZeroGPU session triggers:
1. GPU allocation from the shared pool (1–5 minutes)
2. Model loading from storage to GPU VRAM (for a model that needs 14 GB of RAM, within the 16 GiB hard limit: 3–8 minutes)

**Total cold start: 2–10 minutes per model.**
Set `PROXY_TIMEOUT=600` in the proxy Space (`scikit-plots/ai`) secrets.
Subsequent requests in the same active session complete in seconds.
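
A direct caller may want the same tolerance for cold starts as the proxy. The following is a hedged sketch of a retry wrapper, assuming `PROXY_TIMEOUT` is measured in seconds (600 s matches the 10-minute upper bound above). The function name and the `send` callable are hypothetical, and a real client may need to catch its HTTP library's own timeout exception rather than the built-in `TimeoutError`.

```python
import time


def call_with_cold_start_retry(send, attempts=2, timeout=600, backoff=5, sleep=time.sleep):
    """Call send(timeout) and retry if it times out, e.g. during a cold start.

    `send` is any callable that takes a timeout in seconds and returns a response.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return send(timeout)
        except TimeoutError as exc:
            last_error = exc
            # Pause before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(backoff)
    raise last_error
```

Once the session is warm, the first attempt should succeed and the wrapper adds no overhead.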
## Configuration
Set in **Space → Settings → Repository secrets**:
| Variable | Required? | Default | Description |
|---|---|---|---|
| `MODEL_ID` | No | `scikit-plots/Qwen2.5-Coder-7B-Instruct` | Model weights to load. Supports `scikit-plots/*` mirrors. |
| `ALLOWED_ORIGINS` | No | `https://scikit-plots-ai.hf.space` | Comma-separated CORS origins. Add `http://localhost:7860` for local dev. Do not set to `*` in production. |
| `MAX_BODY_BYTES` | No | `10485760` | Maximum request body size (bytes). Non-integer values fall back to default. |
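
The defaults and fallback behaviour in the table could be implemented roughly as follows. This is an illustrative sketch, not the code in `app.py`: `load_config` and the `DEFAULT_*` constants are hypothetical names, and how the real app treats edge cases such as negative sizes is not specified here.

```python
import os

DEFAULT_MODEL_ID = "scikit-plots/Qwen2.5-Coder-7B-Instruct"
DEFAULT_ORIGINS = "https://scikit-plots-ai.hf.space"
DEFAULT_MAX_BODY_BYTES = 10485760  # 10 MiB


def load_config(env=os.environ):
    """Read the three variables from `env`, applying the documented defaults."""
    try:
        max_body = int(env.get("MAX_BODY_BYTES", DEFAULT_MAX_BODY_BYTES))
    except (TypeError, ValueError):
        # Non-integer value: fall back to the default, as the table describes.
        max_body = DEFAULT_MAX_BODY_BYTES
    raw_origins = env.get("ALLOWED_ORIGINS", DEFAULT_ORIGINS)
    # Split the comma-separated list and drop empty entries.
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]
    return {
        "model_id": env.get("MODEL_ID", DEFAULT_MODEL_ID),
        "allowed_origins": origins,
        "max_body_bytes": max_body,
    }
```

For local development, `ALLOWED_ORIGINS="https://scikit-plots-ai.hf.space,http://localhost:7860"` would yield a two-element origin list.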
## Why this works with scikit-plots/* mirrors
This ZeroGPU Space downloads raw model weights directly from HuggingFace
storage (Git LFS), bypassing the HuggingFace Serverless Inference API.
Mirror repos (`scikit-plots/*`) have weights but no registered Inference
Provider, so they work here but fail with the HF Serverless API.
## Wire the proxy Space
Add these to `scikit-plots/ai` → Settings → Repository secrets:
```
BACKEND_URL = https://scikit-plots-ai-model.hf.space/v1/chat/completions
PROXY_TIMEOUT = 600
```
## References
- [FREE_PROXY_SOLUTIONS.md Path C](./FREE_PROXY_SOLUTIONS.md#path-c--new-zerogpu-space-completely-free-gpu)
- [ZeroGPU documentation](https://huggingface.co/docs/hub/spaces-zerogpu)