celik-muhammed

Update README.md

9d29f7d verified about 17 hours ago

metadata

pinned: false
title: scikit-plots AI Model Endpoint
emoji: 🤖
colorFrom: blue
colorTo: green
license: bsd-3-clause
short_description: ai-model
hf_oauth: true
hf_oauth_scopes:
  - inference-api
sdk: gradio
sdk_version: 6.15.2
python_version: '3.13'
app_file: app.py

scikit-plots AI Model Endpoint

A ZeroGPU Space that runs inference on scikit-plots model weights and exposes it through an OpenAI-compatible REST endpoint. The proxy Space (scikit-plots/ai) calls it through its BACKEND_URL environment variable.

Primary endpoint

POST /v1/chat/completions

Request body (OpenAI Chat Completions format):

{
  "model": "scikit-plots/Qwen2.5-Coder-7B-Instruct",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 512
}
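
A minimal client sketch for the request body above, using only the Python standard library. The endpoint URL is the BACKEND_URL value given in "Wire the proxy Space". The helper names `build_chat_request` and `send_chat_request` and the 600-second timeout are illustrative assumptions, as is the OpenAI-style response shape read in the usage comment. Since the Space declares `hf_oauth`, a real call may also need an authorization header, which this README does not describe and the sketch does not send.

```python
import json
import urllib.request

# URL taken from the "Wire the proxy Space" section of this README.
ENDPOINT = "https://scikit-plots-ai-model.hf.space/v1/chat/completions"


def build_chat_request(prompt, model="scikit-plots/Qwen2.5-Coder-7B-Instruct",
                       max_tokens=512, endpoint=ENDPOINT):
    """Build (but do not send) a POST request in OpenAI Chat Completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=600):
    """Send the request and return the decoded JSON response.

    The long timeout is an assumption chosen to survive a cold start.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# reply = send_chat_request(build_chat_request("Hello"))
# print(reply["choices"][0]["message"]["content"])  # assumes OpenAI-style body
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any GPU time is spent.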

Other endpoints

| Method | Path | Purpose |
|--------|------|---------|
| `GET` | `/` | Status page |
| `GET` | `/health` | Liveness probe (model loaded check) |
| `GET` | `/ui` | Gradio test UI |
| `POST` | `/v1/chat/completions` | Primary inference endpoint |
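
The `/health` probe in the table above can be polled before sending inference traffic. The sketch below assumes that `/health` answers with HTTP 200 once the model is loaded and with something else (or no answer) before that; the README does not specify the status codes or the response body, and the names `http_status` and `wait_until_healthy` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except OSError:
        return None  # connection error or timeout (URLError is an OSError)


def wait_until_healthy(base_url, deadline_s=600, interval_s=5,
                       get_status=http_status, sleep=time.sleep):
    """Poll GET /health until it returns 200 or the deadline passes.

    Returns True once a 200 is seen, False otherwise. Only the sleep time
    is counted against the deadline, so the real wait can be a bit longer.
    """
    health_url = base_url.rstrip("/") + "/health"
    waited = 0
    while True:
        if get_status(health_url) == 200:
            return True
        if waited >= deadline_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

The `get_status` and `sleep` parameters exist so the loop can be exercised without network access or real delays.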

⚠️ Cold start

The first request in a new ZeroGPU session triggers:

1. GPU allocation from the shared pool (1–5 minutes).
2. Model loading from storage into GPU VRAM (3–8 minutes for models that need about 14 GB of RAM, which fits under the 16 GiB hard limit).

Total cold start: 2–10 minutes for a model.

Set PROXY_TIMEOUT=600 in the proxy Space (scikit-plots/ai) secrets. Subsequent requests in the same active session complete in seconds.

Configuration

Set in Space → Settings → Repository secrets:

| Variable | Required? | Default | Description |
|----------|-----------|---------|-------------|
| `MODEL_ID` | No | `scikit-plots/Qwen2.5-Coder-7B-Instruct` | Model weights to load. Supports `scikit-plots/*` mirrors. |
| `ALLOWED_ORIGINS` | No | `https://scikit-plots-ai.hf.space` | Comma-separated CORS origins. Add `http://localhost:7860` for local dev. Do not set to `*` in production. |
| `MAX_BODY_BYTES` | No | `10485760` | Maximum request body size (bytes). Non-integer values fall back to default. |
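
The defaults and fallback rules in the table could be read from the environment roughly as follows. This is a sketch, not the code in `app.py`: the function name `load_settings`, the returned dictionary keys, and the choice to treat an empty string as "unset" are assumptions.

```python
import os

# Defaults copied from the configuration table.
DEFAULT_MODEL_ID = "scikit-plots/Qwen2.5-Coder-7B-Instruct"
DEFAULT_ALLOWED_ORIGINS = "https://scikit-plots-ai.hf.space"
DEFAULT_MAX_BODY_BYTES = 10485760  # 10 * 1024 * 1024 bytes


def load_settings(env=None):
    """Read the three optional variables, applying the documented defaults."""
    env = os.environ if env is None else env

    # Assumption: an empty value is treated the same as an unset one.
    model_id = env.get("MODEL_ID") or DEFAULT_MODEL_ID

    # Comma-separated list; surrounding whitespace and empty items are dropped.
    raw_origins = env.get("ALLOWED_ORIGINS") or DEFAULT_ALLOWED_ORIGINS
    origins = [item.strip() for item in raw_origins.split(",") if item.strip()]

    # Non-integer values fall back to the default, as the table states.
    try:
        max_body_bytes = int(env.get("MAX_BODY_BYTES", DEFAULT_MAX_BODY_BYTES))
    except (TypeError, ValueError):
        max_body_bytes = DEFAULT_MAX_BODY_BYTES

    return {
        "model_id": model_id,
        "allowed_origins": origins,
        "max_body_bytes": max_body_bytes,
    }
```

For local development, setting `ALLOWED_ORIGINS` to `https://scikit-plots-ai.hf.space,http://localhost:7860` yields a two-item origin list.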

Why this works with scikit-plots/* mirrors

This ZeroGPU Space downloads raw model weights directly from Hugging Face storage (Git LFS), bypassing the Hugging Face Serverless Inference API. Mirror repos (scikit-plots/*) contain the weights but have no registered Inference Provider, so they work here but fail with the HF Serverless API.

Wire the proxy Space

Add these to scikit-plots/ai → Settings → Repository secrets:

BACKEND_URL   = https://scikit-plots-ai-model.hf.space/v1/chat/completions
PROXY_TIMEOUT = 600
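
How the proxy consumes these two secrets is not described in this README. As a rough illustration only, a proxy could read them and forward a request body like this; `load_proxy_settings` and `forward_chat` are hypothetical names, the 600 fallback is an assumption, and `PROXY_TIMEOUT` is assumed to be in seconds.

```python
import os
import urllib.request


def load_proxy_settings(env=None):
    """Return (backend_url, timeout_seconds) from the environment."""
    env = os.environ if env is None else env
    backend_url = env.get("BACKEND_URL", "").strip()
    if not backend_url:
        raise RuntimeError("BACKEND_URL is not set")
    # Assumption: the value is a number of seconds; 600 is used if unset.
    timeout_s = float(env.get("PROXY_TIMEOUT", "600"))
    return backend_url, timeout_s


def forward_chat(body, backend_url, timeout_s, opener=urllib.request.urlopen):
    """POST raw JSON bytes to the backend and return (status, response_bytes)."""
    request = urllib.request.Request(
        backend_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout must outlast a cold start, or the first call will fail.
    with opener(request, timeout=timeout_s) as response:
        return response.status, response.read()
```

The `opener` parameter is there only so the forwarding step can be tested without contacting the Space.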