ai-model / README.md
celik-muhammed's picture
Update README.md
9d29f7d verified
metadata
pinned: false
title: scikit-plots AI Model Endpoint
emoji: 🤖
colorFrom: blue
colorTo: green
license: bsd-3-clause
short_description: ai-model
hf_oauth: true
hf_oauth_scopes:
  - inference-api
sdk: gradio
sdk_version: 6.15.2
python_version: '3.13'
app_file: app.py

scikit-plots AI Model Endpoint

ZeroGPU Space that serves scikit-plots model weights via an OpenAI-compatible REST endpoint. Called by the proxy Space (scikit-plots/ai) via its BACKEND_URL environment variable.

Primary endpoint

POST /v1/chat/completions

Request body (OpenAI Chat Completions format):

{
  "model": "scikit-plots/Qwen2.5-Coder-7B-Instruct",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 512
}

Other endpoints

Method Path Purpose
GET / Status page
GET /health Liveness probe (model loaded check)
GET /ui Gradio test UI
POST /v1/chat/completions Primary inference endpoint

⚠️ Cold start

The first request in a new ZeroGPU session triggers:

  1. GPU allocation from the shared pool (1–5 minutes)
  2. Model loading from storage to GPU VRAM (for models 14GB of RAM under 16GiB hard limit: 3–8 minutes)

Total cold start: 2–10 minutes for a model.

Set PROXY_TIMEOUT=600 in the proxy Space (scikit-plots/ai) secrets. Subsequent requests in the same active session complete in seconds.

Configuration

Set in Space → Settings → Repository secrets:

Variable Required? Default Description
MODEL_ID No scikit-plots/Qwen2.5-Coder-7B-Instruct Model weights to load. Supports scikit-plots/* mirrors.
ALLOWED_ORIGINS No https://scikit-plots-ai.hf.space Comma-separated CORS origins. Add http://localhost:7860 for local dev. Do not set to * in production.
MAX_BODY_BYTES No 10485760 Maximum request body size (bytes). Non-integer values fall back to default.

Why this works with scikit-plots/* mirrors

This ZeroGPU Space downloads raw model weights directly from HuggingFace storage (Git LFS), bypassing the HuggingFace Serverless Inference API. Mirror repos (scikit-plots/*) have weights but no registered Inference Provider — so they work here but fail with the HF Serverless API.

Wire the proxy Space

Add these to scikit-plots/ai → Settings → Repository secrets:

BACKEND_URL   = https://scikit-plots-ai-model.hf.space/v1/chat/completions
PROXY_TIMEOUT = 600

References