	---
	pinned: false
	title: scikit-plots AI Model Endpoint
	emoji: 🤖
	colorFrom: blue
	colorTo: green
	license: bsd-3-clause
	short_description: ai-model
	hf_oauth: true
	hf_oauth_scopes:
	- inference-api
	sdk: gradio
	sdk_version: 6.15.2
	python_version: '3.13'
	app_file: app.py
	---

	# scikit-plots AI Model Endpoint

	A ZeroGPU Space that serves scikit-plots model weights through an
	OpenAI-compatible REST endpoint. It is called by the proxy Space
	(`scikit-plots/ai`), which reads this Space's URL from its `BACKEND_URL` setting.

	## Primary endpoint

	```
	POST /v1/chat/completions
	```

	Request body (OpenAI Chat Completions format):

	```json
	{
	  "model": "scikit-plots/Qwen2.5-Coder-7B-Instruct",
	  "messages": [{"role": "user", "content": "Hello"}],
	  "max_tokens": 512
	}
	```
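A minimal client sketch using only the Python standard library. It assumes the Space is served at `https://scikit-plots-ai-model.hf.space` (the host in the `BACKEND_URL` value given under "Wire the proxy Space") and that responses follow the OpenAI shape (`choices[0].message.content`). The helper names `build_chat_request` and `chat`, and the optional bearer token for private Spaces, are illustrative assumptions, not part of this Space's documented API.

```python
import json
import urllib.request
from typing import Optional

# Assumed base URL, taken from the BACKEND_URL example in this README.
BASE_URL = "https://scikit-plots-ai-model.hf.space"


def build_chat_request(prompt: str, max_tokens: int = 512,
                       model: str = "scikit-plots/Qwen2.5-Coder-7B-Instruct",
                       token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request in OpenAI Chat Completions format."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    headers = {"Content-Type": "application/json"}
    if token:  # assumption: a Hugging Face token is only needed for private Spaces
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def chat(prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the first choice's text (assumes the OpenAI response shape)."""
    req = build_chat_request(prompt)
    # A long timeout leaves room for a ZeroGPU cold start.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Calling `chat("Hello")` sends the same body as the JSON example above.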

	## Other endpoints

	| Method | Path | Purpose |
	|---|---|---|
	| `GET` | `/` | Status page |
	| `GET` | `/health` | Liveness probe (model loaded check) |
	| `GET` | `/ui` | Gradio test UI |
	| `POST` | `/v1/chat/completions` | Primary inference endpoint |

	## ⚠️ Cold start

	The first request in a new ZeroGPU session triggers:

	1. GPU allocation from the shared pool (1–5 minutes)
	2. Model loading from storage into GPU VRAM (3–8 minutes for a model of about 14 GB, which stays under the 16 GiB hard limit)

	Total cold start: 2–10 minutes for a model.

	To keep the proxy from giving up during a cold start, set `PROXY_TIMEOUT=600` (seconds)
	in the secrets of the proxy Space (`scikit-plots/ai`).
	Subsequent requests in the same active session complete in seconds.
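One way to ride out a cold start from a client is to poll `/health` with exponential backoff before sending real traffic. This is a hedged sketch: it assumes `/health` answers HTTP 200 only once the model is loaded (the endpoint table describes it as a model-loaded check, but the exact status codes are not documented here), and `http_ok` and `wait_until_healthy` are hypothetical helpers. The 600-second deadline mirrors `PROXY_TIMEOUT=600`.

```python
import time
import urllib.request
from typing import Callable

HEALTH_URL = "https://scikit-plots-ai-model.hf.space/health"  # assumed Space URL


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if GET url answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False


def wait_until_healthy(check: Callable[[], bool], deadline_s: float = 600.0,
                       initial_delay_s: float = 5.0, max_delay_s: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() with capped exponential backoff until it succeeds or the deadline passes."""
    start = clock()
    delay = initial_delay_s
    while True:
        if check():
            return True
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay_s)
```

Usage: `wait_until_healthy(lambda: http_ok(HEALTH_URL))`. The `sleep` and `clock` parameters exist only so the backoff can be tested without real waiting.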

	## Configuration

	Set these in this Space under Settings → Repository secrets:

	| Variable | Required? | Default | Description |
	|---|---|---|---|
	| `MODEL_ID` | No | `scikit-plots/Qwen2.5-Coder-7B-Instruct` | Model weights to load. Supports `scikit-plots/*` mirrors. |
	| `ALLOWED_ORIGINS` | No | `https://scikit-plots-ai.hf.space` | Comma-separated CORS origins. Add `http://localhost:7860` for local dev. Do not set to `*` in production. |
	| `MAX_BODY_BYTES` | No | `10485760` | Maximum request body size (bytes). Non-integer values fall back to default. |
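The table's defaults and fallback rules can be expressed as a small loader. This is a sketch of how `app.py` might read these settings, not its actual code; `load_config` is a hypothetical name, and treating an empty `MODEL_ID` or `ALLOWED_ORIGINS` as unset is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL_ID = "scikit-plots/Qwen2.5-Coder-7B-Instruct"
DEFAULT_ALLOWED_ORIGINS = "https://scikit-plots-ai.hf.space"
DEFAULT_MAX_BODY_BYTES = 10_485_760  # 10 MiB


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read MODEL_ID, ALLOWED_ORIGINS and MAX_BODY_BYTES, applying the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: empty strings are treated the same as unset variables.
    model_id = env.get("MODEL_ID") or DEFAULT_MODEL_ID
    raw_origins = env.get("ALLOWED_ORIGINS") or DEFAULT_ALLOWED_ORIGINS
    # Comma-separated list; strip whitespace and drop empty entries.
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]
    try:
        max_body = int(env.get("MAX_BODY_BYTES", DEFAULT_MAX_BODY_BYTES))
    except ValueError:
        max_body = DEFAULT_MAX_BODY_BYTES  # non-integer values fall back to the default
    return {"model_id": model_id, "allowed_origins": origins, "max_body_bytes": max_body}
```

For local development, `ALLOWED_ORIGINS="https://scikit-plots-ai.hf.space,http://localhost:7860"` yields both origins.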

	## Why this works with scikit-plots/* mirrors

	This ZeroGPU Space downloads raw model weights directly from Hugging Face
	storage (Git LFS), bypassing the Hugging Face Serverless Inference API.
	Mirror repos (`scikit-plots/*`) have weights but no registered Inference
	Provider, so they work here but fail with the HF Serverless API.

	## Wire the proxy Space

	Add these to `scikit-plots/ai` → Settings → Repository secrets:

	```
	BACKEND_URL = https://scikit-plots-ai-model.hf.space/v1/chat/completions
	PROXY_TIMEOUT = 600
	```
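On the proxy side, the two secrets above might be consumed roughly as follows. This is a hedged sketch, not the actual `scikit-plots/ai` code: `build_forward_request` and `forward` are hypothetical helpers, and falling back to 600 seconds when `PROXY_TIMEOUT` is unset is an assumption.

```python
import json
import urllib.request
from typing import Mapping, Tuple

DEFAULT_PROXY_TIMEOUT_S = 600.0  # assumption: fallback when PROXY_TIMEOUT is unset


def build_forward_request(body: dict, env: Mapping[str, str]) -> Tuple[urllib.request.Request, float]:
    """Build the upstream POST to BACKEND_URL plus the timeout (seconds) to use for it."""
    url = env.get("BACKEND_URL", "").strip()
    if not url:
        raise RuntimeError("BACKEND_URL is not set")
    timeout = float(env.get("PROXY_TIMEOUT") or DEFAULT_PROXY_TIMEOUT_S)
    if timeout <= 0:
        raise ValueError("PROXY_TIMEOUT must be a positive number of seconds")
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, timeout


def forward(body: dict, env: Mapping[str, str]) -> dict:
    """Send the chat request upstream; the long timeout lets a cold start finish."""
    req, timeout = build_forward_request(body, env)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Note that `urlopen`'s `timeout` applies to individual blocking socket operations (connecting, each read), not to the request as a whole.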

	## References

	- [FREE_PROXY_SOLUTIONS.md Path C](./FREE_PROXY_SOLUTIONS.md#path-c--new-zerogpu-space-completely-free-gpu)
	- [ZeroGPU documentation](https://huggingface.co/docs/hub/spaces-zerogpu)