	---
	title: intellite-500m-sft
	emoji: 💬
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.34.2
	app_file: app.py
	pinned: false
	---

	# intellite-500M SFT — RLHF data collector

	Serves the SFT-tuned intellite 500M model in a chat UI. Every assistant
	reply gets 👍 / 👎 buttons; each rating appends one JSONL record to a local
	folder that a `CommitScheduler` pushes to a dataset repo on the Hub every
	5 minutes.
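
The collection loop described above can be sketched roughly as follows. This is an illustrative sketch, not the Space's actual `app.py`: the folder name `feedback` and the helper names `make_scheduler` and `save_feedback` are assumptions, while `CommitScheduler` and its `lock` attribute come from `huggingface_hub`.

```python
import json
import os
import threading
import uuid
from datetime import datetime
from pathlib import Path

# Hypothetical local folder; the real Space may use a different path.
FEEDBACK_DIR = Path("feedback")
# One file per process start, mirroring the data_<uuid>.jsonl naming.
FEEDBACK_FILE = FEEDBACK_DIR / f"data_{uuid.uuid4()}.jsonl"


def make_scheduler(repo_id, folder=FEEDBACK_DIR, every_minutes=5):
    """Start background pushes of `folder` to a dataset repo (needs HF_TOKEN)."""
    # Imported lazily so the rest of the sketch runs without huggingface_hub.
    from huggingface_hub import CommitScheduler

    return CommitScheduler(
        repo_id=repo_id,
        repo_type="dataset",
        folder_path=folder,
        path_in_repo="data",
        every=every_minutes,
        token=os.environ.get("HF_TOKEN"),
    )


def save_feedback(system, prompt_messages, response, liked,
                  path=FEEDBACK_FILE, lock=None):
    """Append one rating as a single JSONL line and return the record."""
    record = {
        "ts": datetime.now().isoformat(timespec="seconds"),
        "system": system,
        "prompt_messages": prompt_messages,
        "response": response,
        "liked": bool(liked),
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Use the scheduler's lock when one is passed, so a push never sees a
    # half-written line; otherwise fall back to a throwaway lock.
    if lock is None:
        lock = threading.Lock()
    with lock:
        with path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

In a running Space you would create the scheduler once at startup, for example `scheduler = make_scheduler("ProCreations/Intellite-storage")`, and pass `lock=scheduler.lock` to every `save_feedback` call.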

	Weights are loaded from a bundled bf16 checkpoint (`best.pt`, ~1 GB).

	The sliders default to the best sampling settings found by a grid sweep
	against this checkpoint: temp 0.7 · top-k 40 · top-p 0.7 · rep penalty 1.1.
	You can override them per message via the right-side panel.
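
To make the four sliders concrete, here is a minimal pure-Python sketch of how such settings are commonly combined when picking the next token. The function name `sample_next` and the order of operations (repetition penalty, then temperature, then top-k, then top-p) are assumptions; the Space's own sampler may differ.

```python
import math
import random


def sample_next(logits, prev_tokens, temperature=0.7, top_k=40,
                top_p=0.7, rep_penalty=1.1, rng=random):
    """Pick one token id from raw logits (requires temperature > 0)."""
    logits = list(logits)
    # Repetition penalty: make already-generated tokens less likely.
    for t in set(prev_tokens):
        if logits[t] > 0:
            logits[t] /= rep_penalty
        else:
            logits[t] *= rep_penalty
    # Temperature: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token ids, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    order = order[:top_k]
    # Softmax over the survivors (shifted by the max for stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability >= top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for i, p in zip(order, probs):
        kept_ids.append(i)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    # Sample among the remaining candidates in proportion to probability.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

With a low top-p such as 0.7, a single dominant token is often the only candidate left, which is one reason these defaults tend to produce conservative replies.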

	## Setup

	1. Upload the SFT checkpoint to the Space root as `best.pt` (or set
	`INTELLITE_CKPT=/path/to/file.pt` in Settings → Variables).
	2. Create the dataset repo `ProCreations/Intellite-storage`
	(the scheduler will auto-create it on first push too).
	3. Set `HF_TOKEN` in Settings → Secrets to a token with write scope
	on the dataset repo. Without it, the Space still runs, but feedback is
	never pushed to the Hub and is lost when the container restarts.
	4. (Optional) Override `FEEDBACK_REPO` in Settings → Variables if you want
	to use a different dataset repo.
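
The variables from the steps above could be read at startup along these lines. The helper name `load_settings` and the returned keys are hypothetical; the defaults mirror the values named in this README.

```python
import os


def load_settings(env=os.environ):
    """Collect the Space's configuration from environment variables."""
    return {
        # Checkpoint path; defaults to best.pt in the Space root.
        "ckpt_path": env.get("INTELLITE_CKPT", "best.pt"),
        # Dataset repo that receives the feedback files.
        "feedback_repo": env.get("FEEDBACK_REPO", "ProCreations/Intellite-storage"),
        # Write token; None means feedback cannot be pushed to the Hub.
        "hf_token": env.get("HF_TOKEN"),
    }
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.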

	## Data format

	Each record is a single line of JSONL in `data/data_<uuid>.jsonl` on the
	dataset repo (one file per Space replica/restart):

	```json
	{"ts":"2026-04-25T15:23:45","system":"You are a helpful, honest, and concise assistant.","prompt_messages":[{"role":"user","content":"..."},{"role":"assistant","content":"..."},{"role":"user","content":"..."}],"response":"...","liked":true}
	```

	Each record reduces to `(prompt, response, reward ∈ {0, 1})`, a shape most
	preference/RL trainers can consume directly. For DPO, group records by identical
	`prompt_messages` and pair a `liked=true` response (chosen) with a
	`liked=false` one (rejected). For REINFORCE/PPO, feed `liked` as a reward.
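
A minimal sketch of the DPO pairing described above. The function name `build_dpo_pairs` and the output keys `prompt`, `chosen`, and `rejected` are assumptions chosen to resemble common preference-dataset layouts. Prompts that have only liked or only disliked responses yield no pairs.

```python
import json
from collections import defaultdict
from itertools import product


def build_dpo_pairs(records):
    """Pair liked and disliked responses that share identical prompt_messages."""
    groups = defaultdict(lambda: {"chosen": [], "rejected": []})
    for rec in records:
        # Serialize the message list so it can serve as a dictionary key.
        key = json.dumps(rec["prompt_messages"], sort_keys=True)
        bucket = "chosen" if rec["liked"] else "rejected"
        groups[key][bucket].append(rec["response"])

    pairs = []
    for key, group in groups.items():
        # Every liked response is paired with every disliked one.
        for chosen, rejected in product(group["chosen"], group["rejected"]):
            pairs.append({
                "prompt": json.loads(key),
                "chosen": chosen,
                "rejected": rejected,
            })
    return pairs
```

If the system prompt can vary between records, it may be worth adding `rec["system"]` to the grouping key as well.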

	## Downloading the data

	```bash
	hf download ProCreations/Intellite-storage --repo-type=dataset --local-dir ./rlhf-data
	```
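
Once downloaded, the files can be read back into Python with a few lines. This sketch assumes the `./rlhf-data` directory from the command above and the `data/*.jsonl` layout described under "Data format"; the helper name `load_records` is hypothetical.

```python
import json
from pathlib import Path


def load_records(local_dir="./rlhf-data"):
    """Read every JSONL record from the downloaded dataset folder."""
    records = []
    for path in sorted(Path(local_dir).glob("data/*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return records
```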

	## Hardware: ZeroGPU (half-H200, dynamic)

	This Space runs on Hugging Face ZeroGPU: a half-H200 slice (70 GB VRAM)
	is allocated on demand each time you press Send, then released when the
	reply finishes. Typical per-message latency:

	- Cold start (first message after idle): ~3–5 s of GPU queueing + ~2 s of model warm-up
	- Warm: ~5–10 s for a typical 200–400 token reply (≈80 tok/s on H200)
	- Max-length 800-token reply: ~10–15 s

	The `chat` function is decorated with `@spaces.GPU(duration=60)`, so the
	GPU stays allocated while the reply streams (up to the requested 60 s) and is then released.
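
A rough sketch of the decorator pattern, written so it also runs on a machine without the `spaces` package. The `chat(message, history)` signature and the `stub_generate` placeholder are assumptions that stand in for the real model call.

```python
try:
    import spaces
    # On ZeroGPU this requests a GPU for up to 60 s per call.
    gpu = spaces.GPU(duration=60)
except ImportError:
    # Local fallback: a no-op decorator when `spaces` is not installed.
    def gpu(fn):
        return fn


def stub_generate(message):
    """Placeholder for model generation; yields the reply piece by piece."""
    for piece in ("You", " said:", " ", message):
        yield piece


@gpu
def chat(message, history):
    """Stream the growing reply; the GPU is held while this generator runs."""
    reply = ""
    for piece in stub_generate(message):
        reply += piece
        yield reply
```

Writing `chat` as a generator is what lets the UI display the reply as it streams, rather than waiting for the full text.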

	ZeroGPU has a per-account daily quota of GPU time (3.5 min on free
	accounts, 25 min on PRO); heavy users will hit a queue. Generation is otherwise free.

	If the Space stalls on cold container boot, give it ~30 s — that's the
	1 GB bf16 weights downloading from `ProCreations/intellite-500m-sft`.
	Subsequent restarts hit the cached copy.