---
title: intellite-500m-sft
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---
# intellite-500M SFT: RLHF data collector
Serves the SFT-tuned **intellite 500M** model in a chat UI. Every assistant
reply gets 👍 / 👎 buttons; each rating appends one JSONL record to a local
folder that a `CommitScheduler` pushes to a dataset repo on the Hub every
5 minutes.
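
A minimal sketch of that logging path, assuming the record fields shown under Data format. The name `append_feedback` and the `feedback/` folder are illustrative and not taken from `app.py`. In the Space, this folder would be the `folder_path` given to `huggingface_hub.CommitScheduler`, and writes would be guarded by the scheduler's `lock`; a plain `threading.Lock` stands in here so the snippet runs without Hub access.

```python
import json
import threading
import uuid
from datetime import datetime
from pathlib import Path

# Hypothetical local folder; in the Space this would be the folder that a
# CommitScheduler(repo_id=..., repo_type="dataset", folder_path=..., every=5)
# uploads from.
FEEDBACK_DIR = Path("feedback")
# One file per process start, mirroring the data_<uuid>.jsonl naming.
FEEDBACK_FILE = FEEDBACK_DIR / f"data_{uuid.uuid4()}.jsonl"
# Stand-in for scheduler.lock, which keeps writes from racing an upload.
_lock = threading.Lock()


def append_feedback(system, prompt_messages, response, liked, path=FEEDBACK_FILE):
    """Append one rating as a single JSONL line and return the record."""
    record = {
        "ts": datetime.now().isoformat(timespec="seconds"),
        "system": system,
        "prompt_messages": prompt_messages,
        "response": response,
        "liked": bool(liked),
    }
    with _lock:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Appending whole lines under a lock keeps each record intact even if an upload starts mid-session.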
Weights are loaded from a bundled bf16 checkpoint (`best.pt`, ~1 GB).
Best sampling defaults are baked into the sliders:
**temp 0.7 · top-k 40 · top-p 0.7 · rep penalty 1.1**, found by grid sweep
against this checkpoint. You can override them per message in the right-side panel.
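
To make the four sliders concrete, here is a rough pure-Python sketch of how such settings are commonly combined when picking the next token. The function name `sample_next` and the order of the steps (repetition penalty, temperature, top-k, top-p) are assumptions for illustration; the actual sampler in `app.py` may apply them differently.

```python
import math
import random


def sample_next(logits, prev_tokens, temperature=0.7, top_k=40, top_p=0.7,
                rep_penalty=1.1, rng=random):
    """Pick the next token id from raw logits (a list of floats)."""
    logits = list(logits)
    # 1. Repetition penalty: lower the score of tokens already generated.
    for t in set(prev_tokens):
        logits[t] = logits[t] / rep_penalty if logits[t] > 0 else logits[t] * rep_penalty
    # 2. Temperature: values below 1 sharpen the distribution.
    logits = [x / temperature for x in logits]
    # 3. Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # 4. Softmax over the survivors (subtract the max for numerical stability).
    peak = logits[ranked[0]]
    weights = [math.exp(logits[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 5. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i, p in zip(ranked, probs):
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 6. Sample from the remaining nucleus, weighted by probability.
    ids, ps = zip(*kept)
    return rng.choices(ids, weights=ps, k=1)[0]
```

With `top_p=0.7`, a single dominant token is often the only candidate left, which is why these defaults give fairly conservative replies.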
## Setup
1. **Upload the SFT checkpoint** to the Space root as `best.pt` (or set
`INTELLITE_CKPT=/path/to/file.pt` in Settings → Variables).
2. **Create the dataset repo** `ProCreations/Intellite-storage`
(the scheduler will auto-create it on first push too).
3. **Set `HF_TOKEN`** in Settings → Secrets: a token with **write** scope
on the dataset repo. Without it, the Space still runs, but feedback
persists only in memory until the container restarts.
4. (Optional) Override `FEEDBACK_REPO` in Settings → Variables if you want
to use a different dataset repo.
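
The variables from the steps above could be resolved at startup roughly as follows. This is a sketch under assumptions: `load_settings` and the returned keys are hypothetical names, and the defaults simply mirror the values mentioned in this README.

```python
import os

# Defaults taken from the setup steps in this README.
DEFAULT_CKPT = "best.pt"
DEFAULT_FEEDBACK_REPO = "ProCreations/Intellite-storage"


def load_settings(env=None):
    """Resolve the Space's configuration from environment variables."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN") or None  # treat an empty string as unset
    return {
        "ckpt_path": env.get("INTELLITE_CKPT", DEFAULT_CKPT),
        "feedback_repo": env.get("FEEDBACK_REPO", DEFAULT_FEEDBACK_REPO),
        "hf_token": token,
        # Without a write token there is nothing to push with.
        "push_enabled": token is not None,
    }
```

Passing a plain dict as `env` makes the function easy to check locally without touching the real environment.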
## Data format
Each record is a single line of JSONL in `data/data_<uuid>.jsonl` on the
dataset repo (one file per Space replica/restart):
```json
{"ts":"2026-04-25T15:23:45","system":"You are a helpful, honest, and concise assistant.","prompt_messages":[{"role":"user","content":"..."},{"role":"assistant","content":"..."},{"role":"user","content":"..."}],"response":"...","liked":true}
```
Each record is exactly `(prompt, response, reward ∈ {0,1})`, the shape any
preference/RL trainer expects. For DPO, group records by identical
`prompt_messages` and pair a `liked=true` response (chosen) with a
`liked=false` one (rejected). For REINFORCE/PPO, feed `liked` as a reward.
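
The DPO grouping can be sketched like this. `build_dpo_pairs` and the output keys (`prompt`, `chosen`, `rejected`) are illustrative names, not a fixed schema; keying on the system prompt as well as `prompt_messages` is an added assumption, so that replies produced under different system prompts are never paired.

```python
import json
from collections import defaultdict
from itertools import product


def build_dpo_pairs(records):
    """Group records by prompt and pair every liked reply with every disliked one."""
    groups = defaultdict(lambda: {"chosen": [], "rejected": []})
    for r in records:
        # Serialise system + messages so identical prompts share one key.
        key = json.dumps([r["system"], r["prompt_messages"]], sort_keys=True)
        groups[key]["chosen" if r["liked"] else "rejected"].append(r)
    pairs = []
    for g in groups.values():
        # product() is empty when a prompt has only likes or only dislikes.
        for good, bad in product(g["chosen"], g["rejected"]):
            pairs.append({
                "system": good["system"],
                "prompt": good["prompt_messages"],
                "chosen": good["response"],
                "rejected": bad["response"],
            })
    return pairs
```

Prompts that received only one kind of rating yield no pairs, so the usable DPO set is usually much smaller than the raw record count.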
## Downloading the data
```bash
hf download ProCreations/Intellite-storage --repo-type=dataset --local-dir ./rlhf-data
```
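
Once downloaded, the files can be read back with the standard library alone. A small sketch, assuming the `./rlhf-data` directory from the command above and the `data/*.jsonl` layout described under Data format; `load_feedback` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_feedback(root="./rlhf-data"):
    """Read every data/*.jsonl file under root into a list of dicts."""
    records = []
    for path in sorted(Path(root).glob("data/*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return records
```

For example, `sum(r["liked"] for r in records) / len(records)` gives the overall like rate once at least one record exists.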
## Hardware: ZeroGPU (half-H200, dynamic)
This Space runs on **Hugging Face ZeroGPU**: a half-H200 slice (70 GB VRAM)
is allocated on demand each time you press Send, then released when the
reply finishes. Per-message latency:
- Cold start (first message after idle): ~3–5 s of GPU queueing + ~2 s model warm
- Warm: ~5–10 s for a typical 200–400 token reply (≈80 tok/s on H200)
- Max-length 800-token reply: ~10–15 s
The `chat` function is decorated with `@spaces.GPU(duration=60)` so the
GPU stays allocated for the duration of the streamed reply, then releases.
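
The decorator pattern looks roughly like this. It is a sketch, not the code in `app.py`: the body of `chat` is a placeholder that echoes the message instead of calling the model, and the `ImportError` fallback is an added assumption so the file also runs on machines without the `spaces` package. On ZeroGPU the model is typically moved to `cuda` at module import time, outside the decorated function.

```python
try:
    import spaces  # available on Hugging Face Spaces
except ImportError:
    # Local fallback (an assumption for this sketch): a no-op decorator
    # so the same file runs where the `spaces` package is not installed.
    class spaces:
        @staticmethod
        def GPU(func=None, duration=60):
            if callable(func):
                return func
            return lambda f: f


@spaces.GPU(duration=60)
def chat(message, history):
    """Stream a reply; the GPU is held while this generator runs."""
    reply = ""
    # Placeholder for the real generation loop: echo the message word by word.
    for word in message.split():
        reply += word + " "
        yield reply.strip()
```

Because `chat` is a generator, each `yield` updates the chat UI with the partial reply while the GPU allocation is still held.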
ZeroGPU has a **per-account daily quota** (3.5 min free / 25 min PRO);
heavy users will hit a queue. Generation is otherwise free.
If the Space stalls on cold container boot, give it ~30 s: that's the
1 GB bf16 weights downloading from `ProCreations/intellite-500m-sft`.
Subsequent restarts hit the cached copy.