| title: intellite-500m-sft | |
| emoji: 💬 | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.34.2 | |
| app_file: app.py | |
| pinned: false | |
| # intellite-500M SFT: RLHF data collector | |
| Serves the SFT-tuned **intellite 500M** model in a chat UI. Every assistant | |
| reply gets 👍 / 👎 buttons; each rating appends one JSONL record to a local | |
| folder that a `CommitScheduler` pushes to a dataset repo on the Hub every | |
| 5 minutes. | |
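The logging path described above could look roughly like the sketch below. It assumes a hypothetical `save_feedback` helper and a local `feedback/` folder; the real `app.py` may be organized differently. `CommitScheduler` is the `huggingface_hub` class named above, and its `lock` attribute can be passed in so a write never overlaps a push. The scheduler is only created inside `make_scheduler`, so the writer itself can be tried offline.

```python
import json
import threading
import uuid
from datetime import datetime
from pathlib import Path

# Hypothetical local layout: one JSONL file per process start.
FEEDBACK_DIR = Path("feedback")
FEEDBACK_FILE = FEEDBACK_DIR / f"data_{uuid.uuid4()}.jsonl"


def make_scheduler(repo_id: str, token: str):
    """Create the background pusher (needs network access and a write token)."""
    from huggingface_hub import CommitScheduler  # imported lazily on purpose

    return CommitScheduler(
        repo_id=repo_id,
        repo_type="dataset",
        folder_path=FEEDBACK_DIR,
        path_in_repo="data",
        every=5,  # minutes between pushes
        token=token,
    )


def save_feedback(system, prompt_messages, response, liked, lock=None):
    """Append one rating as a single JSONL line and return the record."""
    record = {
        "ts": datetime.now().isoformat(timespec="seconds"),
        "system": system,
        "prompt_messages": prompt_messages,
        "response": response,
        "liked": bool(liked),
    }
    FEEDBACK_DIR.mkdir(parents=True, exist_ok=True)
    # Pass scheduler.lock when a scheduler exists; otherwise use a throwaway lock.
    if lock is None:
        lock = threading.Lock()
    with lock:
        with FEEDBACK_FILE.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

With a scheduler in place, a rating handler would call `save_feedback(..., lock=scheduler.lock)`.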
| Weights are loaded from a bundled bf16 checkpoint (`best.pt`, ~1 GB). | |
| Best sampling defaults are baked into the sliders: | |
| **temp 0.7 · top-k 40 · top-p 0.7 · rep penalty 1.1**, found by a grid sweep | |
| against this checkpoint. You can override them per message via the right-side panel. | |
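For readers unfamiliar with the four sliders, here is a rough pure-Python sketch of how such settings are commonly combined when picking the next token. The function name `sample_next`, the order of the steps, and the CTRL-style repetition penalty are assumptions made for illustration; the Space's own sampler may differ.

```python
import math
import random


def sample_next(logits, prev_tokens, temp=0.7, top_k=40, top_p=0.7,
                rep_penalty=1.1, rng=random):
    """Pick the next token id from raw logits using the four slider settings."""
    logits = list(logits)
    # Repetition penalty: make already-generated token ids less likely.
    for t in set(prev_tokens):
        logits[t] = logits[t] / rep_penalty if logits[t] > 0 else logits[t] * rep_penalty
    # Temperature: values below 1 sharpen the distribution.
    logits = [x / temp for x in logits]
    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = logits[ranked[0]]
    weights = [math.exp(logits[i] - m) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for i, p in zip(ranked, probs):
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break
    ids, ps = zip(*kept)
    return rng.choices(ids, weights=ps, k=1)[0]
```

Lower `temp` and `top_p` values narrow the candidate set, which is why a dominant logit is chosen deterministically once it alone covers the `top_p` mass.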
| ## Setup | |
| 1. **Upload the SFT checkpoint** to the Space root as `best.pt` (or set | |
| `INTELLITE_CKPT=/path/to/file.pt` in Settings → Variables). | |
| 2. **Create the dataset repo** `ProCreations/Intellite-storage` | |
| (the scheduler will auto-create it on first push too). | |
| 3. **Set `HF_TOKEN`** in Settings → Secrets: a token with **write** scope | |
| on the dataset repo. Without it, the Space runs but feedback only | |
| persists in memory until the container restarts. | |
| 4. (Optional) Override `FEEDBACK_REPO` in Settings → Variables if you want | |
| to use a different dataset repo. | |
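The setup steps boil down to three environment variables. The sketch below shows one way an app could resolve them. The `load_settings` helper and the `push_enabled` flag are hypothetical; the variable names and defaults come from the steps above.

```python
import os


def load_settings(env=os.environ):
    """Read the Space's configuration from environment variables."""
    token = env.get("HF_TOKEN") or None  # treat an empty string as missing
    return {
        # Step 1: checkpoint at the Space root unless INTELLITE_CKPT is set.
        "ckpt_path": env.get("INTELLITE_CKPT", "best.pt"),
        # Steps 2 and 4: default dataset repo, optionally overridden.
        "feedback_repo": env.get("FEEDBACK_REPO", "ProCreations/Intellite-storage"),
        # Step 3: without a token, pushing to the Hub is skipped.
        "hf_token": token,
        "push_enabled": token is not None,
    }
```

Passing a plain dict instead of `os.environ` makes the function easy to check locally, for example `load_settings({"FEEDBACK_REPO": "me/my-feedback"})`.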
| ## Data format | |
| Each record is a single line of JSONL in `data/data_<uuid>.jsonl` on the | |
| dataset repo (one file per Space replica/restart): | |
| ```json | |
| {"ts":"2026-04-25T15:23:45","system":"You are a helpful, honest, and concise assistant.","prompt_messages":[{"role":"user","content":"..."},{"role":"assistant","content":"..."},{"role":"user","content":"..."}],"response":"...","liked":true} | |
| ``` | |
| Each record is exactly `(prompt, response, reward ∈ {0,1})`, the shape any | |
| preference/RL trainer expects. For DPO, group records by identical | |
| `prompt_messages` and pair a `liked=true` response (chosen) with a | |
| `liked=false` one (rejected). For REINFORCE/PPO, feed `liked` as a reward. | |
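The DPO grouping described above can be sketched as follows. `build_dpo_pairs` is a hypothetical helper, and including `system` in the grouping key is an assumption that goes slightly beyond the text, so that replies produced under different system prompts are not paired.

```python
import json
from collections import defaultdict
from itertools import product


def build_dpo_pairs(records):
    """Group records by identical prompt and pair liked with disliked replies."""
    groups = defaultdict(lambda: {"chosen": [], "rejected": []})
    for r in records:
        # Serialize the prompt so identical conversations share one key.
        key = json.dumps([r["system"], r["prompt_messages"]], sort_keys=True)
        groups[key]["chosen" if r["liked"] else "rejected"].append(r)

    pairs = []
    for g in groups.values():
        # Every liked reply is paired with every disliked reply for that prompt.
        for good, bad in product(g["chosen"], g["rejected"]):
            if good["response"] != bad["response"]:
                pairs.append({
                    "system": good["system"],
                    "prompt_messages": good["prompt_messages"],
                    "chosen": good["response"],
                    "rejected": bad["response"],
                })
    return pairs
```

Prompts that only ever received one kind of rating produce no pairs; those records remain usable as scalar rewards for REINFORCE/PPO.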
| ## Downloading the data | |
| ```bash | |
| hf download ProCreations/Intellite-storage --repo-type=dataset --local-dir ./rlhf-data | |
| ``` | |
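Once downloaded, the per-replica files can be merged into one list. This is a small sketch that assumes the `data/data_<uuid>.jsonl` layout described under "Data format" and the `./rlhf-data` directory from the command above; `load_records` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_records(root="./rlhf-data"):
    """Read every data/data_*.jsonl file under root into a list of dicts."""
    records = []
    for path in sorted(Path(root).glob("data/data_*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # tolerate blank lines
                    records.append(json.loads(line))
    return records
```

A quick sanity check after loading is the share of liked replies, for example `sum(r["liked"] for r in records) / len(records)` when `records` is non-empty.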
| ## Hardware: ZeroGPU (half-H200, dynamic) | |
| This Space runs on **Hugging Face ZeroGPU**: a half-H200 slice (70 GB VRAM) | |
| is allocated on demand each time you press Send, then released when the | |
| reply finishes. Per-message latency: | |
| - Cold start (first message after idle): ~3–5 s of GPU queueing + ~2 s model warm-up | |
| - Warm: ~5–10 s for a typical 200–400 token reply (≈80 tok/s on H200) | |
| - Max-length 800-token reply: ~10–15 s | |
| The `chat` function is decorated with `@spaces.GPU(duration=60)`, which requests the | |
| GPU for up to 60 s per call: it stays allocated while the reply streams, then is released. | |
| ZeroGPU has a **per-account daily quota** (3.5 min free / 25 min PRO); | |
| heavy users will hit a queue. Generation is otherwise free. | |
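The decoration of `chat` mentioned above can be sketched as follows. `spaces.GPU(duration=60)` is the real ZeroGPU decorator; `fake_generate` is a hypothetical stand-in for the model's token stream, and the `ImportError` fallback is an assumption added only so the sketch also runs on a machine without the `spaces` package.

```python
try:
    import spaces  # provided on ZeroGPU Spaces

    gpu = spaces.GPU(duration=60)  # request the GPU for up to 60 s per call
except ImportError:
    # Local fallback for this sketch only: leave the function undecorated.
    def gpu(fn):
        return fn


def fake_generate(message):
    """Hypothetical stand-in for the model: echoes the message word by word."""
    for word in message.split():
        yield word + " "


@gpu
def chat(message, history):
    """Stream a growing reply; the GPU is held only while this generator runs."""
    reply = ""
    for piece in fake_generate(message):
        reply += piece
        yield reply  # Gradio re-renders the partial reply on each yield
```

Keeping the decorated function as small as possible helps conserve the daily quota, since time spent inside it counts as GPU time.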
| If the Space stalls on cold container boot, give it ~30 s: that's the | |
| 1 GB bf16 weights downloading from `ProCreations/intellite-500m-sft`. | |
| Subsequent restarts hit the cached copy. | |