---
title: intellite-100m
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---

# intellite-100M: RLHF data collector

Serves the SFT-tuned intellite 100M model in a chat UI. Every assistant reply
gets 👍 / 👎 buttons; each rating appends one JSONL record to a local folder
that a `CommitScheduler` pushes to a dataset repo on the Hub every 5 minutes.
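A minimal sketch of that write path, assuming the pattern from the `huggingface_hub` `CommitScheduler` docs: each rating is appended as one JSON line to a per-process `data_<uuid>.jsonl` file while holding the scheduler's lock, so a scheduled commit never uploads a half-written line. `make_feedback_file` and `append_feedback` are hypothetical names, not taken from `app.py`; the record fields follow the Data format section. A module-level `threading.Lock` stands in for `scheduler.lock` so the sketch runs without the Hub.

```python
import json
import threading
import uuid
from datetime import datetime
from pathlib import Path

# Hypothetical wiring (an assumption, not copied from app.py):
#   from huggingface_hub import CommitScheduler
#   scheduler = CommitScheduler(repo_id="ProCreations/Intellite-storage",
#                               repo_type="dataset", folder_path=FEEDBACK_DIR,
#                               path_in_repo="data", every=5)
# and then pass lock=scheduler.lock below. The module-level lock is only a
# fallback so this sketch runs offline.
_LOCAL_LOCK = threading.Lock()


def make_feedback_file(folder: Path) -> Path:
    """Create `folder` if needed and return a fresh data_<uuid>.jsonl path in it."""
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"data_{uuid.uuid4()}.jsonl"


def append_feedback(path: Path, system: str, prompt_messages: list,
                    response: str, liked: bool, lock=_LOCAL_LOCK) -> dict:
    """Append one rating as a single JSON line, holding `lock` while writing."""
    record = {
        "ts": datetime.now().isoformat(timespec="seconds"),  # e.g. 2026-04-20T15:23:45
        "system": system,
        "prompt_messages": prompt_messages,  # conversation up to the rated reply
        "response": response,
        "liked": bool(liked),
    }
    # Holding the lock keeps a concurrent scheduled commit from seeing a partial line.
    with lock:
        with path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

With `path_in_repo="data"`, each local `data_<uuid>.jsonl` would end up at `data/data_<uuid>.jsonl` in the dataset repo, matching the layout described under Data format.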

## Setup

1. **Upload the SFT checkpoint** to the Space root as `best.pt` (or set
   `INTELLITE_CKPT=/path/to/file.pt` in Settings → Variables).
2. **Create the dataset repo** `ProCreations/Intellite-storage`
   (the scheduler will auto-create it on first push too).
3. **Set `HF_TOKEN`** in Settings → Secrets to a token with **write** scope
   on the dataset repo. Without it, the Space still runs, but feedback only
   persists in memory until the container restarts.
4. (Optional) Override `FEEDBACK_REPO` in Settings → Variables if you want
   to use a different dataset repo.
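The settings above could be resolved at startup roughly as follows. This is a sketch, not the Space's actual code: `SpaceConfig` and `load_config` are hypothetical, and only the variable names and defaults (`INTELLITE_CKPT` defaulting to `best.pt`, `FEEDBACK_REPO` defaulting to `ProCreations/Intellite-storage`, an optional `HF_TOKEN`) come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceConfig:
    ckpt_path: str
    feedback_repo: str
    hf_token: Optional[str]

    @property
    def pushes_to_hub(self) -> bool:
        # Per step 3, feedback only reaches the Hub when a write token is set.
        return bool(self.hf_token)


def load_config(env: Optional[Mapping[str, str]] = None) -> SpaceConfig:
    """Read the Space's variables/secrets, falling back to the README defaults."""
    env = os.environ if env is None else env
    return SpaceConfig(
        ckpt_path=env.get("INTELLITE_CKPT", "best.pt"),
        feedback_repo=env.get("FEEDBACK_REPO", "ProCreations/Intellite-storage"),
        hf_token=env.get("HF_TOKEN") or None,  # treat an empty secret as unset
    )
```

Passing `env` explicitly is only there to make the function easy to test; in the Space it would read `os.environ`.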

## Data format

Each record is a single line of JSONL in `data/data_<uuid>.jsonl` on the
dataset repo (one file per Space replica/restart):

```json
{"ts":"2026-04-20T15:23:45","system":"You are a helpful, honest, and concise assistant.","prompt_messages":[{"role":"user","content":"..."},{"role":"assistant","content":"..."},{"role":"user","content":"..."}],"response":"...","liked":true}
```
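To sanity-check records against this format, a small validator like the hypothetical `validate_record` below may help. Two of its rules are assumptions inferred from the example line rather than documented guarantees: `prompt_messages` holds only `user`/`assistant` turns (the system prompt has its own field), and it ends with the `user` turn that the rated `response` answers.

```python
import json
from datetime import datetime

REQUIRED_FIELDS = {"ts", "system", "prompt_messages", "response", "liked"}
# Assumption from the example line: the system prompt lives in its own field,
# so prompt_messages should only hold user/assistant turns.
ALLOWED_ROLES = {"user", "assistant"}


def validate_record(line: str) -> dict:
    """Parse one JSONL line and raise ValueError if it does not match the format."""
    rec = json.loads(line)  # json.JSONDecodeError is a subclass of ValueError
    missing = REQUIRED_FIELDS - set(rec)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    datetime.fromisoformat(rec["ts"])  # raises ValueError on a malformed timestamp
    if not isinstance(rec["liked"], bool):
        raise ValueError("liked must be a JSON boolean")
    if not isinstance(rec["response"], str):
        raise ValueError("response must be a string")
    msgs = rec["prompt_messages"]
    if not isinstance(msgs, list) or not msgs:
        raise ValueError("prompt_messages must be a non-empty list")
    for m in msgs:
        if m.get("role") not in ALLOWED_ROLES or not isinstance(m.get("content"), str):
            raise ValueError(f"bad message: {m!r}")
    # Assumption: the rated reply answers the final user turn.
    if msgs[-1]["role"] != "user":
        raise ValueError("prompt_messages should end with a user turn")
    return rec
```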

Each record maps directly onto a `(prompt, response, reward)` triple: the
prompt is `system` plus `prompt_messages`, and the reward is `liked`
(`true` → 1, `false` → 0). That is the shape most preference and RL trainers
expect. For DPO, group records by identical `prompt_messages` and pair a
`liked=true` response (chosen) with a `liked=false` one (rejected). For
REINFORCE/PPO, use `liked` as a binary reward.
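A sketch of both conversions, under two labeled assumptions: the grouping key includes `system` as well as `prompt_messages` (records with different system prompts count as different prompts), and every liked response to a prompt is paired with every disliked one. The helper names and the output keys `chosen`/`rejected` are illustrative; check the column names your trainer actually expects.

```python
import itertools
import json
from collections import defaultdict


def prompt_key(rec: dict) -> str:
    """Canonical string for a prompt. Assumption: `system` is part of the prompt."""
    return json.dumps([rec["system"], rec["prompt_messages"]],
                      sort_keys=True, ensure_ascii=False)


def build_dpo_pairs(records: list) -> list:
    """Pair every liked response with every disliked response to the same prompt."""
    groups = defaultdict(lambda: {"chosen": [], "rejected": []})
    for rec in records:
        groups[prompt_key(rec)]["chosen" if rec["liked"] else "rejected"].append(rec)
    pairs = []
    for group in groups.values():  # prompts with only one label yield no pairs
        for good, bad in itertools.product(group["chosen"], group["rejected"]):
            pairs.append({
                "system": good["system"],
                "prompt_messages": good["prompt_messages"],
                "chosen": good["response"],
                "rejected": bad["response"],
            })
    return pairs


def to_rewards(records: list) -> list:
    """(system, prompt_messages, response, reward) with reward 1.0 if liked, else 0.0."""
    return [(r["system"], r["prompt_messages"], r["response"], 1.0 if r["liked"] else 0.0)
            for r in records]
```

The all-pairs choice maximizes the number of DPO examples but over-weights prompts that collected many ratings; keeping one pair per prompt is an alternative.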

## Downloading the data

```bash
hf download ProCreations/Intellite-storage --repo-type=dataset --local-dir ./rlhf-data
# or in Python:
#   from huggingface_hub import snapshot_download
#   snapshot_download("ProCreations/Intellite-storage", repo_type="dataset")
```
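With `--local-dir`, the CLI keeps the repo's folder layout, so the records land in `./rlhf-data/data/`. A minimal loader, assuming nothing beyond the `data_<uuid>.jsonl` naming described above (`load_feedback` is a hypothetical name), can gather every file into one list:

```python
import json
from pathlib import Path


def load_feedback(root) -> list:
    """Read every data_*.jsonl file under `root` into one list of record dicts."""
    records = []
    # rglob also searches subfolders such as data/; sorting keeps the order stable.
    for path in sorted(Path(root).rglob("data_*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines defensively
                    records.append(json.loads(line))
    return records
```

For example, `records = load_feedback("./rlhf-data")` after the download above.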

## Notes on the free CPU tier

Generation on CPU is slow (~5–10 tok/s for the 100M model in fp32). If you
move to the paid GPU tier, the app auto-detects `cuda` and uses bf16 autocast,
which is roughly 10× faster.
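The README does not show how the device check is done. One common way to express "CUDA with bf16 autocast if available, otherwise plain fp32 on CPU" in PyTorch is sketched below; `select_device_and_autocast` is a hypothetical name, and this is not `app.py`'s code. The `ImportError` fallback exists only so the sketch also runs where PyTorch is not installed.

```python
import contextlib

try:
    import torch
except ImportError:  # only so this sketch also runs where PyTorch is absent
    torch = None


def select_device_and_autocast():
    """Return (device, autocast_context): CUDA + bf16 autocast if available, else CPU."""
    if torch is not None and torch.cuda.is_available():
        return "cuda", torch.autocast(device_type="cuda", dtype=torch.bfloat16)
    # On CPU a no-op context leaves the model in the dtype it was loaded in (fp32 here).
    return "cpu", contextlib.nullcontext()


# Usage sketch (model and input_ids are placeholders):
#   device, amp = select_device_and_autocast()
#   model.to(device)
#   with torch.no_grad(), amp:
#       logits = model(input_ids.to(device))
```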