# SLLM-150M → Chat Model (SFT)
Supervised Fine-Tuning pipeline to turn the pretrained **SLLM-150M** base model into
an instruction-following chat model using **OpenHermes-2.5**.
## Pipeline
```
Base model (runs/sllm_150m/ckpt_0011500.pt)
        │
        ▼
prepare_data.py ─── download & tokenize OpenHermes-2.5 (80k convs)
        │
        ▼
sft_train.py    ─── SFT with ChatML loss masking
        │
        ▼
chat.py         ─── interactive CLI chat
```
## Step 1: Install dependency
```bash
pip install datasets
```
## Step 2: Prepare data
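Downloads 80k OpenHermes-2.5 conversations, formats them as ChatML, tokenizes them, and saves the train/validation splits.
It also saves the extended tokenizer to `finetune/data/`. Its vocab is 32,002: the base 32,000 plus `<|im_start|>` and `<|im_end|>`.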
```bash
python finetune/prepare_data.py
```
Options:
| Flag | Default | Description |
|------|---------|-------------|
| `--n_samples` | `80000` | Conversations to sample |
| `--val_ratio` | `0.05` | Validation fraction |
| `--output_dir` | `finetune/data` | Output directory |
| `--seed` | `42` | Random seed |
Expected output:
```
finetune/data/
tokenizer.json ← extended tokenizer (32,002 vocab)
tokenizer_config.json
special_tokens_map.json
train_sft.pt ← ~76,000 examples
val_sft.pt ← ~4,000 examples
meta.json ← stats
```
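The formatting and splitting stage might look roughly like the minimal sketch below. It assumes OpenHermes-2.5 records use a ShareGPT-style `conversations` list whose turns have `from` (`system`, `human` or `gpt`) and `value` fields. The names `to_chatml` and `split_train_val`, the default system prompt, and the shuffle-then-slice scheme are all hypothetical; `prepare_data.py` may work differently. The sketch also shows where the expected counts come from: 80,000 × 0.05 = 4,000 validation conversations, leaving 76,000 for training.

```python
import random

# Assumed mapping from ShareGPT-style speaker tags to ChatML roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}
DEFAULT_SYSTEM = "You are a helpful, concise assistant."  # hypothetical default


def to_chatml(conversation):
    """Render one conversation (a list of {'from', 'value'} dicts) as ChatML text."""
    turns = list(conversation)
    # Prepend a default system turn when the record has none (assumption).
    if not turns or turns[0]["from"] != "system":
        turns.insert(0, {"from": "system", "value": DEFAULT_SYSTEM})
    parts = []
    for turn in turns:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(parts)


def split_train_val(examples, val_ratio=0.05, seed=42):
    """Shuffle with a fixed seed, then hold out the first val_ratio fraction."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_ratio)
    return items[n_val:], items[:n_val]


if __name__ == "__main__":
    record = [
        {"from": "human", "value": "What is the capital of France?"},
        {"from": "gpt", "value": "The capital of France is Paris."},
    ]
    print(to_chatml(record))

    train, val = split_train_val(range(80_000))
    print(len(train), len(val))  # 76000 4000
```

Fixing the seed makes the split reproducible across runs. The validation set then stays the same between experiments.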
## Step 3: Fine-tune
```bash
python finetune/sft_train.py \
--base_ckpt runs/sllm_150m/ckpt_0011500.pt \
--run_dir runs/sllm_150m_chat \
--max_steps 2000 \
--batch_size 4 --grad_accum 8 \
--grad_checkpoint
```
On an RTX 3050 (4 GB), these settings use ~3.5 GB of VRAM and take **~5–8 minutes**.
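As a rough check on what these flags mean, the sketch below works through the per-step arithmetic. It makes three assumptions, none confirmed from `sft_train.py`:

- One `--max_steps` step is one optimizer update.
- Each update accumulates `--grad_accum` micro-batches of `--batch_size` conversations, which is the usual meaning of gradient accumulation.
- The training split holds ~76,000 conversations, as in Step 2.

The helper name `sft_budget` is hypothetical.

```python
def sft_budget(batch_size=4, grad_accum=8, max_steps=2000, n_train=76_000):
    """Return (effective batch, total conversations seen, approximate epochs)."""
    effective_batch = batch_size * grad_accum  # conversations per optimizer step
    seen = effective_batch * max_steps         # conversations processed in total
    epochs = seen / n_train                    # passes over the training split
    return effective_batch, seen, epochs


eff, seen, epochs = sft_budget()
print(f"effective batch = {eff}, conversations seen = {seen:,}, epochs = {epochs:.2f}")
# effective batch = 32, conversations seen = 64,000, epochs = 0.84
```

Under these assumptions, 2,000 steps cover a little under one pass over the training split.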
**Resume training:**
```bash
python finetune/sft_train.py \
--resume --run_dir runs/sllm_150m_chat \
--extra_steps 1000
```
Key options:
| Flag | Default | Description |
|------|---------|-------------|
| `--base_ckpt` | `runs/sllm_150m/ckpt_0011500.pt` | Base pretrained checkpoint |
| `--max_lr` | `1e-5` | Peak LR (30× lower than pretraining's `3e-4`) |
| `--dropout` | `0.1` | SFT dropout (0 in pretraining) |
| `--max_steps` | `2000` | Total training steps |
| `--grad_checkpoint` | off | Enable gradient checkpointing to lower VRAM use |
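One plausible learning-rate schedule for these defaults is sketched below. It warms up linearly over 30 steps (the SFT warmup listed in the "What changes vs pretraining" table) to `--max_lr`, then decays over the remaining `--max_steps`. The cosine decay shape and the `min_lr` floor of `max_lr / 10` are assumptions, since this README does not say how `sft_train.py` decays the rate. `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, max_lr=1e-5, warmup=30, max_steps=2000, min_lr=None):
    """Linear warmup, then cosine decay (an assumed schedule, not taken from sft_train.py)."""
    if min_lr is None:
        min_lr = max_lr / 10  # hypothetical floor
    if step < warmup:
        # Ramp linearly from max_lr / warmup up to max_lr at step warmup - 1.
        return max_lr * (step + 1) / warmup
    if step >= max_steps:
        return min_lr
    # Cosine from max_lr (end of warmup) down to min_lr (at max_steps).
    progress = (step - warmup) / (max_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


for s in (0, 29, 30, 1000, 1999):
    print(s, f"{lr_at(s):.2e}")
```

A short warmup helps because the base weights are already trained. A few steps are enough to avoid large early updates while the optimizer state for the new special-token embeddings settles.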
Checkpoints are saved to `runs/sllm_150m_chat/ckpt_sft_XXXXXXX.pt`.
Training log: `runs/sllm_150m_chat/sft_log.jsonl`.
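Checkpoint filenames embed a zero-padded step number, so the most recent one can be found by parsing that number, for example before resuming. A minimal sketch follows. The seven-digit padding is inferred from the `ckpt_sft_XXXXXXX.pt` pattern, and `pick_latest` / `latest_sft_checkpoint` are hypothetical helpers, not part of the repo.

```python
import re
from pathlib import Path

# Matches e.g. ckpt_sft_0002000.pt (but not pretraining ckpt_0011500.pt) and captures the step.
CKPT_RE = re.compile(r"^ckpt_sft_(\d+)\.pt$")


def pick_latest(names):
    """Return (step, name) for the highest-step ckpt_sft_*.pt filename, or None."""
    found = [(int(m.group(1)), name) for name in names if (m := CKPT_RE.match(name))]
    return max(found, default=None)


def latest_sft_checkpoint(run_dir):
    """Return the Path of the newest SFT checkpoint in run_dir, or None if there is none."""
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        return None
    found = pick_latest(p.name for p in run_dir.iterdir())
    return None if found is None else run_dir / found[1]


if __name__ == "__main__":
    print(latest_sft_checkpoint("runs/sllm_150m_chat"))
```

Parsing the number is more robust than sorting filenames as strings, in case the padding width ever changes.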
## Step 4: Chat
```bash
python finetune/chat.py
python finetune/chat.py --run_dir runs/sllm_150m_chat --temperature 0.7
```
In-chat commands:
| Command | Effect |
|---------|--------|
| `/reset` | Clear conversation history |
| `/system <text>` | Change system prompt |
| `/quit` | Exit |
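These commands map naturally onto a message-history list. A minimal sketch of the loop logic follows, under three assumptions:

- History is a list of `{"role", "content"}` dicts.
- `/reset` keeps the current system prompt.
- The prompt is rendered in ChatML with an open `assistant` header for the model to complete.

`handle_input` and `render_prompt` are hypothetical names. The default system prompt is borrowed from the ChatML example at the end of this README.

```python
DEFAULT_SYSTEM = "You are a helpful, concise assistant."


def render_prompt(system, history):
    """Render system prompt + history as ChatML, ending with an open assistant turn."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for msg in history:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "".join(parts)


def handle_input(line, state):
    """Apply one line of user input to state; return 'quit', 'command', or 'message'."""
    text = line.strip()
    if text == "/quit":
        return "quit"
    if text == "/reset":
        state["history"].clear()  # assumption: the system prompt is kept
        return "command"
    if text.startswith("/system "):
        state["system"] = text[len("/system "):].strip()
        return "command"
    state["history"].append({"role": "user", "content": text})
    return "message"


if __name__ == "__main__":
    state = {"system": DEFAULT_SYSTEM, "history": []}
    handle_input("/system You are a pirate.", state)
    handle_input("Hello!", state)
    print(render_prompt(state["system"], state["history"]))
```

After the model replies, the loop would append `{"role": "assistant", "content": reply}` to the history, so the next prompt carries the full conversation.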
## What changes vs pretraining
| | Pretraining (`train.py`) | SFT (`sft_train.py`) |
|---|---|---|
| Data | Raw text shards (`.bin`) | ChatML conversations (`.pt`) |
| Loss | Every token | **Assistant tokens only** (`ignore_index=-100`) |
| Learning rate | `3e-4` | **`1e-5`** |
| Warmup | 100 steps | 30 steps |
| Vocab | 32,000 | **32,002** (`<\|im_start\|>` + `<\|im_end\|>`) |
| Dropout | 0.0 | **0.1** |
| Checkpoint prefix | `ckpt_` | `ckpt_sft_` |
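The "assistant tokens only" loss can be sketched as a label-masking pass over the token IDs:

- Track which ChatML turn the current token belongs to.
- Copy a token into `labels` only when it is part of an assistant reply, including the closing `<|im_end|>` so the model learns to stop.
- Set every other position to `-100`, which PyTorch's `cross_entropy` skips by default.

The IDs 32000/32001 assume the two special tokens were appended after the 32,000 base tokens. `ASSISTANT_HEADER` is a placeholder for whatever IDs the tokenizer actually produces for `assistant\n`. `sft_train.py` may mask differently.

```python
IM_START, IM_END = 32000, 32001  # assumed IDs of the two added special tokens
ASSISTANT_HEADER = [11, 12]      # hypothetical token IDs for "assistant\n"
IGNORE_INDEX = -100              # ignored by PyTorch cross-entropy by default


def build_labels(input_ids):
    """Return labels that keep only assistant-reply tokens; every other position is -100."""
    labels = [IGNORE_INDEX] * len(input_ids)
    in_assistant = False
    i = 0
    while i < len(input_ids):
        tok = input_ids[i]
        if tok == IM_START:
            # A new turn begins: check whether its role header is the assistant's.
            header_end = i + 1 + len(ASSISTANT_HEADER)
            in_assistant = input_ids[i + 1:header_end] == ASSISTANT_HEADER
            # Skip <|im_start|> and, for assistant turns, the header too (both stay masked).
            i = header_end if in_assistant else i + 1
            continue
        if in_assistant:
            labels[i] = tok      # train on the reply tokens...
            if tok == IM_END:    # ...including the closing <|im_end|>
                in_assistant = False
        i += 1
    return labels


if __name__ == "__main__":
    # user turn: header [21, 12], content [50]; assistant turn: header [11, 12], content [60, 61]
    ids = [IM_START, 21, 12, 50, IM_END, IM_START, 11, 12, 60, 61, IM_END]
    print(build_labels(ids))
```

Here `labels` is aligned position-by-position with `input_ids`. The usual one-position shift for next-token prediction is assumed to happen where the loss is computed.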
## Expected loss curve
| Stage | Expected loss |
|-------|--------------|
| Start (step 0) | 1.5–2.5 |
| Step 500 | 1.0 – 1.5 |
| Step 2000 | 0.8 – 1.2 |
> **If the loss starts above 4.0 or becomes NaN** → reduce `--max_lr` to `5e-6`.
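To check a run for the two warning signs above, you can scan the JSONL training log line by line. A minimal sketch follows. It assumes each line of `sft_log.jsonl` is a JSON object with `step` and `loss` keys; these field names are hypothetical, so inspect your log for the real ones. `check_log` is likewise a hypothetical helper.

```python
import json
import math
import os

LOG_PATH = "runs/sllm_150m_chat/sft_log.jsonl"


def check_log(lines, start_threshold=4.0):
    """Return warnings for a too-high starting loss or any NaN loss."""
    losses = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)  # Python's json accepts bare NaN tokens
        if "loss" in record:       # hypothetical field name
            loss = record["loss"]
            losses.append((record.get("step"), float("nan") if loss is None else float(loss)))
    if not losses:
        return ["no loss records found"]
    warnings = []
    first_step, first_loss = losses[0]
    if first_loss > start_threshold:
        warnings.append(f"start loss {first_loss:.2f} > {start_threshold} (step {first_step})")
    nan_steps = [step for step, loss in losses if math.isnan(loss)]
    if nan_steps:
        warnings.append(f"NaN loss first seen at step {nan_steps[0]}")
    return warnings


if __name__ == "__main__":
    if os.path.exists(LOG_PATH):
        with open(LOG_PATH) as f:
            print(check_log(f) or "no warnings")
```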
## Prompt format (ChatML)
```
<|im_start|>system
You are a helpful, concise assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
```
Generation stops automatically when the model produces `<|im_end|>`.
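The stopping rule, together with `--temperature`, can be sketched as a plain sampling loop. It divides the logits by the temperature, samples a token, and stops as soon as the sampled token is `<|im_end|>` or a token budget runs out. Several pieces are hypothetical:

- `next_logits` stands in for a model forward pass. It returns a small dict of candidate token IDs to logits, to keep the sketch short.
- `IM_END = 32001` assumes the special tokens were appended after the 32,000 base tokens.
- `max_new_tokens` is an illustrative budget, not a `chat.py` flag.

```python
import math
import random

IM_END = 32001  # assumed ID of <|im_end|>


def sample_token(logits, temperature, rng):
    """Sample a token ID from a {token_id: logit} dict after temperature scaling."""
    if temperature <= 0:
        return max(logits, key=logits.get)  # greedy decoding
    ids = list(logits)
    scaled = [logits[t] / temperature for t in ids]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]  # numerically stable softmax weights
    return rng.choices(ids, weights=weights, k=1)[0]


def generate(prompt_ids, next_logits, temperature=0.7, max_new_tokens=256, seed=0):
    """Append sampled tokens until <|im_end|> is sampled or the budget is reached."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    reply = []
    for _ in range(max_new_tokens):
        tok = sample_token(next_logits(ids), temperature, rng)
        if tok == IM_END:
            break  # end of the assistant turn; <|im_end|> itself is not returned
        ids.append(tok)
        reply.append(tok)
    return reply
```

Lower temperatures sharpen the distribution toward the most likely token, and `temperature=0` falls back to greedy decoding. Higher values flatten the distribution and give more varied replies.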