SLLM-150M: Chat Model (SFT)
Supervised Fine-Tuning pipeline to turn the pretrained SLLM-150M base model into an instruction-following chat model using OpenHermes-2.5.
Pipeline
Base model (runs/sllm_150m/ckpt_0011500.pt)
│
▼
prepare_data.py ─── download & tokenize OpenHermes-2.5 (80k convs)
│
▼
sft_train.py ─── SFT with ChatML loss masking
│
▼
chat.py ─── interactive CLI chat
Step 1: Install dependency
pip install datasets
Step 2: Prepare data
Downloads 80k conversations, formats them as ChatML, tokenizes them, and saves the resulting shards.
Also saves the extended tokenizer (vocab 32,002) to finetune/data/.
python finetune/prepare_data.py
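The "formats as ChatML" step can be pictured with a small sketch. It is not the code in prepare_data.py. It assumes each OpenHermes-2.5 record is a ShareGPT-style dict with a `conversations` list of `from`/`value` turns, and the `ROLE_MAP` and `to_chatml` names are made up for illustration.

```python
# Assumption: records look like {"conversations": [{"from": ..., "value": ...}, ...]}.
# ROLE_MAP and to_chatml are hypothetical names, not taken from prepare_data.py.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_chatml(record):
    """Render one conversation as a single ChatML string."""
    parts = []
    for turn in record["conversations"]:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(parts)


example = {
    "conversations": [
        {"from": "human", "value": "What is the capital of France?"},
        {"from": "gpt", "value": "The capital of France is Paris."},
    ]
}
print(to_chatml(example))
```

The resulting string is what gets tokenized. The two ChatML markers are the tokens that take the vocabulary from 32,000 to 32,002.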
Options:
| Flag | Default | Description |
|---|---|---|
| `--n_samples` | `80000` | Conversations to sample |
| `--val_ratio` | `0.05` | Validation fraction |
| `--output_dir` | `finetune/data` | Output directory |
| `--seed` | `42` | Random seed |
Expected output:
finetune/data/
tokenizer.json ← extended tokenizer (32,002 vocab)
tokenizer_config.json
special_tokens_map.json
train_sft.pt ← ~76,000 examples
val_sft.pt ← ~4,000 examples
meta.json ← stats
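The train/validation counts follow from the default `--n_samples`, `--val_ratio`, and `--seed` values. A minimal sketch of a seeded split is below. The `split_indices` helper is hypothetical, and the real script may shuffle or filter differently, which would explain why the counts above are approximate.

```python
import random


def split_indices(n_samples=80_000, val_ratio=0.05, seed=42):
    """Shuffle indices reproducibly and carve off a validation slice.

    Hypothetical helper; the actual logic in prepare_data.py may differ.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_ratio)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices()
print(len(train_idx), len(val_idx))  # 76000 4000
```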
Step 3: Fine-tune
python finetune/sft_train.py \
--base_ckpt runs/sllm_150m/ckpt_0011500.pt \
--run_dir runs/sllm_150m_chat \
--max_steps 2000 \
--batch_size 4 --grad_accum 8 \
--grad_checkpoint
On an RTX 3050 (4 GB), these settings use 3.5 GB of VRAM and take **5–8 minutes**.
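To see how much data this run covers, the arithmetic can be worked through. The sketch assumes that `--batch_size` counts sequences per micro-batch and that one step is one optimizer update after `--grad_accum` micro-batches. Neither is confirmed by this README.

```python
batch_size = 4           # sequences per micro-batch (assumed meaning of --batch_size)
grad_accum = 8           # micro-batches accumulated per optimizer step
max_steps = 2000
train_examples = 76_000  # approximate size of train_sft.pt

effective_batch = batch_size * grad_accum     # sequences per optimizer step
examples_seen = effective_batch * max_steps   # sequences over the whole run
epochs = examples_seen / train_examples

print(effective_batch)   # 32
print(examples_seen)     # 64000
print(f"{epochs:.2f}")   # 0.84
```

Under those assumptions the default run sees a bit less than one pass over the training set.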
Resume training:
python finetune/sft_train.py \
--resume --run_dir runs/sllm_150m_chat \
--extra_steps 1000
Key options:
| Flag | Default | Description |
|---|---|---|
| `--base_ckpt` | `runs/sllm_150m/ckpt_0011500.pt` | Base pretrained checkpoint |
| `--max_lr` | `1e-5` | Peak LR (30× lower than the 3e-4 used in pretraining) |
| `--dropout` | `0.1` | SFT dropout (0 in pretraining) |
| `--max_steps` | `2000` | Total training steps |
| `--grad_checkpoint` | off | Enable for lower VRAM |
Checkpoints are saved to runs/sllm_150m_chat/ckpt_sft_XXXXXXX.pt.
Training log: runs/sllm_150m_chat/sft_log.jsonl.
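The log is a JSONL file, so it can be inspected with the standard library. The field names are not documented here, so the sketch below assumes each line is a JSON object with `step` and `loss` keys. Both key names are assumptions and may need adjusting to whatever sft_train.py writes.

```python
import json


def parse_log_lines(lines):
    """Collect (step, loss) pairs from JSONL lines.

    Assumes hypothetical "step" and "loss" keys; lines without them are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if "step" in entry and "loss" in entry:
            records.append((entry["step"], entry["loss"]))
    return records


# Made-up lines for illustration only, not real training output.
sample = ['{"step": 0, "loss": 2.0}', "", '{"step": 10, "loss": 1.9}']
print(parse_log_lines(sample))
```

For the real file, pass an open file handle: `parse_log_lines(open("runs/sllm_150m_chat/sft_log.jsonl", encoding="utf-8"))`.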
Step 4: Chat
python finetune/chat.py
python finetune/chat.py --run_dir runs/sllm_150m_chat --temperature 0.7
In-chat commands:
| Command | Effect |
|---|---|
| `/reset` | Clear conversation history |
| `/system <text>` | Change system prompt |
| `/quit` | Exit |
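One way such commands could be dispatched is sketched below. It is not the code in chat.py: the `handle_input` function and the `state` dict are hypothetical. It also assumes that changing the system prompt leaves the history untouched, which the table does not specify.

```python
DEFAULT_SYSTEM = "You are a helpful, concise assistant."


def handle_input(line, state):
    """Apply a slash command or record a user message.

    Returns "quit", "handled", or "message" so the caller knows what to do next.
    Hypothetical helper; chat.py may be organized differently.
    """
    text = line.strip()
    if text == "/quit":
        return "quit"
    if text == "/reset":
        state["history"].clear()
        return "handled"
    if text.startswith("/system "):
        state["system"] = text[len("/system "):].strip()
        return "handled"
    state["history"].append(("user", text))
    return "message"


state = {"system": DEFAULT_SYSTEM, "history": []}
for line in ["Hello!", "/system Answer in one sentence.", "/reset", "/quit"]:
    print(repr(line), "->", handle_input(line, state))
```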
What changes vs pretraining
| | Pretraining (`train.py`) | SFT (`sft_train.py`) |
|---|---|---|
| Data | Raw text shards (`.bin`) | ChatML conversations (`.pt`) |
| Loss | Every token | Assistant tokens only (`ignore_index=-100`) |
| Learning rate | `3e-4` | `1e-5` |
| Warmup | 100 steps | 30 steps |
| Vocab | 32,000 | 32,002 (`<\|im_start\|>` + `<\|im_end\|>`) |
| Dropout | 0.0 | 0.1 |
| Checkpoint prefix | `ckpt_` | `ckpt_sft_` |
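The "assistant tokens only" loss can be illustrated with toy token IDs. The sketch assumes the two new tokens get IDs 32000 and 32001, that the assistant header is masked, and that the reply plus its closing `<|im_end|>` is trained on. None of these details is confirmed by this README, and `build_labels` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips by default


def build_labels(segments):
    """Concatenate segments and mask every token that is not a training target.

    segments: list of (is_target, token_ids) pairs. Hypothetical helper.
    """
    input_ids, labels = [], []
    for is_target, ids in segments:
        input_ids.extend(ids)
        labels.extend(ids if is_target else [IGNORE_INDEX] * len(ids))
    return input_ids, labels


# Toy IDs: 32000 = <|im_start|>, 32001 = <|im_end|> (assumed values).
segments = [
    (False, [32000, 5, 6, 32001]),     # system turn
    (False, [32000, 7, 8, 9, 32001]),  # user turn
    (False, [32000, 10]),              # assistant header, masked
    (True, [11, 12, 32001]),           # assistant reply + <|im_end|>
]
input_ids, labels = build_labels(segments)
print(labels)
```

Only the last three positions carry a real label, so system and user text contribute nothing to the gradient. The usual one-position shift for next-token prediction is left to the training loop.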
Expected loss curve
| Stage | Expected loss |
|---|---|
| Start (step 0) | 1.5 – 2.5 |
| Step 500 | 1.0 – 1.5 |
| Step 2000 | 0.8 – 1.2 |
If the loss starts above 4.0 or goes NaN, reduce `--max_lr` to `5e-6`.
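To make the effect of a lower peak LR concrete, here is a rough sketch of a 30-step warmup together with the health check described above. The linear warmup shape and the constant LR afterwards are assumptions, since the schedule in sft_train.py is not described here. `lr_at` and `loss_needs_attention` are hypothetical names.

```python
import math


def lr_at(step, max_lr=1e-5, warmup_steps=30):
    """Linear warmup to max_lr, then constant (assumed shape, not from sft_train.py)."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    return max_lr


def loss_needs_attention(loss, threshold=4.0):
    """True when the loss is NaN or above the 4.0 threshold mentioned above."""
    return math.isnan(loss) or loss > threshold


for peak in (1e-5, 5e-6):
    print(peak, [lr_at(s, max_lr=peak) for s in (0, 14, 29, 100)])

print(loss_needs_attention(4.5), loss_needs_attention(2.0))
```

Halving the peak halves every value in the schedule, which is why it is the first setting to try when early steps diverge.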
Prompt format (ChatML)
<|im_start|>system
You are a helpful, concise assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
Generation stops automatically when the model produces <|im_end|>.
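At inference time the prompt ends with an open assistant turn, and decoding stops at `<|im_end|>`. The sketch below shows both ideas on plain strings. `build_prompt`, `collect_reply`, and the `max_new_tokens` cap are hypothetical and stand in for whatever chat.py does with real token IDs.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def build_prompt(system, history, user_message):
    """Render past turns plus the new user message, leaving the assistant turn open."""
    parts = [f"{IM_START}system\n{system}{IM_END}\n"]
    for role, content in history:
        parts.append(f"{IM_START}{role}\n{content}{IM_END}\n")
    parts.append(f"{IM_START}user\n{user_message}{IM_END}\n")
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def collect_reply(token_stream, max_new_tokens=256):
    """Gather generated pieces until <|im_end|> or the (assumed) length cap."""
    reply = []
    for i, token in enumerate(token_stream):
        if token == IM_END or i >= max_new_tokens:
            break
        reply.append(token)
    return "".join(reply)


prompt = build_prompt(
    "You are a helpful, concise assistant.", [], "What is the capital of France?"
)
print(prompt)

# A stand-in for the model's output stream; anything after <|im_end|> is dropped.
fake_stream = ["The capital", " of France", " is Paris.", IM_END, "ignored"]
print(collect_reply(fake_stream))  # The capital of France is Paris.
```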