| # SLLM-150M: Chat Model (SFT) |
|
|
| A supervised fine-tuning (SFT) pipeline that turns the pretrained **SLLM-150M** base model into |
| an instruction-following chat model using **OpenHermes-2.5**. |
|
|
| ## Pipeline |
|
|
| ``` |
| Base model (runs/sllm_150m/ckpt_0011500.pt) |
| │ |
| ▼ |
| prepare_data.py ─── download & tokenize OpenHermes-2.5 (80k convs) |
| │ |
| ▼ |
| sft_train.py ─── SFT with ChatML loss masking |
| │ |
| ▼ |
| chat.py ─── interactive CLI chat |
| ``` |
|
|
| ## Step 1: Install dependency |
|
|
| ```bash |
| pip install datasets |
| ``` |
|
|
| ## Step 2: Prepare data |
|
|
| Downloads 80k OpenHermes-2.5 conversations, formats them as ChatML, tokenizes them, and saves the resulting shards. |
| Also saves the extended tokenizer (vocab 32,002) to `finetune/data/`. |
|
|
| ```bash |
| python finetune/prepare_data.py |
| ``` |
|
|
| Options: |
|
|
| | Flag | Default | Description | |
| |------|---------|-------------| |
| | `--n_samples` | `80000` | Conversations to sample | |
| | `--val_ratio` | `0.05` | Validation fraction | |
| | `--output_dir` | `finetune/data` | Output directory | |
| | `--seed` | `42` | Random seed | |
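
How `--n_samples`, `--val_ratio`, and `--seed` could combine into the train/validation split is sketched below. This is an illustration only, not the code in `prepare_data.py`: the helper name `split_indices` and the dataset size of 1,000,000 are assumptions, and the real script may sample or shuffle differently.

```python
import random

def split_indices(n_total, n_samples=80_000, val_ratio=0.05, seed=42):
    """Sample n_samples indices and split them into (train, val). Hypothetical helper."""
    rng = random.Random(seed)                 # seeded RNG makes the split reproducible
    n_samples = min(n_samples, n_total)       # cannot sample more than the dataset holds
    picked = rng.sample(range(n_total), n_samples)
    n_val = round(n_samples * val_ratio)      # 5% of 80,000 = 4,000
    return picked[n_val:], picked[:n_val]

# n_total is an assumed dataset size, used only for illustration.
train_idx, val_idx = split_indices(n_total=1_000_000)
print(len(train_idx), len(val_idx))           # 76000 4000
```

With the defaults, this reproduces the roughly 76,000 / 4,000 split listed under the expected output.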
|
|
| Expected output: |
| ``` |
| finetune/data/ |
| tokenizer.json ← extended tokenizer (32,002 vocab) |
| tokenizer_config.json |
| special_tokens_map.json |
| train_sft.pt ← ~76,000 examples |
| val_sft.pt ← ~4,000 examples |
| meta.json ← stats |
| ``` |
|
|
| ## Step 3: Fine-tune |
|
|
| ```bash |
| python finetune/sft_train.py \ |
| --base_ckpt runs/sllm_150m/ckpt_0011500.pt \ |
| --run_dir runs/sllm_150m_chat \ |
| --max_steps 2000 \ |
| --batch_size 4 --grad_accum 8 \ |
| --grad_checkpoint |
| ``` |
|
|
| On an RTX 3050 (4 GB), these settings use ~3.5 GB of VRAM and take **~5–8 minutes**. |
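
The arithmetic behind the command above can be worked out as a quick check. This sketch assumes that `--max_steps` counts optimizer steps, that each batch element is one conversation (no sequence packing), and that the training set holds about 76,000 examples as listed in Step 2. None of these is confirmed by `sft_train.py` itself.

```python
batch_size = 4        # --batch_size
grad_accum = 8        # --grad_accum
max_steps = 2000      # --max_steps
n_train = 76_000      # approximate size of train_sft.pt (from Step 2)

effective_batch = batch_size * grad_accum    # sequences per optimizer step
examples_seen = effective_batch * max_steps  # sequences over the whole run
epochs = examples_seen / n_train

print(effective_batch)    # 32
print(examples_seen)      # 64000
print(round(epochs, 2))   # 0.84
```

Under these assumptions the default run covers a little less than one pass over the training data.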
|
|
| **Resume training:** |
| ```bash |
| python finetune/sft_train.py \ |
| --resume --run_dir runs/sllm_150m_chat \ |
| --extra_steps 1000 |
| ``` |
|
|
| Key options: |
|
|
| | Flag | Default | Description | |
| |------|---------|-------------| |
| | `--base_ckpt` | `runs/sllm_150m/ckpt_0011500.pt` | Base pretrained checkpoint | |
| | `--max_lr` | `1e-5` | Peak LR (30× lower than the pretraining LR of `3e-4`) | |
| | `--dropout` | `0.1` | SFT dropout (0 in pretraining) | |
| | `--max_steps` | `2000` | Total training steps | |
| | `--grad_checkpoint` | off | Enable for lower VRAM | |
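
To make the role of `--max_lr` as a *peak* value concrete, here is a minimal sketch of a schedule with linear warmup followed by cosine decay. The 30-step warmup comes from the comparison table further down. The cosine shape, the `min_lr_ratio` parameter, and the function name `lr_at` are assumptions for illustration, since the README does not say which decay `sft_train.py` uses.

```python
import math

def lr_at(step, max_lr=1e-5, warmup_steps=30, max_steps=2000, min_lr_ratio=0.1):
    """Linear warmup, then cosine decay (assumed shape; min_lr_ratio is hypothetical)."""
    if step < warmup_steps:
        # Ramp up linearly so the peak is reached on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, max_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    min_lr = max_lr * min_lr_ratio
    return min_lr + (max_lr - min_lr) * cosine

for s in (0, 29, 30, 1000, 2000):
    print(s, f"{lr_at(s):.2e}")
```

Lowering `--max_lr` scales the whole curve down, which is why it is the first knob to try when training is unstable.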
|
|
| Checkpoints are saved to `runs/sllm_150m_chat/ckpt_sft_XXXXXXX.pt`. |
| Training log: `runs/sllm_150m_chat/sft_log.jsonl`. |
|
|
| ## Step 4: Chat |
|
|
| ```bash |
| python finetune/chat.py |
| python finetune/chat.py --run_dir runs/sllm_150m_chat --temperature 0.7 |
| ``` |
|
|
| In-chat commands: |
|
|
| | Command | Effect | |
| |---------|--------| |
| | `/reset` | Clear conversation history | |
| | `/system <text>` | Change system prompt | |
| | `/quit` | Exit | |
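
The commands in the table could be dispatched along the following lines. This is a sketch, not the code in `chat.py`: the `handle_command` function and the shape of the `state` dict are hypothetical, and whether `/system` also clears the history in the real CLI is not specified here.

```python
DEFAULT_SYSTEM = "You are a helpful, concise assistant."

def handle_command(line, state):
    """Apply one line of input to `state`; return False when the chat loop should exit.

    `state` holds "system" (str) and "history" (list of (role, text) tuples).
    Input that is not a command is appended to the history as a user turn.
    """
    text = line.strip()
    if text == "/quit":
        return False
    if text == "/reset":
        state["history"].clear()                        # drop all previous turns
    elif text.startswith("/system "):
        state["system"] = text[len("/system "):].strip()
    elif text:
        state["history"].append(("user", text))
    return True

state = {"system": DEFAULT_SYSTEM, "history": []}
for line in ["Hello", "/system Answer in one sentence.", "/reset", "/quit"]:
    if not handle_command(line, state):
        break
print(state)
```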
|
|
| ## What changes vs pretraining |
|
|
| | | Pretraining (`train.py`) | SFT (`sft_train.py`) | |
| |---|---|---| |
| | Data | Raw text shards (`.bin`) | ChatML conversations (`.pt`) | |
| | Loss | Every token | **Assistant tokens only** (`ignore_index=-100`) | |
| | Learning rate | `3e-4` | **`1e-5`** | |
| | Warmup | 100 steps | 30 steps | |
| | Vocab | 32,000 | **32,002** (`<\|im_start\|>` + `<\|im_end\|>`) | |
| | Dropout | 0.0 | **0.1** | |
| | Checkpoint prefix | `ckpt_` | `ckpt_sft_` | |
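
The "assistant tokens only" loss in the table can be illustrated with a small label-building sketch. It uses toy token ids and a hypothetical `build_labels` helper. Whether `sft_train.py` also masks the `<|im_start|>assistant` header tokens is an implementation detail not stated in this README.

```python
IGNORE_INDEX = -100  # same value as the ignore_index named in the table

def build_labels(segments):
    """Concatenate (role, token_ids) segments; keep labels only for assistant tokens."""
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)                          # these positions contribute to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(ids))    # these positions are skipped
    return input_ids, labels

# Toy ids for illustration, not real tokenizer output.
segments = [("system", [5, 6]), ("user", [7, 8, 9]), ("assistant", [10, 11, 2])]
input_ids, labels = build_labels(segments)
print(input_ids)  # [5, 6, 7, 8, 9, 10, 11, 2]
print(labels)     # [-100, -100, -100, -100, -100, 10, 11, 2]
```

In PyTorch, passing labels like these to `torch.nn.functional.cross_entropy` with `ignore_index=-100` excludes the masked positions from the loss, so gradients come only from the assistant's reply.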
|
|
| ## Expected loss curve |
|
|
| | Stage | Expected loss | |
| |-------|--------------| |
| | Start (step 0) | 1.5 – 2.5 | |
| | Step 500 | 1.0 – 1.5 | |
| | Step 2000 | 0.8 – 1.2 | |
|
|
| > **If loss starts above 4.0 or goes NaN:** reduce `--max_lr` to `5e-6`. |
| |
| ## Prompt format (ChatML) |
| |
| ``` |
| <|im_start|>system |
| You are a helpful, concise assistant.<|im_end|> |
| <|im_start|>user |
| What is the capital of France?<|im_end|> |
| <|im_start|>assistant |
| The capital of France is Paris.<|im_end|> |
| ``` |
| |
| Generation stops automatically when the model produces `<|im_end|>`. |
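
The layout above, and the stop-at-`<|im_end|>` rule, can be sketched in a few lines. The helper names `format_chatml` and `cut_at_end` are hypothetical and operate on plain strings. The actual `chat.py` presumably works on token ids and may differ in details such as trailing newlines.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def format_chatml(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}] messages in the ChatML layout."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        parts.append(f"{IM_START}assistant\n")   # the model continues from here
    return "".join(parts)

def cut_at_end(generated):
    """Keep only the text before the first <|im_end|>, if one is present."""
    return generated.split(IM_END, 1)[0]

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful, concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
])
print(prompt)
print(cut_at_end("The capital of France is Paris.<|im_end|>\n<|im_start|>user"))
```

Ending the prompt with an open assistant turn is what cues the model to answer, and truncating at `<|im_end|>` keeps it from continuing into an invented user turn.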
|
|