# SLLM-150M → Chat Model (SFT)

Supervised fine-tuning pipeline that turns the pretrained **SLLM-150M** base model into an instruction-following chat model using **OpenHermes-2.5**.

## Pipeline

```
Base model (runs/sllm_150m/ckpt_0011500.pt)
        │
        ▼
prepare_data.py ─── download & tokenize OpenHermes-2.5 (80k convs)
        │
        ▼
sft_train.py    ─── SFT with ChatML loss masking
        │
        ▼
chat.py         ─── interactive CLI chat
```

## Step 1 — Install dependency

```bash
pip install datasets
```

## Step 2 — Prepare data

Downloads 80k conversations, formats them as ChatML, tokenizes them, and saves the train/val shards. Also saves the extended tokenizer (vocab 32,002) to `finetune/data/`.

```bash
python finetune/prepare_data.py
```

Options:

| Flag | Default | Description |
|------|---------|-------------|
| `--n_samples` | `80000` | Conversations to sample |
| `--val_ratio` | `0.05` | Validation fraction |
| `--output_dir` | `finetune/data` | Output directory |
| `--seed` | `42` | Random seed |

Expected output:

```
finetune/data/
  tokenizer.json           ← extended tokenizer (32,002 vocab)
  tokenizer_config.json
  special_tokens_map.json
  train_sft.pt             ← ~76,000 examples
  val_sft.pt               ← ~4,000 examples
  meta.json                ← stats
```

## Step 3 — Fine-tune

```bash
python finetune/sft_train.py \
  --base_ckpt runs/sllm_150m/ckpt_0011500.pt \
  --run_dir runs/sllm_150m_chat \
  --max_steps 2000 \
  --batch_size 4 --grad_accum 8 \
  --grad_checkpoint
```

On an RTX 3050 (4 GB), these settings use ~3.5 GB of VRAM and take **~5–8 minutes**.
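
Because of gradient accumulation, each optimizer update averages gradients over `batch_size × grad_accum` sequences. The short sketch below works through that arithmetic for the Step 3 command and estimates how much of the ~76,000-example training set the run covers. It assumes that `--max_steps` counts optimizer updates rather than micro-batches and that each training example is one sequence; this README does not state either detail, so check `sft_train.py` before relying on the numbers.

```python
# Back-of-the-envelope arithmetic for the Step 3 command.
# Assumptions (not confirmed by this README): --max_steps counts optimizer
# updates, and each micro-batch row is one example from train_sft.pt.


def effective_batch(batch_size: int, grad_accum: int) -> int:
    """Sequences whose gradients are accumulated into one optimizer update."""
    return batch_size * grad_accum


def epochs_covered(max_steps: int, batch_size: int, grad_accum: int, n_train: int) -> float:
    """Approximate number of passes over the training set during the run."""
    return max_steps * effective_batch(batch_size, grad_accum) / n_train


if __name__ == "__main__":
    eb = effective_batch(batch_size=4, grad_accum=8)
    print(f"effective batch: {eb} sequences per update")  # effective batch: 32 sequences per update
    ep = epochs_covered(max_steps=2000, batch_size=4, grad_accum=8, n_train=76_000)
    print(f"passes over ~76k examples: {ep:.2f}")  # passes over ~76k examples: 0.84
```

Under these assumptions the default run sees 64,000 sequences, a little under one pass over the data, and resuming with `--extra_steps 1000` would take it past one full epoch.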
**Resume training:**

```bash
python finetune/sft_train.py \
  --resume --run_dir runs/sllm_150m_chat \
  --extra_steps 1000
```

Key options:

| Flag | Default | Description |
|------|---------|-------------|
| `--base_ckpt` | `runs/sllm_150m/ckpt_0011500.pt` | Base pretrained checkpoint |
| `--max_lr` | `1e-5` | Peak LR (30× lower than the pretraining LR of `3e-4`) |
| `--dropout` | `0.1` | SFT dropout (0 in pretraining) |
| `--max_steps` | `2000` | Total training steps |
| `--grad_checkpoint` | off | Enable for lower VRAM |

Checkpoints are saved to `runs/sllm_150m_chat/ckpt_sft_XXXXXXX.pt`, and the training log is written to `runs/sllm_150m_chat/sft_log.jsonl`.

## Step 4 — Chat

```bash
# with default options
python finetune/chat.py

# with an explicit run directory and sampling temperature
python finetune/chat.py --run_dir runs/sllm_150m_chat --temperature 0.7
```

In-chat commands:

| Command | Effect |
|---------|--------|
| `/reset` | Clear conversation history |
| `/system <text>` | Change the system prompt |
| `/quit` | Exit |

## What changes vs pretraining

| | Pretraining (`train.py`) | SFT (`sft_train.py`) |
|---|---|---|
| Data | Raw text shards (`.bin`) | ChatML conversations (`.pt`) |
| Loss | Every token | **Assistant tokens only** (`ignore_index=-100`) |
| Learning rate | `3e-4` | **`1e-5`** |
| Warmup | 100 steps | 30 steps |
| Vocab | 32,000 | **32,002** (`<\|im_start\|>` + `<\|im_end\|>`) |
| Dropout | 0.0 | **0.1** |
| Checkpoint prefix | `ckpt_` | `ckpt_sft_` |

## Expected loss curve

| Stage | Expected loss |
|-------|---------------|
| Start (step 0) | 1.5 – 2.5 |
| Step 500 | 1.0 – 1.5 |
| Step 2000 | 0.8 – 1.2 |

> **If loss starts above 4.0 or goes NaN** → reduce `--max_lr` to `5e-6`.

## Prompt format (ChatML)

```
<|im_start|>system
You are a helpful, concise assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
```

Generation stops automatically when the model produces `<|im_end|>`.
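
The ChatML template and the assistant-only loss from the comparison table can be illustrated together. The sketch below is not the code in `prepare_data.py`: it swaps in a hypothetical regex-based `toy_tokenize` for the real 32,002-token tokenizer so that it runs without downloads. It also assumes that the assistant's closing `<|im_end|>` is kept in the loss (so the model learns to emit the stop token), that role headers and system/user text are masked with `-100`, and that the training loop shifts labels by one position for next-token prediction.

```python
import re
from typing import Dict, List, Tuple

# Label value skipped by the loss (the default ignore_index of PyTorch's cross_entropy).
IGNORE_INDEX = -100

# Hypothetical toy tokenizer: ChatML special tokens, newlines, and
# whitespace-separated words each become one token (spaces are dropped).
TOKEN_RE = re.compile(r"<\|im_start\|>|<\|im_end\|>|\n|[^\s<]+|<")


def toy_tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def chatml_segments(messages: List[Dict[str, str]]) -> List[Tuple[str, bool]]:
    """Render messages as ChatML pieces, each tagged with whether it is trained on."""
    segments = []
    for msg in messages:
        is_assistant = msg["role"] == "assistant"
        segments.append((f"<|im_start|>{msg['role']}\n", False))  # role header: context only
        # Assumption: the reply body and its closing <|im_end|> are supervised.
        segments.append((msg["content"] + "<|im_end|>", is_assistant))
        segments.append(("\n", False))
    return segments


def build_example(messages: List[Dict[str, str]], vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) with every non-assistant position set to IGNORE_INDEX."""
    ids: List[int] = []
    labels: List[int] = []
    for text, trainable in chatml_segments(messages):
        for tok in toy_tokenize(text):
            tok_id = vocab.setdefault(tok, len(vocab))  # grow the toy vocab on the fly
            ids.append(tok_id)
            labels.append(tok_id if trainable else IGNORE_INDEX)
    return ids, labels


def shift_for_next_token(ids: List[int], labels: List[int]) -> Tuple[List[int], List[int]]:
    """Inputs are tokens 0..n-2; targets are the labels of tokens 1..n-1."""
    return ids[:-1], labels[1:]


if __name__ == "__main__":
    conversation = [
        {"role": "system", "content": "You are a helpful, concise assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
    vocab: Dict[str, int] = {}
    ids, labels = build_example(conversation, vocab)
    id_to_tok = {i: t for t, i in vocab.items()}
    print([id_to_tok[l] for l in labels if l != IGNORE_INDEX])
    # ['The', 'capital', 'of', 'France', 'is', 'Paris.', '<|im_end|>']
    x, y = shift_for_next_token(ids, labels)
```

After the shift, the first supervised target is the reply's first token, predicted from the newline that ends the `assistant` header, so the model is trained only on producing the reply and its stop token, never on reproducing the prompt.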