
SLLM-150M → Chat Model (SFT)

Supervised Fine-Tuning pipeline to turn the pretrained SLLM-150M base model into an instruction-following chat model using OpenHermes-2.5.

Pipeline

Base model  (runs/sllm_150m/ckpt_0011500.pt)
      │
      ▼
prepare_data.py   ─── download & tokenize OpenHermes-2.5 (80k convs)
      │
      ▼
sft_train.py      ─── SFT with ChatML loss masking
      │
      ▼
chat.py           ─── interactive CLI chat

Step 1: Install dependency

pip install datasets

Step 2: Prepare data

Downloads 80k conversations from OpenHermes-2.5, formats them as ChatML, tokenizes them, and saves the result as shards. It also saves the extended tokenizer (vocabulary size 32,002) to finetune/data/.

python finetune/prepare_data.py

Options:

| Flag | Default | Description |
|------|---------|-------------|
| `--n_samples` | 80000 | Conversations to sample |
| `--val_ratio` | 0.05 | Validation fraction |
| `--output_dir` | finetune/data | Output directory |
| `--seed` | 42 | Random seed |

Expected output:

finetune/data/
  tokenizer.json          ← extended tokenizer (32,002 vocab)
  tokenizer_config.json
  special_tokens_map.json
  train_sft.pt            ← ~76,000 examples
  val_sft.pt              ← ~4,000 examples
  meta.json               ← stats
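
The approximate shard sizes follow from the defaults: 80,000 sampled conversations with a validation fraction of 0.05. Below is a minimal sketch of such a split, assuming a seeded shuffle followed by a slice. The helper `split_indices` is hypothetical and the actual logic in prepare_data.py may differ (for example, it may filter some conversations, which would explain the "~" in the counts above).

```python
import random

def split_indices(n_samples=80_000, val_ratio=0.05, seed=42):
    """Shuffle indices with a fixed seed and split off a validation slice.

    Hypothetical helper for illustration; not taken from prepare_data.py.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = round(n_samples * val_ratio)
    # First n_val shuffled indices go to validation, the rest to training.
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices()
print(len(train_idx), len(val_idx))  # 76000 4000
```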

Step 3: Fine-tune

python finetune/sft_train.py \
  --base_ckpt runs/sllm_150m/ckpt_0011500.pt \
  --run_dir   runs/sllm_150m_chat \
  --max_steps 2000 \
  --batch_size 4 --grad_accum 8 \
  --grad_checkpoint

On an RTX 3050 (4 GB), these settings use about 3.5 GB of VRAM and take **5–8 minutes**.
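
To see what these flags imply for the amount of data processed, here is a small arithmetic sketch. It assumes that --max_steps counts optimizer steps (not micro-batches) and uses the approximate training split size from Step 2; both are assumptions, not confirmed by sft_train.py.

```python
batch_size = 4      # --batch_size: sequences per forward/backward pass
grad_accum = 8      # --grad_accum: micro-batches accumulated per optimizer step
max_steps = 2000    # --max_steps: assumed to count optimizer steps

# Sequences contributing to each weight update.
effective_batch = batch_size * grad_accum
# Total sequences consumed over the whole run.
sequences_seen = effective_batch * max_steps

train_examples = 76_000  # approximate size of train_sft.pt from Step 2
epochs = sequences_seen / train_examples

print(effective_batch)   # 32
print(sequences_seen)    # 64000
print(round(epochs, 2))  # 0.84
```

Under these assumptions the default run covers a little less than one pass over the training split.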

Resume training:

python finetune/sft_train.py \
  --resume --run_dir runs/sllm_150m_chat \
  --extra_steps 1000

Key options:

| Flag | Default | Description |
|------|---------|-------------|
| `--base_ckpt` | runs/sllm_150m/ckpt_0011500.pt | Base pretrained checkpoint |
| `--max_lr` | 1e-5 | Peak LR (30× lower than the 3e-4 used in pretraining) |
| `--dropout` | 0.1 | SFT dropout (0 in pretraining) |
| `--max_steps` | 2000 | Total training steps |
| `--grad_checkpoint` | off | Enable for lower VRAM |

Checkpoints are saved to runs/sllm_150m_chat/ckpt_sft_XXXXXXX.pt. Training log: runs/sllm_150m_chat/sft_log.jsonl.
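
Given the naming scheme above, the most recent checkpoint in a run directory can be located by parsing the step number from each file name. The following is a minimal sketch; `latest_sft_checkpoint` is a hypothetical helper and is not necessarily how `--resume` finds its checkpoint.

```python
import re
from pathlib import Path

# Matches names such as ckpt_sft_0002000.pt and captures the step number.
CKPT_PATTERN = re.compile(r"^ckpt_sft_(\d+)\.pt$")

def latest_sft_checkpoint(run_dir):
    """Return the ckpt_sft_*.pt path with the highest step number, or None."""
    best_step, best_path = -1, None
    for path in Path(run_dir).glob("ckpt_sft_*.pt"):
        match = CKPT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

For example, `latest_sft_checkpoint("runs/sllm_150m_chat")` would return the checkpoint with the largest step number in that directory, or `None` if no matching file is found.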

Step 4: Chat

python finetune/chat.py
python finetune/chat.py --run_dir runs/sllm_150m_chat --temperature 0.7

In-chat commands:

| Command | Effect |
|---------|--------|
| `/reset` | Clear conversation history |
| `/system <text>` | Change system prompt |
| `/quit` | Exit |
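
A minimal sketch of how a chat loop could dispatch these commands. The `state` dictionary, the `handle_line` helper, and the default system prompt (borrowed from the ChatML example in this README) are assumptions for illustration; chat.py may be structured differently.

```python
DEFAULT_SYSTEM = "You are a helpful, concise assistant."  # assumed default

def handle_line(line, state):
    """Update chat state for one input line.

    Returns "quit", "command", or "message" so the caller knows what to do next.
    """
    line = line.strip()
    if line == "/quit":
        return "quit"
    if line == "/reset":
        state["history"].clear()
        return "command"
    if line.startswith("/system "):
        state["system"] = line[len("/system "):].strip()
        return "command"
    # Anything else is treated as a user message.
    state["history"].append({"role": "user", "content": line})
    return "message"

state = {"system": DEFAULT_SYSTEM, "history": []}
for line in ["Hi there", "/system Answer in one sentence.", "/reset", "/quit"]:
    print(f"{line!r} -> {handle_line(line, state)}")
```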

What changes vs pretraining

| | Pretraining (train.py) | SFT (sft_train.py) |
|---|---|---|
| Data | Raw text shards (.bin) | ChatML conversations (.pt) |
| Loss | Every token | Assistant tokens only (ignore_index=-100) |
| Learning rate | 3e-4 | 1e-5 |
| Warmup | 100 steps | 30 steps |
| Vocab | 32,000 | 32,002 (`<\|im_start\|>` + `<\|im_end\|>`) |
| Dropout | 0.0 | 0.1 |
| Checkpoint prefix | ckpt_ | ckpt_sft_ |
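
The "assistant tokens only" loss can be illustrated with a small label-building sketch: positions outside assistant turns receive the label -100, which PyTorch's cross-entropy loss skips by default. The `build_labels` helper, the segment layout, and the token ids below are made up for illustration. Whether sft_train.py also masks the role header of an assistant turn is not stated here, so this sketch simply supervises the whole assistant segment.

```python
IGNORE_INDEX = -100  # label value ignored by the loss (ignore_index=-100)

def build_labels(segments):
    """Build (input_ids, labels) from a list of (role, token_ids) segments.

    Only tokens in assistant segments keep their ids as labels; all other
    positions are set to IGNORE_INDEX so they contribute nothing to the loss.
    """
    input_ids, labels = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels

# Made-up token ids: 1 and 2 stand in for <|im_start|> and <|im_end|>.
segments = [
    ("system", [1, 10, 11, 2]),
    ("user", [1, 20, 21, 22, 2]),
    ("assistant", [1, 30, 31, 2]),
]
input_ids, labels = build_labels(segments)
print(labels)
# [-100, -100, -100, -100, -100, -100, -100, -100, -100, 1, 30, 31, 2]
```

The usual one-position shift between inputs and next-token targets is omitted to keep the example short.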

Expected loss curve

| Stage | Expected loss |
|-------|---------------|
| Start (step 0) | 1.5 – 2.5 |
| Step 500 | 1.0 – 1.5 |
| Step 2000 | 0.8 – 1.2 |

If the loss starts above 4.0 or goes NaN, reduce --max_lr to 5e-6.
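
This check can be automated against the training log. The sketch below assumes each line of sft_log.jsonl is a JSON object with `step` and `loss` keys; the actual field names are not documented in this README, so adjust them to match the real log. `check_losses` is a hypothetical helper.

```python
import json
import math

def check_losses(jsonl_lines, start_threshold=4.0):
    """Return warning strings for an iterable of JSONL log lines.

    Assumes each record has "step" and "loss" keys (an assumption about
    the log format, not confirmed by the source).
    """
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    warnings = []
    if records and records[0]["loss"] > start_threshold:
        warnings.append(f"initial loss {records[0]['loss']} is above {start_threshold}")
    for rec in records:
        if math.isnan(rec["loss"]):
            warnings.append(f"loss is NaN at step {rec['step']}")
            break  # later values are not informative once the loss is NaN
    return warnings

sample_log = [
    '{"step": 0, "loss": 4.8}',
    '{"step": 10, "loss": 3.9}',
    '{"step": 20, "loss": NaN}',
]
for warning in check_losses(sample_log):
    print(warning)
```

With a real run, the lines could come from `open("runs/sllm_150m_chat/sft_log.jsonl")`. Any warning suggests retrying with a lower --max_lr.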

Prompt format (ChatML)

<|im_start|>system
You are a helpful, concise assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>

Generation stops automatically when the model produces <|im_end|>.
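
The format above can be produced with a few lines of string handling. This is a minimal sketch: `build_prompt` and `truncate_at_end` are hypothetical helpers, and chat.py most likely stops on the `<|im_end|>` token id during generation rather than on decoded text, so the truncation here only shows the string-level equivalent.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def build_prompt(messages):
    """Render messages as ChatML and open an assistant turn for generation."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    parts.append(f"{IM_START}assistant\n")  # the model continues from here
    return "".join(parts)

def truncate_at_end(generated):
    """Cut decoded model output at the first <|im_end|>, if present."""
    return generated.split(IM_END, 1)[0]

messages = [
    {"role": "system", "content": "You are a helpful, concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
prompt = build_prompt(messages)
print(prompt)
print(truncate_at_end("The capital of France is Paris.<|im_end|>ignored text"))
```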