SmolLM-135M: SFT + DPO Fine-Tuned

A fine-tuned version of SmolLM-135M for instruction following, trained with Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO), both using LoRA adapters.

Model Details

  • Base model: HuggingFaceTB/SmolLM-135M (135M parameters)
  • Fine-tuning method: SFT (LoRA) → DPO (LoRA)
  • SFT dataset: databricks/databricks-dolly-15k (6,000 samples)
  • DPO dataset: Intel/orca_dpo_pairs (3,000 samples)
  • Developed by: Areeba Fatima (IBA Karachi, NLP with Deep Learning, Assignment 04)
  • Language: English
  • License: Apache 2.0

Training Details

Best SFT Configuration (Trial 3)

  • LoRA rank: 32, alpha: 64
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Learning rate: 2e-4, epochs: 2, batch size: 2
  • Validation loss: 2.2008
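
To give a sense of what the settings above imply, the following minimal sketch computes the LoRA scaling factor (alpha / rank) and estimates the number of trainable adapter parameters. The layer dimensions (hidden size 576, MLP size 1536, key/value projection width 192, 30 layers) are assumptions based on the published SmolLM-135M architecture and are not stated in this card; check them against the base model's config.json. The helper names are hypothetical, and the estimate assumes one adapter pair (A, B) per target module with no bias terms.

```python
# Sketch: LoRA scaling and adapter size for the SFT configuration above.
# ASSUMPTION: the dimensions below are not stated in this card; verify them
# against the base model's config.json before relying on the numbers.
RANK = 32
ALPHA = 64
NUM_LAYERS = 30   # assumed number of transformer layers
HIDDEN = 576      # assumed hidden size
KV_DIM = 192      # assumed key/value projection width (grouped-query attention)
MLP_DIM = 1536    # assumed MLP intermediate size

# (input_dim, output_dim) for each target module listed in the card.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def total_lora_params(shapes, rank, num_layers):
    """Sum adapter parameters over all target modules and all layers."""
    per_layer = sum(lora_params(i, o, rank) for i, o in shapes.values())
    return per_layer * num_layers


if __name__ == "__main__":
    print("scaling:", lora_scaling(ALPHA, RANK))
    print("trainable adapter parameters (estimate):",
          total_lora_params(TARGET_SHAPES, RANK, NUM_LAYERS))
```

With rank 32 and alpha 64, the low-rank update is scaled by 2 under the standard alpha / rank convention.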

Best DPO Configuration

  • Beta: [your best beta]
  • Learning rate: [your best LR]
  • Epochs: [your best epochs]
  • Validation loss: [your val loss]
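
Since beta is the main DPO-specific hyperparameter listed above, a short sketch of the per-pair DPO loss may help show what it controls. This is the standard DPO objective written in plain Python, not the training code used for this model. The function name, the log-probability inputs, and beta = 0.1 are illustrative placeholders and do not fill in the blanks above.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    margin = (policy - reference) log-ratio on the chosen response
           - (policy - reference) log-ratio on the rejected response.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); the branch keeps exp() from overflowing.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


if __name__ == "__main__":
    # Hypothetical sequence log-probabilities, for illustration only.
    example_beta = 0.1  # placeholder, NOT the value used for this model
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0, example_beta))
```

The loss equals log 2 when the policy and reference agree on both responses, and falls below that as the policy favors the chosen response more than the reference does. In the DPO formulation, beta scales this margin, with larger values keeping the policy closer to the reference model.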

Evaluation Results (10-prompt test set)

Stage                 Avg BLEU   Corpus BLEU   Avg BERTScore
Base (no tuning)      0.1043     0.0864        0.7957
Best SFT (Trial 3)    0.1513     0.1112        0.8210
Best DPO              0.0264     0.0148        0.7221
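
To make the reported scores easier to compare, this small sketch computes each stage's change relative to the base model. The dictionary layout and function name are illustrative; only the scores come from this card.

```python
# Scores copied from the evaluation results: (Avg BLEU, Corpus BLEU, Avg BERTScore).
RESULTS = {
    "Base (no tuning)": (0.1043, 0.0864, 0.7957),
    "Best SFT (Trial 3)": (0.1513, 0.1112, 0.8210),
    "Best DPO": (0.0264, 0.0148, 0.7221),
}
METRICS = ("Avg BLEU", "Corpus BLEU", "Avg BERTScore")


def relative_change(stage, baseline="Base (no tuning)"):
    """Percent change of each metric for `stage` relative to `baseline`."""
    base = RESULTS[baseline]
    return {
        name: 100.0 * (value - ref) / ref
        for name, value, ref in zip(METRICS, RESULTS[stage], base)
    }


if __name__ == "__main__":
    for stage in ("Best SFT (Trial 3)", "Best DPO"):
        changes = relative_change(stage)
        summary = ", ".join(f"{m}: {c:+.1f}%" for m, c in changes.items())
        print(f"{stage}: {summary}")
```

On these reported numbers, the SFT stage is above the base model on all three metrics and the DPO stage is below it on all three. With a 10-prompt test set, the differences should be read cautiously.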

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model, then attach the LoRA adapter weights from this repository.
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
tokenizer = AutoTokenizer.from_pretrained("AREEBAFATIMA12/SmolLM-135M-SFT-DPO")
model = PeftModel.from_pretrained(base, "AREEBAFATIMA12/SmolLM-135M-SFT-DPO")
model.eval()

# Single-turn prompt in the <|user|> / <|assistant|> format.
prompt = "<|user|>\nWhat causes seasons on Earth?</s>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding (do_sample=False), up to 150 new tokens.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=150, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
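
The prompt string in the example follows a fixed single-turn template. A small helper like the one below (a hypothetical function, not part of this repository) builds it for arbitrary questions, assuming the template in the example is the one the adapter was trained with.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user message in the <|user|> / <|assistant|> template
    from the usage example (assumed to match the training format)."""
    return f"<|user|>\n{user_message.strip()}</s>\n<|assistant|>\n"


if __name__ == "__main__":
    print(repr(build_prompt("What causes seasons on Earth?")))
```

The returned string can be passed to the tokenizer in place of the hard-coded prompt.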