SmolLM-135M: SFT + DPO Fine-Tuned
A fine-tuned version of SmolLM-135M for instruction following, trained using Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) with LoRA adapters.
Model Details
- Base model: HuggingFaceTB/SmolLM-135M (135M parameters)
- Fine-tuning method: SFT (LoRA) → DPO (LoRA)
- SFT dataset: databricks/databricks-dolly-15k (6,000 samples)
- DPO dataset: Intel/orca_dpo_pairs (3,000 samples)
- Developed by: Areeba Fatima (IBA Karachi, NLP with Deep Learning, Assignment 04)
- Language: English
- License: Apache 2.0
Training Details
Best SFT Configuration (Trial 3)
- LoRA rank: 32, alpha: 64
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Learning rate: 2e-4, epochs: 2, batch size: 2
- Validation loss: 2.2008
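To give a sense of what the rank and alpha settings above imply, here is a minimal sketch of two standard LoRA quantities: the scaling factor alpha / rank applied to the low-rank update (assuming standard, non-rsLoRA scaling), and the number of trainable parameters a rank-r adapter adds to a single linear layer. The helper names and the 576x576 layer shape are illustrative assumptions, not values taken from this card.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters a LoRA adapter adds to one linear layer.

    A has shape (rank, in_features) and B has shape (out_features, rank).
    """
    return rank * (in_features + out_features)


# Rank and alpha from the Trial 3 configuration in this card.
RANK, ALPHA = 32, 64

scaling = lora_scaling(ALPHA, RANK)

# Hypothetical square projection; the 576 dimension is an assumption
# used only for illustration.
example_params = lora_param_count(576, 576, RANK)

print(f"scaling = {scaling}")
print(f"adapter params for one 576x576 layer = {example_params}")
```

With alpha set to twice the rank, the adapter update is scaled by 2, which is a common choice when targeting all attention and MLP projections.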
Best DPO Configuration
- Beta: [your best beta]
- Learning rate: [your best LR]
- Epochs: [your best epochs]
- Validation loss: [your val loss]
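For readers unfamiliar with the role of beta, the following is a rough sketch of the per-pair DPO objective in plain Python. It is not the training code used for this model. The log-probability values and the beta of 0.1 are hypothetical, chosen only for illustration, since the card does not state the final DPO hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin)))."""
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) is equal to log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))


BETA = 0.1  # illustrative value, not taken from the card

# Policy identical to the reference: the loss starts at log(2).
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0, BETA)

# Policy raises the chosen response and lowers the rejected one.
loss_improved = dpo_loss(-10.0, -17.0, -12.0, -15.0, BETA)

print(f"loss at init:     {loss_at_init:.4f}")
print(f"loss after shift: {loss_improved:.4f}")
```

A larger beta makes the loss react more sharply to the same log-probability shift, which in practice keeps the policy closer to the reference model.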
Evaluation Results (10-prompt test set)
| Stage | Avg BLEU | Corpus BLEU | Avg BERTScore |
|---|---|---|---|
| Base (no tuning) | 0.1043 | 0.0864 | 0.7957 |
| Best SFT (Trial 3) | 0.1513 | 0.1112 | 0.8210 |
| Best DPO | 0.0264 | 0.0148 | 0.7221 |
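The table above shows the SFT stage improving on the base model across all three metrics, while the DPO stage scores below the base model on all three. A small sketch for quantifying those gaps as relative changes follows. The dictionary keys and the helper name are assumptions for illustration, and with only 10 prompts the figures should be read as indicative at best.

```python
# Scores copied from the evaluation table in this card.
results = {
    "Base (no tuning)":   {"avg_bleu": 0.1043, "corpus_bleu": 0.0864, "avg_bertscore": 0.7957},
    "Best SFT (Trial 3)": {"avg_bleu": 0.1513, "corpus_bleu": 0.1112, "avg_bertscore": 0.8210},
    "Best DPO":           {"avg_bleu": 0.0264, "corpus_bleu": 0.0148, "avg_bertscore": 0.7221},
}


def relative_change(value: float, baseline: float) -> float:
    """Fractional change of value relative to baseline."""
    return (value - baseline) / baseline


baseline = results["Base (no tuning)"]

# Relative change of each tuned stage against the untuned base model.
changes = {
    stage: {metric: relative_change(score, baseline[metric])
            for metric, score in scores.items()}
    for stage, scores in results.items()
    if stage != "Base (no tuning)"
}

for stage, metrics in changes.items():
    summary = ", ".join(f"{name}: {delta:+.1%}" for name, delta in metrics.items())
    print(f"{stage}: {summary}")
```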
How to Use
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model, then attach the LoRA adapter weights from this repo.
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
tokenizer = AutoTokenizer.from_pretrained("AREEBAFATIMA12/SmolLM-135M-SFT-DPO")
model = PeftModel.from_pretrained(base, "AREEBAFATIMA12/SmolLM-135M-SFT-DPO")
model.eval()

# Wrap the question in the user/assistant prompt format.
prompt = "<|user|>\nWhat causes seasons on Earth?</s>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding without gradient tracking.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=150, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
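When querying the model with several questions, it can help to build the prompt string in one place. The helper below is a small sketch that simply reproduces the prompt format from the example above. The function name `build_prompt` is hypothetical, and the card does not say whether the tokenizer ships a chat template that could be used instead.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the prompt format shown in this card."""
    return f"<|user|>\n{user_message}</s>\n<|assistant|>\n"


prompt = build_prompt("What causes seasons on Earth?")
print(repr(prompt))
```

The resulting string can be passed to the tokenizer exactly as in the example above.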