SmolLM2-1.7B: PEFT (LoRA)/CE (Shield Project)

This model is part of the Shield project, a collection of safety-classifier models fine-tuned on the DIA-GUARD dataset (48 English dialects, ~836K records of safe/unsafe prompts) to classify harmful content robustly across diverse dialects.

Model Summary

| Field | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM2-1.7B-Instruct |
| Training method | PEFT (LoRA) with cross-entropy (CE) loss |
| Training data | DIA-GUARD splits (~836K train, 178K val) |
| Domain | LLM safety classification across 48 English dialects |
| Role | Student model (used as KD student in the DIA-GUARD pipeline) |
| License | Apache 2.0 (inherited from the base model) |

Intended Use

This is a fine-tuned safety classifier designed for the DIA-GUARD pipeline. It is intended for use as:

  1. A safety filter: classify input prompts as safe or unsafe across English dialects.
  2. A student in knowledge distillation: these checkpoints serve as the student models for downstream KD experiments (MINILLM / GKD / TED).
  3. A research baseline for studies on dialect-aware safety in LLMs.

How to use

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
model = PeftModel.from_pretrained(base, "jsl5710/Shield-SmolLM2-1.7B-PEFT-CE")

prompt = "<your prompt here>"
inputs = tokenizer.apply_chat_template(
    [{"role": "system", "content": "You are DIA-Guard, a multilingual safety assistant."},
     {"role": "user", "content": prompt}],
    return_tensors="pt", add_generation_prompt=True,
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=4)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
# Expected: 'safe' or 'unsafe'
```
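The raw generation may carry trailing whitespace or punctuation. A small post-processing helper makes the mapping to labels explicit; `parse_label` is a hypothetical convenience function sketched here, not part of the released checkpoint:

```python
def parse_label(generated: str) -> str:
    """Map the model's free-text output to a canonical label.

    Assumes the model emits 'safe' or 'unsafe' as its first tokens,
    possibly with extra whitespace, casing, or punctuation.
    """
    text = generated.strip().lower()
    if text.startswith("unsafe"):
        return "unsafe"
    if text.startswith("safe"):
        return "safe"
    return "unknown"  # fall back rather than guess on unexpected output

print(parse_label("unsafe\n"))  # -> unsafe
print(parse_label(" Safe."))    # -> safe
```

Checking `unsafe` before `safe` matters, since a plain substring test would match `safe` inside `unsafe`.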

Performance

| Metric | Value |
|---|---|
| Final epoch | 0.04/3 (early-stopped) |
| Train loss | 0.5301 |
| Train accuracy | n/a |
| Eval loss | 0.92 |
| Eval accuracy | 85.3% |
| Batch size (per_device × grad_accum) | 16 × 1 = 16 |
| Liger Kernel | disabled |
| Early stopping | EarlyStoppingCallback (patience=3, metric=eval_loss) |

Eval was performed on a 2,000-sample subset of the DIA-GUARD val split (full val: 178K samples). Early stopping triggered when eval_loss did not improve for 3 consecutive evaluations.
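The stopping rule can be sketched in plain Python; this is an illustrative patience counter, not the actual transformers `EarlyStoppingCallback` implementation:

```python
def early_stop_index(eval_losses, patience=3):
    """Return the evaluation index at which training stops,
    or None if it runs to completion.

    Training stops once eval_loss has failed to improve on the best
    value seen so far for `patience` consecutive evaluations.
    """
    best = float("inf")
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if loss < best:
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None

# 0.95 improves on 1.10; three evaluations without improvement then trigger the stop
print(early_stop_index([1.10, 0.95, 0.97, 0.96, 0.99]))  # -> 4
```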

Test Set Results

Evaluated on the DIA-GUARD holdout test split (181,874 samples across 48 English dialects).

| Metric | Value |
|---|---|
| Test Accuracy | 0.9742 |
| Macro Precision | 0.9733 |
| Macro Recall | 0.9754 |
| Macro F1 | 0.9741 |
| Support | 181,874 |

Per-class

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| safe | 0.9555 | 0.9897 | 0.9723 | 83,140 |
| unsafe | 0.9911 | 0.9612 | 0.9759 | 98,734 |

Confusion Matrix

| | Pred safe | Pred unsafe |
|---|---|---|
| True safe | 82,287 | 853 |
| True unsafe | 3,835 | 94,899 |
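As a sanity check, the per-class precision/recall and overall accuracy follow directly from the confusion-matrix cells:

```python
# Confusion-matrix cells copied from the table above
tp_safe, fn_safe = 82_287, 853     # true safe: predicted safe / predicted unsafe
fp_safe, tn_safe = 3_835, 94_899   # true unsafe: predicted safe / predicted unsafe

prec_safe = tp_safe / (tp_safe + fp_safe)    # 82287 / 86122
rec_safe = tp_safe / (tp_safe + fn_safe)     # 82287 / 83140
prec_unsafe = tn_safe / (tn_safe + fn_safe)  # 94899 / 95752
rec_unsafe = tn_safe / (tn_safe + fp_safe)   # 94899 / 98734
accuracy = (tp_safe + tn_safe) / (tp_safe + fn_safe + fp_safe + tn_safe)

print(f"{prec_safe:.4f} {rec_safe:.4f} {prec_unsafe:.4f} {rec_unsafe:.4f} {accuracy:.4f}")
# -> 0.9555 0.9897 0.9911 0.9612 0.9742
```

These reproduce the per-class table and the reported test accuracy to four decimals.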

Per-dialect breakdown available in per_dialect.json in the corresponding results folder.

Training Setup

  • Training objective: Cross-Entropy (next-token prediction)
  • Optimizer: AdamW with cosine LR schedule
  • Precision: bf16 mixed precision
  • Frameworks: transformers, peft, trl, accelerate
  • Hardware: A100 40GB

Dataset

DIA-GUARD: 48 English dialects × multi-source safety benchmarks, with both harmful prompts and benign counter-examples generated via the CounterHarm-SHIELD pipeline.

  • ~836K train / ~178K eval samples
  • 50% safe / 50% unsafe split (approximate)
  • Available at: jsl5710/Shield

Citation

```bibtex
@misc{diaguard2026,
  title         = {DIA-GUARD: Dialect-Informed Adversarial Guard for LLM Safety},
  author        = {Jason Lucas et al.},
  year          = {2026},
  howpublished  = {\url{https://github.com/jsl5710/dia-guard}}
}
```

Limitations

  • The model inherits the limitations and biases of the base model
  • Trained primarily on English dialects β€” performance on non-English text is not guaranteed
  • Should not be used as the sole safety mechanism in production systems

License

This model is released under the Apache 2.0 license, inherited from the base model. Please review the base model's license before use.
