# SmolLM2-1.7B – PEFT (LoRA)/CE (Shield Project)
This model is part of the Shield project, a collection of safety-classifier models fine-tuned on the DIA-GUARD dataset (48 English dialects, ~836K records of safe/unsafe prompts) to robustly classify harmful content across diverse dialects.
## Model Summary
| Field | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM2-1.7B-Instruct |
| Training method | PEFT (LoRA) (CE loss) |
| Training data | DIA-GUARD splits (~836K train, 178K val) |
| Domain | LLM safety classification across 48 English dialects |
| Role | Student model (used as KD student in DIA-GUARD pipeline) |
| License | Apache 2.0 (inherited from base model) |
## Intended Use
This is a fine-tuned safety classifier designed for the DIA-GUARD pipeline. It is intended for use as:

- A safety filter: classify input prompts as `safe` or `unsafe` across English dialects
- A KD student: these checkpoints serve as the student models for downstream knowledge-distillation experiments (MiniLLM / GKD / TED)
- A research baseline: for studies on dialect-aware safety in LLMs
## How to use
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
model = PeftModel.from_pretrained(base, "jsl5710/Shield-SmolLM2-1.7B-PEFT-CE")

prompt = "<your prompt here>"
inputs = tokenizer.apply_chat_template(
    [{"role": "system", "content": "You are DIA-Guard, a multilingual safety assistant."},
     {"role": "user", "content": prompt}],
    return_tensors="pt", add_generation_prompt=True,
)

outputs = model.generate(inputs, max_new_tokens=4)
# Decode only the newly generated tokens (the label), not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
# Expected: 'safe' or 'unsafe'
```
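Depending on the chat template, the decoded text can carry whitespace or stray formatting around the label, so a small normalization helper is useful downstream. `parse_label` below is hypothetical (not part of the release), and falling back to `unsafe` on ambiguous output is a fail-closed assumption here, not documented behavior:

```python
def parse_label(text: str) -> str:
    """Normalize generated text to 'safe' or 'unsafe'.

    Checks 'unsafe' first, since the substring 'safe' also occurs
    inside 'unsafe'. Ambiguous output falls back to 'unsafe'
    (fail-closed -- an assumption, not a documented guarantee).
    """
    t = text.strip().lower()
    if "unsafe" in t:
        return "unsafe"
    if "safe" in t:
        return "safe"
    return "unsafe"
```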
## Performance
| Metric | Value |
|---|---|
| Final epoch | 0.04/3 (early-stopped) |
| Train loss | 0.5301 |
| Train accuracy | – |
| Eval loss | 0.92 |
| Eval accuracy | 85.3% |
| Batch size (per_device × grad_accum) | 16 × 1 = 16 |
| Liger Kernel | disabled |
| Stopped via | EarlyStoppingCallback (patience=3, metric=eval_loss) |
Eval was performed on a 2,000-sample subset of the DIA-GUARD val split (full val: 178K samples). Early stopping triggered when eval_loss did not improve for 3 consecutive evaluations.
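The stopping rule is simple enough to sketch in isolation; `early_stop` below is an illustrative stand-in for `EarlyStoppingCallback` (patience=3 on eval_loss), not the actual Trainer wiring:

```python
def early_stop(eval_losses, patience=3):
    """Return the evaluation index at which training stops, or None.

    Mirrors EarlyStoppingCallback: stop once eval loss has failed to
    improve on the best value for `patience` consecutive evaluations.
    """
    best = float("inf")
    bad = 0
    for i, loss in enumerate(eval_losses):
        if loss < best:
            best, bad = loss, 0  # new best -> reset the patience counter
        else:
            bad += 1
            if bad >= patience:
                return i
    return None  # training runs to completion
```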
## Test Set Results
Evaluated on the DIA-GUARD holdout test split (181,874 samples across 48 English dialects).
| Metric | Value |
|---|---|
| Test Accuracy | 0.9742 |
| Macro Precision | 0.9733 |
| Macro Recall | 0.9754 |
| Macro F1 | 0.9741 |
| Support | 181,874 |
### Per-class
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| safe | 0.9555 | 0.9897 | 0.9723 | 83,140 |
| unsafe | 0.9911 | 0.9612 | 0.9759 | 98,734 |
### Confusion Matrix
| | Pred safe | Pred unsafe |
|---|---|---|
| True safe | 82,287 | 853 |
| True unsafe | 3,835 | 94,899 |
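As a sanity check, the headline metrics can be recomputed directly from the confusion matrix above:

```python
# Cells of the confusion matrix reported above.
tp_safe, fn_safe = 82287, 853        # true safe:   pred safe / pred unsafe
fp_safe, tp_unsafe = 3835, 94899     # true unsafe: pred safe / pred unsafe

total = tp_safe + fn_safe + fp_safe + tp_unsafe          # 181,874
accuracy = (tp_safe + tp_unsafe) / total

prec_safe = tp_safe / (tp_safe + fp_safe)                # 82287 / 86122
rec_safe = tp_safe / (tp_safe + fn_safe)                 # 82287 / 83140
prec_unsafe = tp_unsafe / (tp_unsafe + fn_safe)          # 94899 / 95752
rec_unsafe = tp_unsafe / (tp_unsafe + fp_safe)           # 94899 / 98734

f1 = lambda p, r: 2 * p * r / (p + r)
macro_f1 = (f1(prec_safe, rec_safe) + f1(prec_unsafe, rec_unsafe)) / 2

print(round(accuracy, 4), round(macro_f1, 4))  # 0.9742 0.9741
```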
Per-dialect breakdown is available in `per_dialect.json` in the corresponding results folder.
## Training Setup
- Training objective: Cross-Entropy (next-token prediction)
- Optimizer: AdamW with cosine LR schedule
- Precision: bf16 mixed precision
- Frameworks: transformers, peft, trl, accelerate
- Hardware: A100 40GB
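The card does not list the LoRA hyperparameters. A minimal PEFT configuration sketch follows, where `r`, `lora_alpha`, dropout, and `target_modules` are illustrative assumptions rather than the values actually used:

```python
from peft import LoraConfig, get_peft_model

# All hyperparameter values below are illustrative assumptions --
# the card does not state the exact LoRA configuration used.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# model = get_peft_model(base_model, lora_config)  # wraps the frozen base
```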
## Dataset
DIA-GUARD: 48 English dialects × multi-source safety benchmarks, with both harmful prompts and benign counter-examples generated via the CounterHarm-SHIELD pipeline.
- ~836K train / ~178K eval samples
- 50% safe / 50% unsafe split (approximate)
- Available at: `jsl5710/Shield`
## Citation
```bibtex
@misc{diaguard2026,
  title = {DIA-GUARD: Dialect-Informed Adversarial Guard for LLM Safety},
  author = {Jason Lucas et al.},
  year = {2026},
  howpublished = {\url{https://github.com/jsl5710/dia-guard}}
}
```
## Limitations
- The model inherits the limitations and biases of the base model
- Trained primarily on English dialects; performance on non-English text is not guaranteed
- Should not be used as the sole safety mechanism in production systems
## License

This model is released under the Apache 2.0 license, inherited from the base model. Please review the base model's license before use.