distil-labs
/

Distil-PII-SmolLM2-135M-Instruct

Text Generation

Model card Files Files and versions

Distil-PII-SmolLM2-135M-Instruct / README.md

distillabs's picture

Upload folder using huggingface_hub

eff5041 verified 3 months ago

|

history blame contribute delete

2.22 kB

	---

	license: apache-2.0
	language: en
	base_model: HuggingFaceTB/SmolLM2-135M-Instruct
	pipeline_tag: text-generation
	tags: [pii-redaction, privacy, slm, distil-labs]
	---

	# Distil-PII-SmolLM2-135M-Instruct

	A small language model (SLM) fine-tuned by Distil Labs for policy-aware PII redaction that outputs a single JSON object with `redacted_text` and `entities`. Optimized to run locally with strong accuracy and strict schema adherence.

	## Model Details

	* Developed by: Distil Labs GmbH
	* License: Apache 2
	* Finetuned from: HuggingFaceTB/SmolLM2-135M-Instruct

	## Intended Use & Limitations

	* Use cases: Redacting support chats, logs, tickets, transcripts—removing identity while preserving ops signals (IDs last-4, order numbers, etc.).
	* Out of scope: Legal or compliance advice; languages beyond English (generalization not guaranteed); domain-specific IDs unseen in training.

	## Input & Output

	Input: A plain-text prompt with task instruction + context.
	Output (JSON only):

	```json
	{
	"redacted_text": "Text with in-place tokens",
	"entities": [
	{"value": "<original>", "replacement_token": "[TOKEN]", "reason": "<why>"}
	]
	}
	```

	Tokens: `[PERSON] [EMAIL] [PHONE] [ADDRESS] [SSN] [ID] [UUID] [CARD_LAST4:####] [IBAN_LAST4:####] [GENDER] [AGE] [RACE] [MARITAL_STATUS]`

	## Training

	Instruction-tuned on a compact policy spec + ~20 curated examples emphasizing exact JSON schema, minimal in-place edits, and entity correctness.

	## Evaluation

	Judged by a frontier LLM using a deterministic rubric: JSON-only, schema validity, redacted_text exact match, and set-equality of `(value, replacement_token)` pairs (reason/order ignored). Score: 0.25 +/- 0.05.

	## How to Use
	Details of deployment can be found in https://docs.distillabs.ai/how-to/model-deployment


	## Risks & Mitigations

	* False negatives/positives: May miss novel formats or over-redact generic terms. Mitigate via guardrails + post-validation.
	* Policy drift: Keep task preamble fixed; monitor with unit tests.

	## Model Sources

	* Homepage: [https://distillabs.ai](https://distillabs.ai)
	* Contact: [contact@distillabs.ai](mailto:contact@distillabs.ai)