agentlans/multilingual-e5-small-pii-detector

Text Classification


agentlans committed on Nov 30, 2025

Commit b82dc64 · verified · 1 parent: 08a1d7d

Update README.md

Files changed (1)

README.md +38 -24

README.md CHANGED

@@ -13,15 +13,29 @@ tags:
 ---
 # E5 Small Multilingual PII Detector
-This model detects personal identifying information (PII) in multilingual text.
-It's finetuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small)
-on the [agentlans/personal-information-prompts](https://huggingface.co/datasets/agentlans/personal-information-prompts) dataset
+A lightweight multilingual model for detecting personally identifiable information (PII) in text.
 It achieves the following results on the evaluation set:
-- Loss: 0.2192
-- Accuracy: 0.9214
-- Num Input Tokens Seen: 4552704
+- Loss: 0.2192
+- Accuracy: 0.9214
+- Input tokens seen during training: 4&thinsp;552&thinsp;704
+## Usage
+```python
+from transformers import pipeline
+classifier = pipeline(
+    task="text-classification",
+    model="myusername/multilingual-e5-small-pii-detector"
+)
+classifier("Your text here.")
+# [{'label': 'False', 'score': 0.9981884360313416}]
+```
+## Results
 <details>
   <summary>Translated testing text</summary>
@@ -194,29 +208,29 @@ Classification results for identical texts translated into different languages
 ## Limitations
-- Lack of sensitivity: the model can fail at identifying PII for certain languages and inputs (for example, credit card details)
-- May not be accurate for short texts
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 3.0
+- Limited sensitivity for some languages and PII formats (for example, certain credit card number patterns or locale-specific identifiers).
+- May perform poorly on very short texts that lack sufficient context.
+- Not a drop-in replacement for legal or compliance review; should be used as an assistive tool.
+## Training
+### Hyperparameters
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- optimizer: `AdamW` (fused) with `betas=(0.9, 0.999)`, `eps=1e-08`, no additional optimizer arguments
+- lr_scheduler_type: linear
+- num_epochs: 3.0
 ### Framework versions
-- Transformers 5.0.0.dev0
-- Pytorch 2.9.1+cu128
-- Datasets 4.4.1
-- Tokenizers 0.22.1
+- Transformers 5.0.0.dev0
+- PyTorch 2.9.1+cu128
+- Datasets 4.4.1
+- Tokenizers 0.22.1
 ## Licence
-Apache 2.0
+Apache-2.0
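The usage snippet added in this commit returns only the top label and its score (for example `{'label': 'False', 'score': 0.998...}`). Note that it passes `myusername/multilingual-e5-small-pii-detector`, which appears to be a placeholder owner; the repository shown on this page is `agentlans/multilingual-e5-small-pii-detector`. Below is a minimal sketch for turning such top-1 outputs into PII flags with an adjustable threshold. It assumes the model is a two-label classifier whose positive label is the string `True`; only the `False` label appears in the README output, so that label name and the helper names `pii_probability` and `flag_pii` are hypothetical.

```python
from typing import Dict, List, Union

# Assumption: binary classifier whose positive (PII) label is "True".
# The README example only shows the "False" label.
PII_LABEL = "True"

# Shape of one top-1 text-classification result: {"label": ..., "score": ...}
Prediction = Dict[str, Union[str, float]]


def pii_probability(prediction: Prediction) -> float:
    """Estimate P(PII) from a top-1 prediction.

    With exactly two softmax-normalized labels, the other label's score is
    1 - score, so a "False" prediction with score s implies P(PII) = 1 - s.
    """
    score = float(prediction["score"])
    return score if prediction["label"] == PII_LABEL else 1.0 - score


def flag_pii(predictions: List[Prediction], threshold: float = 0.5) -> List[bool]:
    """Return True for each prediction whose estimated P(PII) reaches `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [pii_probability(p) >= threshold for p in predictions]
```

Calling the pipeline on a list of strings returns a list of such dicts, so `flag_pii(classifier(texts), threshold=0.3)` would mark each text. Lowering the threshold flags more texts (fewer missed PII cases, more false alarms), which may suit the limited-sensitivity cases listed under Limitations; the right value depends on your own validation data.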
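The hyperparameter list specifies a linear learning-rate schedule starting at 5e-05, a train batch size of 8, and 3 epochs. The sketch below shows what that schedule looks like, assuming it follows the Transformers `get_linear_schedule_with_warmup` convention (linear decay to zero over all optimizer steps) with zero warmup steps, a single device, and no gradient accumulation; none of those three details is stated in the commit. The training-set size is not given either, so `num_examples` is a hypothetical input, and `total_training_steps` and `linear_lr` are illustrative helpers rather than library functions.

```python
import math

BASE_LR = 5e-05   # learning_rate from the hyperparameter list
BATCH_SIZE = 8    # train_batch_size
NUM_EPOCHS = 3    # num_epochs


def total_training_steps(num_examples: int, batch_size: int = BATCH_SIZE,
                         epochs: int = NUM_EPOCHS) -> int:
    """Optimizer steps for the whole run.

    Assumes one device, no gradient accumulation, and that the last partial
    batch of each epoch is kept (hence the ceiling).
    """
    steps_per_epoch = max(1, math.ceil(num_examples / batch_size))
    return steps_per_epoch * epochs


def linear_lr(step: int, total_steps: int, base_lr: float = BASE_LR,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup (unused when warmup_steps=0).
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))
```

With a hypothetical 1,000 training examples, `total_training_steps(1000)` gives 125 steps per epoch and 375 in total, and the rate falls from 5e-05 at step 0 to 0 at step 375.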