# NerGuard-0.3B-onnx-int8
An INT8-quantized ONNX model for Named Entity Recognition (NER), focused on PII detection. It is derived from a fine-tuned mDeBERTa-v3-base for multilingual token classification.
## Model Details

| Property | Value |
|---|---|
| Base Architecture | DebertaV2ForTokenClassification |
| Parameters | ~300M |
| Quantization | Dynamic INT8 (QUInt8) |
| Format | ONNX (Optimum) |
| Max Sequence Length | 512 |
## Efficiency Metrics

| Metric | Value |
|---|---|
| Original Size | 1.06 GB |
| Quantized Size | 323 MB |
| Compression Ratio | 3.35:1 |
| Size Reduction | 70.16% |
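The size figures above are internally consistent; a quick arithmetic check (the small drift from the stated 3.35:1 and 70.16% comes from the 1.06 GB figure being rounded):

```python
# Sizes taken from the table above; binary units (1 GB = 1024 MB) assumed.
original_mb = 1.06 * 1024
quantized_mb = 323

ratio = original_mb / quantized_mb                   # compression ratio
reduction = (1 - quantized_mb / original_mb) * 100   # size reduction in %

print(f"{ratio:.2f}:1")     # ≈ 3.36:1
print(f"{reduction:.2f}%")  # ≈ 70.24%
```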
## Performance

Absolute scores for the quantized model are not listed here; Retention is relative to the full-precision baseline.

| Metric | Quantized | Retention |
|---|---|---|
| F1-Score | - | 85.46% |
| Precision | - | 89.19% |
| Recall | - | 88.28% |
## Supported Labels

PII entities detected: AGE, BUILDINGNUM, CITY, CREDITCARDNUMBER, DATE, DRIVERLICENSENUM, EMAIL, GENDER, GIVENNAME, IDCARDNUM, PASSPORTNUM, SEX, SOCIALNUM, STREET, SURNAME, TAXNUM, TELEPHONENUM, TIME, TITLE, ZIPCODE
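In practice you will often care about only a subset of these labels. A minimal filtering sketch, assuming the standard `transformers` aggregated token-classification output schema (`entity_group`, `score`, `word`, `start`, `end`); the `results` list below is hand-written for illustration, not actual model output:

```python
# Illustrative entries in the standard aggregated token-classification schema
# (not actual model output).
results = [
    {"entity_group": "GIVENNAME", "score": 0.99, "word": "John", "start": 26, "end": 30},
    {"entity_group": "EMAIL", "score": 0.98, "word": "j.smith@company.com", "start": 44, "end": 63},
    {"entity_group": "DATE", "score": 0.55, "word": "immediately", "start": 64, "end": 75},
]

# Example subset of the supported labels; adjust to your own risk policy.
HIGH_RISK = {"CREDITCARDNUMBER", "SOCIALNUM", "PASSPORTNUM", "EMAIL", "TELEPHONENUM"}

# Keep only confident detections of high-risk entity types.
flagged = [r for r in results if r["entity_group"] in HIGH_RISK and r["score"] >= 0.80]
print([r["word"] for r in flagged])  # ['j.smith@company.com']
```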
## Usage

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline
from pprint import pprint

model_name = "exdsgift/NerGuard-0.3B-onnx-int8"

# Load the quantized ONNX weights via Optimum
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = ORTModelForTokenClassification.from_pretrained(
    model_name,
    file_name="model_quantized.onnx",
)

# "simple" aggregation merges subword tokens into whole-entity spans
nlp = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

multilingual_cases = [
    "Please send the report to Mr. John Smith at j.smith@company.com immediately.",
    "J'habite au 15 Rue de la Paix, Paris. Mon nom est Pierre Martin.",
    "Mein Name ist Thomas Müller und ich lebe in der Berliner Straße 5, München.",
    "La doctora Ana María González López trabaja en el Hospital Central de Madrid.",
    "Il codice fiscale di Mario Rossi è RSSMRA80A01H501U.",
    "Ik ben Sven van der Berg en mijn e-mailadres is sven.berg@example.nl.",
]

for text in multilingual_cases:
    results = nlp(text)
    print(f"\n--- Sample: {text} ---")
    pprint(results)
```
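Because the aggregated output carries character offsets, it can drive redaction directly. A minimal sketch, again using a hand-written `entities` list in the standard aggregated schema rather than actual model output:

```python
def redact(text, entities):
    """Replace detected spans with [LABEL] placeholders, working right to
    left so earlier character offsets stay valid after each replacement."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

# Illustrative entities (not actual model output) for the first sample above.
sample = "Please send the report to Mr. John Smith at j.smith@company.com immediately."
entities = [
    {"entity_group": "GIVENNAME", "start": 30, "end": 34},
    {"entity_group": "SURNAME", "start": 35, "end": 40},
    {"entity_group": "EMAIL", "start": 44, "end": 63},
]

print(redact(sample, entities))
# Please send the report to Mr. [GIVENNAME] [SURNAME] at [EMAIL] immediately.
```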