GLiNER-PII ONNX Model
This is an ONNX-converted version of nvidia/gliner-PII for efficient CPU inference.
Model Description
GLiNER-PII is a Named Entity Recognition (NER) model specifically trained for detecting Personally Identifiable Information (PII) and Protected Health Information (PHI) in text. This ONNX version provides 1.5-2x faster CPU inference compared to the PyTorch version while maintaining the same accuracy.
Why ONNX?
- ✅ Faster inference: 1.5-2x speedup on CPU
- ✅ Cross-platform: Works on any OS with ONNX Runtime
- ✅ Production-ready: Optimized for deployment
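The actual speedup depends on hardware, thread settings, and input length, so it is worth measuring on your own workload. Below is a minimal sketch of a timing harness. It assumes `predict` is any callable that takes a single text, for example a lambda wrapping `model.predict_entities`. The helper name `median_latency` and its parameters are illustrative and not part of GLiNER.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency(predict: Callable[[str], object],
                   texts: Sequence[str],
                   warmup: int = 2,
                   repeats: int = 5) -> float:
    """Return the median wall-clock seconds for one pass over `texts`."""
    # Warm-up passes are excluded so one-time initialization cost
    # does not skew the measurement.
    for _ in range(warmup):
        for t in texts:
            predict(t)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for t in texts:
            predict(t)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Running it once against the PyTorch model and once against the ONNX model (same texts, same labels) and dividing the two medians gives a speedup figure for your machine.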
Files
- model.onnx: Standard ONNX model (~1.8GB)
- config.json: Model configuration
- tokenizer.json: Tokenizer configuration
- pytorch_model.bin: Original PyTorch weights (for reference)
Note on Quantization: We initially provided a quantized (INT8) version, but it had significant accuracy issues due to the model's LSTM architecture and dynamic tensor operations. For production use, we recommend the standard ONNX model which provides good performance without sacrificing accuracy.
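One way to catch this kind of accuracy regression is to compare a converted model's predictions against the PyTorch reference on the same inputs. The sketch below assumes both entity lists have the shape returned by `predict_entities` (dicts with `start`, `end`, and `label` keys). `entity_agreement` is a hypothetical helper, not part of GLiNER, and exact-span matching is only one possible criterion.

```python
from typing import Dict, List, Set, Tuple


def _spans(entities: List[Dict]) -> Set[Tuple[int, int, str]]:
    """Reduce entities to comparable (start, end, label) triples."""
    return {(e["start"], e["end"], e["label"]) for e in entities}


def entity_agreement(reference: List[Dict], candidate: List[Dict]) -> float:
    """Jaccard overlap of exact (start, end, label) matches, in [0, 1]."""
    ref, cand = _spans(reference), _spans(candidate)
    if not ref and not cand:
        return 1.0  # both found nothing: treat as full agreement
    return len(ref & cand) / len(ref | cand)
```

Averaging this score over a representative set of texts gives a rough signal of whether a converted or quantized variant still tracks the reference model.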
Usage
from gliner import GLiNER

# Load the ONNX model
model = GLiNER.from_pretrained(
    "ineersa/gliner-PII-onnx",
    load_onnx_model=True,
    load_tokenizer=True
)

# Predict entities
text = "My name is John Doe and my email is john@example.com"
labels = ["name", "email"]
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"{entity['text']} => {entity['label']} ({entity['score']:.3f})")

# Output:
# John Doe => name (0.997)
# john@example.com => email (1.000)
Supported PII Labels (64 types)
The model can detect the following PII/PHI entity types, organized by category:
Personal Information (13 types)
- first_name, last_name, name
- date_of_birth, age, gender
- sexuality, race_ethnicity
- religious_belief, political_view
- occupation, employment_status, education_level
Contact Information (10 types)
- email, phone_number
- street_address, city, county, state, country
- coordinate, zip_code, po_box
Financial Information (10 types)
- credit_debit_card, cvv
- bank_routing_number, account_number
- iban, swift_bic, pin
- ssn, tax_id, ein
Government Identifiers (5 types)
- passport_number, driver_license
- license_plate, national_id, voter_id
Digital/Technical Identifiers (11 types)
- ipv4, ipv6, mac_address, url
- user_name, password
- device_identifier, imei, serial_number
- api_key, secret_key
Healthcare/PHI (7 types)
- medical_record_number, health_plan_beneficiary_number
- blood_type, biometric_identifier
- health_condition, medication
- insurance_policy_number
Temporal Information (3 types)
- date, time, date_time
Organization Information (5 types)
- company_name, employee_id, customer_id
- certificate_license_number, vehicle_identifier
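Because labels are passed to the model as plain strings, it can help to keep the categories above in one place and select only the groups a given task needs. The sketch below transcribes the label names from the lists above. The category keys (`personal`, `contact`, and so on) and the `labels_for` helper are illustrative names, not part of the model or of GLiNER.

```python
from typing import Dict, List

PII_LABELS: Dict[str, List[str]] = {
    "personal": [
        "first_name", "last_name", "name", "date_of_birth", "age", "gender",
        "sexuality", "race_ethnicity", "religious_belief", "political_view",
        "occupation", "employment_status", "education_level",
    ],
    "contact": [
        "email", "phone_number", "street_address", "city", "county",
        "state", "country", "coordinate", "zip_code", "po_box",
    ],
    "financial": [
        "credit_debit_card", "cvv", "bank_routing_number", "account_number",
        "iban", "swift_bic", "pin", "ssn", "tax_id", "ein",
    ],
    "government": [
        "passport_number", "driver_license", "license_plate",
        "national_id", "voter_id",
    ],
    "digital": [
        "ipv4", "ipv6", "mac_address", "url", "user_name", "password",
        "device_identifier", "imei", "serial_number", "api_key", "secret_key",
    ],
    "healthcare": [
        "medical_record_number", "health_plan_beneficiary_number",
        "blood_type", "biometric_identifier", "health_condition",
        "medication", "insurance_policy_number",
    ],
    "temporal": ["date", "time", "date_time"],
    "organization": [
        "company_name", "employee_id", "customer_id",
        "certificate_license_number", "vehicle_identifier",
    ],
}

# Flat list of every supported label, in category order.
ALL_LABELS: List[str] = [l for group in PII_LABELS.values() for l in group]


def labels_for(*categories: str) -> List[str]:
    """Return the labels of the requested categories, in the order given."""
    return [label for c in categories for label in PII_LABELS[c]]
```

For example, `model.predict_entities(text, labels_for("contact", "financial"), threshold=0.5)` would restrict detection to contact and financial fields, using the `model` and `text` from the Usage section.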
Example Use Cases
- Data Privacy Compliance: Scan documents for GDPR, HIPAA, or CCPA compliance
- Data Anonymization: Identify and redact sensitive information before sharing
- Security Auditing: Detect accidental PII exposure in logs or databases
- Content Moderation: Flag user-generated content containing personal information
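As an illustration of the anonymization use case, the sketch below replaces detected spans with bracketed label placeholders. It relies on the `start` and `end` character offsets that `predict_entities` includes in each entity dict. The `redact` function and the placeholder format are assumptions made for this example, not part of GLiNER.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    # Process spans right to left so earlier offsets stay valid
    # while the string changes length.
    ordered = sorted(entities, key=lambda e: e["start"], reverse=True)
    cursor = len(text)  # left edge of the most recently redacted span
    for e in ordered:
        if e["end"] > cursor:
            continue  # skip spans overlapping one already redacted
        placeholder = f"[{e['label'].upper()}]"
        text = text[:e["start"]] + placeholder + text[e["end"]:]
        cursor = e["start"]
    return text
```

Applied to the sentence from the Usage section, with entities at offsets 11-19 (`name`) and 36-52 (`email`), this returns `My name is [NAME] and my email is [EMAIL]`. A production pipeline would likely also need a policy for low-confidence or overlapping detections; here overlaps are simply skipped.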
Citation
@misc{gliner-pii,
title={GLiNER-PII: Generalist and Lightweight Model for Named Entity Recognition},
author={NVIDIA},
year={2024},
url={https://huggingface.co/nvidia/gliner-pii}
}
License
This model is released under the NVIDIA Open Model License.
Key points:
- ✅ Commercial use allowed
- ✅ Can create and distribute derivative models
- ✅ NVIDIA does not claim ownership of outputs
- ⚠️ See full license for complete terms and conditions
Acknowledgments
- Original model: nvidia/gliner-PII
- GLiNER framework: urchade/GLiNER
- ONNX conversion: Based on GLiNER's official conversion example