GLiNER-PII ONNX Model

This is an ONNX-converted version of nvidia/gliner-pii for efficient CPU inference.

Model Description

GLiNER-PII is a Named Entity Recognition (NER) model trained specifically to detect Personally Identifiable Information (PII) and Protected Health Information (PHI) in text. This ONNX version runs CPU inference 1.5-2x faster than the PyTorch version while maintaining the same accuracy.

Why ONNX?

  • ✅ Faster inference: 1.5-2x speedup on CPU
  • ✅ Cross-platform: Works on any OS with ONNX Runtime
  • ✅ Production-ready: Optimized for deployment
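
The speedup will vary with hardware and input length, so it may be worth measuring on your own machine. The following is a minimal timing sketch, not part of this repository: `median_latency` and `speedup` are hypothetical helper names, and the helper works with any zero-argument callable.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> float:
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s
```

To compare backends, one could time `lambda: model.predict_entities(text, labels)` once for a model loaded with `load_onnx_model=True` and once for the PyTorch model, then pass the two medians to `speedup`.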

Files

  • model.onnx: Standard ONNX model (~1.8 GB)
  • config.json: Model configuration
  • tokenizer.json: Tokenizer configuration
  • pytorch_model.bin: Original PyTorch weights (for reference)

Note on Quantization: We initially provided a quantized (INT8) version, but it showed significant accuracy loss due to the model's LSTM architecture and dynamic tensor operations. For production use, we recommend the standard ONNX model, which provides good performance without sacrificing accuracy.

Usage

from gliner import GLiNER

# Load the ONNX model
model = GLiNER.from_pretrained(
    "ineersa/gliner-PII-onnx",
    load_onnx_model=True,
    load_tokenizer=True
)

# Predict entities
text = "My name is John Doe and my email is john@example.com"
labels = ["name", "email"]
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"{entity['text']} => {entity['label']} ({entity['score']:.3f})")
# Output:
# John Doe => name (0.997)
# john@example.com => email (1.000)

Supported PII Labels (64 types)

The model can detect the following PII/PHI entity types, organized by category:

Personal Information (13 types)

  • first_name, last_name, name
  • date_of_birth, age, gender
  • sexuality, race_ethnicity
  • religious_belief, political_view
  • occupation, employment_status, education_level

Contact Information (10 types)

  • email, phone_number
  • street_address, city, county, state, country
  • coordinate, zip_code, po_box

Financial Information (10 types)

  • credit_debit_card, cvv
  • bank_routing_number, account_number
  • iban, swift_bic, pin
  • ssn, tax_id, ein

Government Identifiers (5 types)

  • passport_number, driver_license
  • license_plate, national_id, voter_id

Digital/Technical Identifiers (11 types)

  • ipv4, ipv6, mac_address, url
  • user_name, password
  • device_identifier, imei, serial_number
  • api_key, secret_key

Healthcare/PHI (7 types)

  • medical_record_number, health_plan_beneficiary_number
  • blood_type, biometric_identifier
  • health_condition, medication
  • insurance_policy_number

Temporal Information (3 types)

  • date, time, date_time

Organization Information (5 types)

  • company_name, employee_id, customer_id
  • certificate_license_number, vehicle_identifier
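
Because labels are passed to the model as plain strings, it can be convenient to keep them grouped in code and request only the categories you need. The sketch below copies the lists above into a dictionary; the category keys and the `labels_for` helper are hypothetical names chosen for illustration, not part of the gliner library.

```python
LABELS_BY_CATEGORY = {
    "personal": [
        "first_name", "last_name", "name", "date_of_birth", "age", "gender",
        "sexuality", "race_ethnicity", "religious_belief", "political_view",
        "occupation", "employment_status", "education_level",
    ],
    "contact": [
        "email", "phone_number", "street_address", "city", "county", "state",
        "country", "coordinate", "zip_code", "po_box",
    ],
    "financial": [
        "credit_debit_card", "cvv", "bank_routing_number", "account_number",
        "iban", "swift_bic", "pin", "ssn", "tax_id", "ein",
    ],
    "government": [
        "passport_number", "driver_license", "license_plate", "national_id",
        "voter_id",
    ],
    "digital": [
        "ipv4", "ipv6", "mac_address", "url", "user_name", "password",
        "device_identifier", "imei", "serial_number", "api_key", "secret_key",
    ],
    "healthcare": [
        "medical_record_number", "health_plan_beneficiary_number", "blood_type",
        "biometric_identifier", "health_condition", "medication",
        "insurance_policy_number",
    ],
    "temporal": ["date", "time", "date_time"],
    "organization": [
        "company_name", "employee_id", "customer_id",
        "certificate_license_number", "vehicle_identifier",
    ],
}


def labels_for(*categories: str) -> list[str]:
    """Flatten the labels of the requested categories (all categories if none given)."""
    selected = categories or tuple(LABELS_BY_CATEGORY)
    return [label for cat in selected for label in LABELS_BY_CATEGORY[cat]]
```

For example, `model.predict_entities(text, labels_for("contact", "financial"), threshold=0.5)` would restrict detection to those two groups.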

Example Use Cases

  • Data Privacy Compliance: Scan documents for GDPR, HIPAA, or CCPA compliance
  • Data Anonymization: Identify and redact sensitive information before sharing
  • Security Auditing: Detect accidental PII exposure in logs or databases
  • Content Moderation: Flag user-generated content containing personal information
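
As an illustration of the anonymization use case, the sketch below replaces detected spans with label placeholders. It assumes each entity is a dict carrying `start` and `end` character offsets alongside `label`, as in the output of `predict_entities`; the `redact` function itself is a hypothetical helper, not part of the gliner library.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping], placeholder: str = "[{label}]") -> str:
    """Replace each entity span in text with a placeholder built from its label."""
    # Process spans right to left so earlier offsets stay valid after each
    # replacement; for equal starts, the longer span is handled first.
    ordered = sorted(entities, key=lambda e: (e["start"], e["end"]), reverse=True)
    cutoff = len(text)  # start offset of the most recently replaced span
    for ent in ordered:
        if ent["end"] > cutoff:
            continue  # skip spans that overlap one already replaced
        tag = placeholder.format(label=ent["label"].upper())
        text = text[:ent["start"]] + tag + text[ent["end"]:]
        cutoff = ent["start"]
    return text
```

Calling `redact(text, model.predict_entities(text, labels, threshold=0.5))` on the usage example would be expected to mask the name and email address, provided the model returns both spans.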

Citation

@misc{gliner-pii,
  title={GLiNER-PII: Generalist and Lightweight Model for Named Entity Recognition},
  author={NVIDIA},
  year={2024},
  url={https://huggingface.co/nvidia/gliner-pii}
}

License

This model is released under the NVIDIA Open Model License.

Key points:

  • ✅ Commercial use allowed
  • ✅ Can create and distribute derivative models
  • ✅ NVIDIA does not claim ownership of outputs
  • ⚠️ See the full license for complete terms and conditions

Acknowledgments
