CliniGuard NER -- PHI/PII De-identification by Genzeon Platforms

CliniGuard NER is a clinical Named Entity Recognition model developed by Genzeon Platforms for automated detection and de-identification of Protected Health Information (PHI) and Personally Identifiable Information (PII) in clinical text. Built on a domain-specialized BERT architecture fine-tuned on healthcare corpora, CliniGuard delivers production-grade entity recognition across 20 PHI categories.

Model Details

Property	Value
Developed by	Genzeon Platforms
Architecture	BertForTokenClassification
Parameters	~110M
Tagging scheme	BIO (41 labels)
Max sequence length	512 tokens
License	Apache-2.0

Intended Use

CliniGuard NER is designed for enterprise healthcare environments where patient data privacy is critical. Primary use cases include:

Clinical text de-identification -- removing or masking patient identifiers before sharing medical records for research.
PII detection -- flagging sensitive information in healthcare documents, EHRs, and discharge summaries.
Regulatory compliance -- supporting HIPAA Safe Harbor de-identification requirements.
Healthcare AI pipelines -- preprocessing clinical text for downstream NLP tasks while ensuring patient privacy.

Entity Types

The model recognizes 20 PHI entity types using BIO tagging (41 labels total):

Category	Entity Types
Patient identifiers	PATIENT_NAME, DATE_OF_BIRTH, AGE, GENDER, SSN, MRN
Contact information	PHONE, FAX, EMAIL
Location	ADDRESS, CITY, STATE, ZIP, COUNTRY
Organization	HOSPITAL
Provider	DOCTOR_NAME
Digital identifiers	USERNAME, ID_NUMBER, IP_ADDRESS, URL
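The label count follows from the BIO scheme: one shared outside tag "O" plus a begin ("B-") and inside ("I-") tag for each of the 20 entity types, so 1 + 2 × 20 = 41. The sketch below rebuilds such a label set and groups a BIO tag sequence back into entity spans. The exact label strings, their id order, and the helper names `build_bio_labels` and `bio_to_spans` are assumptions for illustration; the authoritative mapping is whatever `model.config.id2label` contains.

```python
# Hypothetical reconstruction of the 41-label BIO scheme. The label strings and
# id order are assumptions; read the real ones from model.config.id2label.
ENTITY_TYPES = [
    "PATIENT_NAME", "DATE_OF_BIRTH", "AGE", "GENDER", "SSN", "MRN",
    "PHONE", "FAX", "EMAIL",
    "ADDRESS", "CITY", "STATE", "ZIP", "COUNTRY",
    "HOSPITAL",
    "DOCTOR_NAME",
    "USERNAME", "ID_NUMBER", "IP_ADDRESS", "URL",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...]: one outside tag plus a B/I pair per type."""
    labels = ["O"]
    for ent in entity_types:
        labels.extend([f"B-{ent}", f"I-{ent}"])
    return labels


def bio_to_spans(tags):
    """Group BIO tags into (entity_type, start, end) token spans, end exclusive.

    A stray "I-X" that does not continue an open X span starts a new span.
    """
    spans, current = [], None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current[0] != tag[2:])
        )
        if starts_new:
            if current:
                spans.append(tuple(current))
            current = [tag[2:], i, i + 1]
        elif tag.startswith("I-"):
            current[2] = i + 1  # extend the open span
        else:  # "O" closes any open span
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans


LABELS = build_bio_labels(ENTITY_TYPES)
id2label = dict(enumerate(LABELS))
label2id = {label: i for i, label in id2label.items()}

print(len(ENTITY_TYPES), len(LABELS))  # 20 41
print(bio_to_spans(["O", "B-PATIENT_NAME", "I-PATIENT_NAME", "O", "B-DATE_OF_BIRTH"]))
# [('PATIENT_NAME', 1, 3), ('DATE_OF_BIRTH', 4, 5)]
```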

Performance

Overall Metrics

Metric	Precision	Recall	F1
Micro avg	0.9659	0.9732	0.9695
Macro avg	0.9609	0.9706	0.9656
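F1 is the harmonic mean of precision and recall, and the micro-average row can be checked directly: 2 × 0.9659 × 0.9732 / (0.9659 + 0.9732) ≈ 0.9695. The two averages differ in how they combine entity types: micro-averaging pools true-positive, false-positive, and false-negative counts across all types before scoring (so frequent types such as PATIENT_NAME dominate), while macro-averaging scores each type separately and takes the unweighted mean. The sketch below illustrates both; the per-type counts in the toy example are hypothetical, since the card does not publish raw counts.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def micro_and_macro_f1(counts):
    """counts maps entity type -> (tp, fp, fn), counted at the entity-span level.

    Micro-F1 pools the counts over all types before scoring; macro-F1 scores
    each type on its own and takes the unweighted mean of the per-type F1s.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(safe_div(tp, tp + fp), safe_div(tp, tp + fn))
    per_type = [f1(safe_div(t, t + p), safe_div(t, t + n)) for t, p, n in counts.values()]
    macro = sum(per_type) / len(per_type)
    return micro, macro


# Check the reported micro-average F1 from its precision and recall.
print(round(f1(0.9659, 0.9732), 4))  # 0.9695

# Hypothetical counts: one frequent, accurate type and one rare, weaker type.
toy = {"A": (90, 10, 0), "B": (5, 5, 5)}
micro, macro = micro_and_macro_f1(toy)
print(round(micro, 4), round(macro, 4))  # 0.9048 0.7237
```

The toy example shows why the macro average is the more conservative number when one entity type is both rare and harder to detect.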

Per-Entity Metrics

Entity	Precision	Recall	F1	Support
PATIENT_NAME	0.9817	0.9853	0.9835	14335
DATE_OF_BIRTH	0.9798	0.9740	0.9769	9818
AGE	0.9028	0.9854	0.9423	1508
GENDER	0.9596	0.9885	0.9738	1562
SSN	0.9513	0.9935	0.9719	766
MRN	0.9938	0.9923	0.9930	1943
PHONE	0.9730	0.9869	0.9799	2590
FAX	0.9481	0.9454	0.9468	696
EMAIL	0.9965	0.9936	0.9950	4543
ADDRESS	0.9746	0.9844	0.9794	1985
CITY	0.9086	0.8891	0.8988	2047
STATE	0.9103	0.9060	0.9082	2734
ZIP	0.9770	0.9832	0.9801	951
COUNTRY	0.9485	0.9504	0.9495	2056
HOSPITAL	0.9033	0.9345	0.9186	5267
DOCTOR_NAME	0.9865	1.0000	0.9932	802
USERNAME	0.9689	0.9431	0.9559	1917
ID_NUMBER	0.9724	0.9898	0.9811	8555
IP_ADDRESS	0.9892	0.9924	0.9908	926
URL	0.9910	0.9947	0.9928	3001
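The macro-averaged F1 in the overall table can be reproduced from these rows as the unweighted mean of the 20 per-entity F1 scores (the convention used by seqeval and scikit-learn; that the card used it is an assumption, though the numbers agree). The sketch below also computes a support-weighted mean, which is scikit-learn's "weighted" average and is not the same quantity as micro-F1. It treats Support as the number of gold entity spans per type, which is also an assumption.

```python
# Per-entity (F1, support) copied from the table above.
PER_ENTITY = {
    "PATIENT_NAME": (0.9835, 14335),
    "DATE_OF_BIRTH": (0.9769, 9818),
    "AGE": (0.9423, 1508),
    "GENDER": (0.9738, 1562),
    "SSN": (0.9719, 766),
    "MRN": (0.9930, 1943),
    "PHONE": (0.9799, 2590),
    "FAX": (0.9468, 696),
    "EMAIL": (0.9950, 4543),
    "ADDRESS": (0.9794, 1985),
    "CITY": (0.8988, 2047),
    "STATE": (0.9082, 2734),
    "ZIP": (0.9801, 951),
    "COUNTRY": (0.9495, 2056),
    "HOSPITAL": (0.9186, 5267),
    "DOCTOR_NAME": (0.9932, 802),
    "USERNAME": (0.9559, 1917),
    "ID_NUMBER": (0.9811, 8555),
    "IP_ADDRESS": (0.9908, 926),
    "URL": (0.9928, 3001),
}

# Macro F1: every entity type counts equally, regardless of support.
macro_f1 = sum(f for f, _ in PER_ENTITY.values()) / len(PER_ENTITY)

# Support-weighted F1: frequent types count proportionally more.
total_support = sum(n for _, n in PER_ENTITY.values())
weighted_f1 = sum(f * n for f, n in PER_ENTITY.values()) / total_support

# The three weakest types are the ones most worth spot-checking in review.
weakest = sorted(PER_ENTITY, key=lambda k: PER_ENTITY[k][0])[:3]

print(f"macro F1: {macro_f1:.4f}")  # macro F1: 0.9656
print(total_support)  # 68002
print(weakest)  # ['CITY', 'STATE', 'HOSPITAL']
```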

Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "genzeonplatform/cliniguard-ner"
text = "Patient John Smith, DOB 03/15/1960, was seen at Springfield General Hospital by Dr. Jane Doe."

# Option 1: Use the transformers pipeline.
# aggregation_strategy="simple" merges B-/I- sub-word predictions into whole entity spans.
nlp = pipeline("token-classification", model=model_name, aggregation_strategy="simple")
entities = nlp(text)
for ent in entities:
    print(f"  {ent['entity_group']:20s} {ent['word']:30s} (score: {ent['score']:.3f})")

# Option 2: Manual inference
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# One predicted label id per sub-word token (shape: [batch, seq_len]).
predictions = torch.argmax(outputs.logits, dim=-1)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions[0]):
    # id2label keys are ints once the config is loaded, so index with the int directly.
    label = model.config.id2label[pred.item()]
    if label != "O" and token not in tokenizer.all_special_tokens:
        print(f"  {token:20s} -> {label}")
```
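For the de-identification use case, the aggregated pipeline output can be turned into masked text. With an aggregation strategy set, each pipeline result carries `entity_group`, `score`, `word`, and character offsets `start`/`end`. The helper below, `mask_phi`, is a hypothetical sketch rather than part of the model or library: it replaces each detected span with a typed placeholder, merges overlapping spans so no fragment of an identifier leaks, and edits the string from right to left so earlier offsets stay valid. The example entities are constructed by hand, not produced by the model.

```python
def mask_phi(text, entities):
    """Replace detected spans with "[ENTITY_GROUP]" placeholders.

    `entities` is a list of dicts with "start", "end" (character offsets) and
    "entity_group" keys, the shape the token-classification pipeline returns
    when an aggregation strategy is set.
    """
    spans = sorted((e["start"], e["end"], e["entity_group"]) for e in entities)
    merged = []
    for start, end, label in spans:
        if merged and start < merged[-1][1]:
            # Overlapping predictions: widen the previous span so nothing leaks.
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    # Replace right to left so earlier character offsets stay valid.
    for start, end, label in reversed(merged):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


text = "Patient John Smith, DOB 03/15/1960, was seen at Springfield General Hospital by Dr. Jane Doe."


def span(surface, label):
    """Build a pipeline-shaped entity dict for a substring (illustration only)."""
    start = text.index(surface)
    return {"entity_group": label, "start": start, "end": start + len(surface), "score": 1.0}


example = [
    span("John Smith", "PATIENT_NAME"),
    span("03/15/1960", "DATE_OF_BIRTH"),
    span("Springfield General Hospital", "HOSPITAL"),
    span("Jane Doe", "DOCTOR_NAME"),
]
print(mask_phi(text, example))
# Patient [PATIENT_NAME], DOB [DATE_OF_BIRTH], was seen at [HOSPITAL] by Dr. [DOCTOR_NAME].
```

With a loaded pipeline, the same call would be `mask_phi(text, nlp(text))`.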

Training Details

Developed by: Genzeon Platforms
Architecture: Domain-specialized BERT fine-tuned on clinical corpora
Training data: Genzeon Platforms' proprietary clinical NER dataset with diverse healthcare note formats
Epochs: up to 15 (early stopping with patience=3)
Learning rate: 3e-5 (linear schedule with warmup)
Batch size: 16 (train) / 32 (eval)
Max sequence length: 512 tokens
Optimizer: AdamW (weight decay 0.01)
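Two of the settings above describe training dynamics rather than fixed values. A linear schedule with warmup ramps the learning rate from 0 up to the peak (3e-5) over a warmup period, then decays it linearly back to 0 by the last step; early stopping with patience=3 ends training once three consecutive evaluations fail to improve on the best validation score, so fewer than 15 epochs may actually run. The sketch below models both in plain Python. The card does not state the warmup length or total step count, so the `warmup_steps=100` and `total_steps=1000` values in the example, like the validation scores, are hypothetical.

```python
def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr=3e-5):
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


def early_stopping_epoch(val_scores, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation score; otherwise it runs through every listed epoch.
    """
    best_score, best_epoch, wait = float("-inf"), 0, 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_scores), best_epoch


# Hypothetical schedule: 1000 optimizer steps, the first 100 used for warmup.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps=1000, warmup_steps=100))

# Hypothetical per-epoch validation F1: the score plateaus after epoch 3.
print(early_stopping_epoch([0.90, 0.93, 0.94, 0.94, 0.93, 0.935, 0.95]))  # (6, 3)
```

Training then typically restores the checkpoint from the best epoch (epoch 3 here), not the last one.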

Limitations

English only: Currently optimized for English clinical text. Multilingual support is on the Genzeon Platforms roadmap.
Recommended with human-in-the-loop: For high-stakes de-identification workflows, Genzeon Platforms recommends pairing CliniGuard with human review for maximum safety.
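Entity coverage: Covers 20 common PHI/PII types that overlap with, but do not map one-to-one onto, the 18 HIPAA Safe Harbor identifier categories (for example, GENDER and STATE are not Safe Harbor identifiers, and categories such as vehicle identifiers and biometric identifiers have no dedicated label). Rare or domain-specific identifiers may require custom fine-tuning; contact Genzeon Platforms for enterprise support.
Context window: Limited to 512 tokens per input, including special tokens. Longer documents should be split into overlapping chunks and the per-chunk predictions merged back into document-level spans.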
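A minimal chunking sketch, assuming a BERT-style input where [CLS] and [SEP] take 2 of the 512 positions (hence 510 content tokens per window) and an overlap of 64 tokens; both numbers are illustrative choices, not values from the card. `window_bounds` computes overlapping token windows, and `merge_entities` deduplicates spans that were predicted in more than one window, assuming each chunk's character offsets have already been shifted to document-level positions. Both helpers are hypothetical. With a Hugging Face fast tokenizer, calling it with `return_overflowing_tokens=True`, `stride=...`, and `return_offsets_mapping=True` is one way to produce equivalent windows together with the offsets needed for that shift.

```python
def window_bounds(n_tokens, max_len=510, overlap=64):
    """Split token positions [0, n_tokens) into overlapping [start, end) windows.

    max_len defaults to 510 so that [CLS] and [SEP] still fit in the
    512-token limit; overlap is how many tokens consecutive windows share.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    if n_tokens <= 0:
        return []
    step = max_len - overlap
    bounds, start = [], 0
    while True:
        end = min(start + max_len, n_tokens)
        bounds.append((start, end))
        if end == n_tokens:
            return bounds
        start += step


def merge_entities(entities):
    """Deduplicate spans predicted twice in an overlap region.

    Each entity is a dict with document-level "start"/"end" character offsets,
    "entity_group" and "score"; the highest-scoring copy of each span is kept.
    """
    best = {}
    for ent in entities:
        key = (ent["start"], ent["end"], ent["entity_group"])
        if key not in best or ent["score"] > best[key]["score"]:
            best[key] = ent
    return sorted(best.values(), key=lambda e: e["start"])


# A hypothetical 1000-token note needs three overlapping windows.
print(window_bounds(1000))  # [(0, 510), (446, 956), (892, 1000)]
```

Spans that cross a window boundary can still be split between windows; a larger overlap reduces, but does not eliminate, that risk.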

Related Genzeon Platforms models

CliniGuard Vitals NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of vital signs, body measurements, and physiological parameters from clinical text.

About Genzeon Platforms

Genzeon Platforms is a healthcare technology company building agentic AI decision infrastructure for healthcare. The company builds the Healthcare Brain: three production platforms (HIP One, PES One, CPS One) on a patent-pending multi-agent substrate called Aether One™.

Production deployment. Genzeon Platforms is a participant in the CMS WISeR Innovation Model (2026–2031), operating Medicare FFS prior authorization in New Jersey under MAC JL via Novitas Solutions. Live since January 1, 2026. Q1 2026 production results: 15k+ cases processed, 100% three-day turnaround-time (TAT) compliance, zero auto-denials (every non-affirmation signed by a named licensed clinician), 42% reviewer productivity gain, sub-three-minute median decision latency, and 85% portal channel adoption.

Scale. 50+ payer and provider clients across the Genzeon Platforms products. 1M+ Medicare FFS members served under WISeR.

Patent portfolio. 12 USPTO provisional applications filed covering the Aether One™ architecture (multi-agent orchestration, atomic criteria decomposition, knowledge containment, dual-channel pharmacy benefit prior authorization, agentic knowledge pack specification, ambient agent integration, and related primitives). ~346 claims locked at provisional priority dates. USPTO portfolio anchor #226167.

Compliance posture. SOC 2 Type II, HIPAA. Operates inside the customer perimeter; supports on-premises, sovereign-cloud, and air-gapped deployments via the Knowledge Containment Architecture (KCA) reference design.

Partnerships. 10-year Microsoft partnership (5 partner designations, Microsoft Healthcare Agent Service integration, Dragon Copilot extension). UiPath Platinum (Top 3 HLS). Available on Azure Marketplace, AWS Marketplace, Google Cloud Marketplace, and Salesforce AppExchange.

Open specifications. Genzeon Platforms publishes the Aether Knowledge Pack Specification (AKPS), which enables healthcare coverage policies to be authored as structured markdown that is directly consumable as LLM prompt context. See github.com/genzeon/aether-akps.

Model policy. Genzeon Platforms builds only on US- and EU-origin open-weight foundation models (Llama, Gemma, Mistral families) for healthcare and federal deployment contexts. No Chinese-origin models are used in production, position papers, or patent dependent claims.

Headquarters. Exton, Pennsylvania, USA. Genzeon Platforms is a Genzeon company.

Where to find more

Resource	Link
Company website	https://genzeon.one
Healthcare Brain overview	https://genzeon.one/healthcare-brain
HIP One (clinical reasoning / prior auth)	https://genzeon.one/hip-one
PES One (patient & member engagement)	https://genzeon.one/pes-one
CPS One (AI governance & compliance)	https://genzeon.one/cps-one
Aether One™ architecture	https://genzeon.one/aether-one
Patents	https://genzeon.one/patents
WISeR production deployment	https://genzeon.one/wiser
AKPS open spec	https://github.com/genzeon/aether-akps
Security & trust	https://genzeon.one/security
LinkedIn	https://www.linkedin.com/company/117124252
Contact	https://genzeon.one/contact

Citation

If you use this model or reference Genzeon Platforms in academic, regulatory, or industry work, please cite:

Genzeon Platforms (2026). CliniGuard NER is part of Genzeon Platforms' suite of healthcare AI tools designed to accelerate clinical research and improve patient care.

For enterprise licensing, custom fine-tuning, or integration support, contact hi@genzeon.one.


Weights: Safetensors format, 0.1B params, F32 tensors.
