---
license: mit
language:
- en
tags:
- bert
- pii-detection
- phi-detection
- hipaa
- healthcare
- nlp
- text-classification
- sequence-classification
- lora
- peft
datasets:
- custom
base_model: bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---

# HIPAA-BERT: PII/PHI Column Name Classifier
|
A fine-tuned BERT model for classifying database column names as **PII** (Personally Identifiable Information), **PHI** (Protected Health Information), or **Other (O)**.

## Model Details

| Property | Value |
|----------|-------|
| **Developer** | KronosX AI Labs |
| **Model Type** | BERT + LoRA (text classification) |
| **Base Model** | `bert-base-uncased` |
| **Language** | English |
| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) |
| **Task** | Sequence Classification (3 classes) |
|
## Labels

| ID | Label | Description | Examples |
|----|-------|-------------|----------|
| 0 | `O` | Other/safe columns | `id`, `created_at`, `status` |
| 1 | `PII` | Personally Identifiable Information | `email`, `phone_number`, `address` |
| 2 | `PHI` | Protected Health Information (HIPAA) | `diagnosis_code`, `patient_name`, `ssn` |
|
## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning Rate | 1e-3 |
| Batch Size | 64 |
| Epochs | 10 |
| Weight Decay | 0.01 |
| Max Sequence Length | 64 |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| Target Modules | query, value |
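
To make the LoRA settings above concrete, the sketch below works out how many parameters the adapters add and the scaling factor applied to the low-rank update. It assumes the standard `bert-base-uncased` dimensions (12 encoder layers, hidden size 768), an adapter pair A (r × 768) and B (768 × r) on every layer's `query` and `value` projection, and a trained 3-way classification head. The helper names are hypothetical, and the result is plain arithmetic, not output from the actual training run.

```python
# Hypothetical helpers: back-of-the-envelope LoRA arithmetic for the
# hyperparameters listed above (not taken from the training script).

HIDDEN_SIZE = 768   # bert-base hidden size
NUM_LAYERS = 12     # bert-base encoder layers
NUM_LABELS = 3      # O, PII, PHI


def lora_params_per_module(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in_features), B is (out_features x r)."""
    return r * in_features + out_features * r


def total_lora_params(r: int, target_modules: list[str]) -> int:
    """LoRA parameters added when every layer's target projections are adapted."""
    per_module = lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, r)
    return per_module * len(target_modules) * NUM_LAYERS


def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


r, alpha = 16, 32
targets = ["query", "value"]

added = total_lora_params(r, targets)
head = HIDDEN_SIZE * NUM_LABELS + NUM_LABELS  # classifier weight + bias

print(f"LoRA parameters added: {added:,}")                      # 589,824
print(f"Classifier head parameters: {head:,}")                  # 2,307
print(f"Update scaling (alpha / r): {lora_scaling(alpha, r)}")  # 2.0
```

Compared with the roughly 110M parameters in `bert-base-uncased`, the adapters amount to about half a percent of the model. This small footprint is the main reason LoRA is used here instead of full fine-tuning.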
|
### Training Data
A custom HIPAA-compliant dataset of approximately 50,000 or more labeled column names from healthcare databases.

### Hardware
- GPU: NVIDIA GPU (Kaggle)
- Mixed Precision: FP16 enabled
|
## Performance Metrics

| Metric | Score |
|--------|-------|
| Accuracy | ~95%+ |
| F1 (weighted) | ~94%+ |
| Precision | ~93%+ |
| Recall | ~94%+ |
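
The table reports a weighted F1. Under the usual definition, F1 is computed separately for each class and then averaged, with each class weighted by its support (the number of true examples of that class). The stdlib-only sketch below implements that definition on made-up toy labels. It is not the evaluation code or data behind the scores above. scikit-learn's `f1_score(..., average="weighted")` computes the same quantity.

```python
from collections import Counter


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight by class support
    return score


# Toy labels for illustration only
y_true = ["O", "O", "PII", "PII", "PHI", "PHI"]
y_pred = ["O", "PII", "PII", "PII", "PHI", "O"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.656
```

Here the per-class F1 scores are 0.5 (`O`), 0.8 (`PII`), and about 0.667 (`PHI`). Each class has the same support, so the weighted F1 is their plain mean.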
|
## Usage

### Installation
```bash
pip install transformers torch
```

### Quick Start
|
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and fine-tuned model from the Hugging Face Hub
model_name = "KronosXAI/HIPAA-BERT-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Class index -> label name
label_map = {0: "O", 1: "PII", 2: "PHI"}

# Classify column names one at a time
columns = ["patient_name", "diagnosis_code", "created_at", "email", "status"]
for col in columns:
    inputs = tokenizer(col, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    print(f"{col}: {label_map[prediction]}")
```

### Expected Output
```text
patient_name: PHI
diagnosis_code: PHI
created_at: O
email: PII
status: O
```

## Intended Use

### Primary Use Cases
* Automatic PII/PHI detection in database schemas
* Data privacy compliance audits
* HIPAA compliance automation
* Healthcare data anonymization pipelines
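
For the schema-scanning use case, the column names first have to be read from the database catalog. The minimal sketch below does this with Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info`. `scan_schema` and `fake_classifier` are hypothetical names. `fake_classifier` is a lookup-table stand-in for the model-based loop in Quick Start, and other databases expose the same information through their own catalogs (for example `information_schema.columns`).

```python
import sqlite3
from typing import Callable


def scan_schema(conn: sqlite3.Connection,
                classify_column: Callable[[str], str]) -> dict[tuple[str, str], str]:
    """Return {(table, column): label} for every user table column in a SQLite database."""
    results = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the PRAGMA
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for row in conn.execute(f'PRAGMA table_info("{quoted}")'):
            column = row[1]
            results[(table, column)] = classify_column(column)
    return results


def fake_classifier(column: str) -> str:
    """Hypothetical placeholder; swap in the model-based classifier from Quick Start."""
    lookup = {"patient_name": "PHI", "diagnosis_code": "PHI", "email": "PII"}
    return lookup.get(column, "O")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER, patient_name TEXT, diagnosis_code TEXT)")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, created_at TEXT)")

# Report only the columns flagged as sensitive
for (table, column), label in scan_schema(conn, fake_classifier).items():
    if label != "O":
        print(f"{table}.{column}: {label}")
```

The classifier is passed in as a callable, so the catalog-reading code can be tested without loading the model.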

### Out-of-Scope
* This model classifies column names, not the actual data content
* Not suitable for classifying free-text or unstructured data
* Intended as one step in a larger compliance workflow, not as the sole arbiter
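
Because the model should feed a larger workflow and not decide alone, one common pattern is to convert the logits into probabilities and send low-confidence predictions to a human reviewer. The plain-Python sketch below shows that routing step. The 0.9 threshold, the `NEEDS_REVIEW` marker, and the function names are illustrative assumptions. In practice, the logits would come from `outputs.logits` in the Quick Start example.

```python
import math

LABELS = ["O", "PII", "PHI"]  # index order used in the Quick Start label_map


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits: list[float], threshold: float = 0.9) -> tuple[str, float]:
    """Return (label, confidence); defer to human review when confidence < threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[best] if probs[best] >= threshold else "NEEDS_REVIEW"
    return label, probs[best]


# Made-up logits for two columns
print(route([0.1, 0.3, 4.0]))  # confident PHI
print(route([1.0, 1.2, 0.9]))  # ambiguous, routed to NEEDS_REVIEW
```

The threshold trades reviewer workload against the risk of silently accepting a wrong label. It would need to be tuned on held-out data for a given deployment.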

## Limitations & Bias
* Trained primarily on English column naming conventions
* May not generalize to non-standard or domain-specific naming patterns
* Predictions should be validated by domain experts before production use
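
The documented examples all use lowercase snake_case (`patient_name`, `diagnosis_code`). One hypothetical mitigation for non-standard naming is therefore to normalize camelCase, PascalCase, or kebab-case names before classification. This normalization is an assumption and not a documented preprocessing step of the model, so any accuracy gain would need to be validated.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert camelCase/PascalCase/kebab-case column names to snake_case (hypothetical helper)."""
    # Split an acronym from a following capitalized word: "SSNNumber" -> "SSN_Number"
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from a following capital: "patientName" -> "patient_Name"
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Treat hyphens and whitespace as separators, then collapse repeated underscores
    name = re.sub(r"[-\s]+", "_", name)
    name = re.sub(r"_+", "_", name)
    return name.lower()


for raw in ["patientName", "DiagnosisCode", "SSNNumber", "created-at", "email"]:
    print(raw, "->", to_snake_case(raw))
# patientName -> patient_name
# DiagnosisCode -> diagnosis_code
# SSNNumber -> ssn_number
# created-at -> created_at
# email -> email
```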

## Model Card Authors
Abishek (KronosX AI Labs)

## Citation
```bibtex
@misc{hipaa-bert,
  author = {KronosX AI Labs},
  title = {HIPAA-BERT: PII/PHI Column Name Classifier},
  year = {2026},
  url = {https://huggingface.co/KronosXAI/HIPAA-BERT-v0.1}
}
```

## Links
* Organization: KronosX AI Labs
* Model repository: https://huggingface.co/KronosXAI/HIPAA-BERT-v0.1