mkocher committed 6b1e754 (verified) · 1 parent: 25911d0

Upload README.md with huggingface_hub

Files changed (1): README.md (added, +73 -0)

---
language: en
license: apache-2.0
tags:
- token-classification
- ner
- hipaa
- phi
- healthcare
- privacy
- distilbert
datasets:
- custom
pipeline_tag: token-classification
---

# HIPAA PHI Detector (DistilBERT)

A fine-tuned DistilBERT model that detects Protected Health Information (PHI) in text, covering all 18 HIPAA Safe Harbor identifier categories.

## Model Details

- **Architecture**: DistilBERT (66M parameters) with a token classification head
- **Training**: Fine-tuned on 5,000+ synthetic HIPAA examples
- **Labels**: 37 BIO labels (18 entity types × 2 + `O`)
- **Framework**: PyTorch / Hugging Face Transformers
+
28
+ ## Supported Entity Types
29
+
30
+ | Label | HIPAA Category |
31
+ |-------|---------------|
32
+ | NAME | Names |
33
+ | LOCATION | Geographic subdivisions |
34
+ | DATE | Dates |
35
+ | PHONE | Phone numbers |
36
+ | FAX | Fax numbers |
37
+ | EMAIL | Email addresses |
38
+ | SSN | Social Security numbers |
39
+ | MRN | Medical record numbers |
40
+ | HEALTH_PLAN | Health plan beneficiary numbers |
41
+ | ACCOUNT | Account numbers |
42
+ | LICENSE | Certificate/license numbers |
43
+ | VEHICLE | Vehicle identifiers |
44
+ | DEVICE | Device identifiers |
45
+ | URL | Web URLs |
46
+ | IP | IP addresses |
47
+ | BIOMETRIC | Biometric identifiers |
48
+ | PHOTO | Photographic images |
49
+ | OTHER | Any other unique identifying number |
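
When auditing documents it can help to report which Safe Harbor categories were hit rather than individual spans. The sketch below maps the `entity_group` values returned by the token-classification pipeline (shown under Usage, with `aggregation_strategy="simple"`) to the categories in the table. It assumes the pipeline's `entity_group` strings match the table's label names exactly; `LABEL_TO_CATEGORY`, `phi_categories`, and the `min_score` cut-off of 0.5 are hypothetical choices for illustration, not values from the model card. The example results are hand-written in the pipeline's output format, not real model output.

```python
# Short HIPAA category names, one per label in the entity table (assumed label strings).
LABEL_TO_CATEGORY = {
    "NAME": "Names",
    "LOCATION": "Geographic subdivisions",
    "DATE": "Dates",
    "PHONE": "Telephone numbers",
    "FAX": "Fax numbers",
    "EMAIL": "Email addresses",
    "SSN": "Social Security numbers",
    "MRN": "Medical record numbers",
    "HEALTH_PLAN": "Health plan beneficiary numbers",
    "ACCOUNT": "Account numbers",
    "LICENSE": "Certificate/license numbers",
    "VEHICLE": "Vehicle identifiers",
    "DEVICE": "Device identifiers",
    "URL": "Web URLs",
    "IP": "IP addresses",
    "BIOMETRIC": "Biometric identifiers",
    "PHOTO": "Photographic images",
    "OTHER": "Any other unique identifying number",
}


def phi_categories(results, min_score=0.5):
    """Return the sorted HIPAA categories present in aggregated pipeline results.

    min_score is an illustrative cut-off, not a tuned or recommended value.
    """
    found = set()
    for entity in results:
        if entity["score"] >= min_score:
            label = entity["entity_group"]
            found.add(LABEL_TO_CATEGORY.get(label, label))  # unknown labels pass through
    return sorted(found)


# Hand-written results in the pipeline's aggregated format (scores are made up).
example = [
    {"entity_group": "NAME", "score": 0.99, "word": "John Smith", "start": 8, "end": 18},
    {"entity_group": "SSN", "score": 0.98, "word": "123-45-6789", "start": 24, "end": 35},
    {"entity_group": "DATE", "score": 0.20, "word": "Monday", "start": 40, "end": 46},
]
print(phi_categories(example))  # ['Names', 'Social Security numbers']
```

The low-scoring `DATE` entry is dropped by the threshold; lowering `min_score` would include it.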
50
+
51
+ ## Usage
52
+
53
+ ```python
54
+ from transformers import pipeline
55
+
56
+ pipe = pipeline("token-classification", model="mkocher/hipaa-phi-detector", aggregation_strategy="simple")
57
+ results = pipe("Patient John Smith, SSN 123-45-6789")
58
+ ```
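
A common next step is redaction: the `start` and `end` character offsets in each aggregated result let you replace detected spans with placeholders. The sketch below is a minimal, hypothetical `redact` helper, assuming non-overlapping spans (which aggregated pipeline output normally provides) and an illustrative `min_score` of 0.5. The `results` list is hand-written in the pipeline's output format so the sketch runs without downloading the model; in practice you would pass `pipe(text)` instead.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder (assumes no overlapping spans)."""
    # Work from the end of the string so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["score"] < min_score:
            continue  # skip low-confidence detections
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


text = "Patient John Smith, SSN 123-45-6789"
# Illustrative values in the pipeline's aggregated format, not real model output.
results = [
    {"entity_group": "NAME", "score": 0.99, "word": "John Smith", "start": 8, "end": 18},
    {"entity_group": "SSN", "score": 0.98, "word": "123-45-6789", "start": 24, "end": 35},
]
print(redact(text, results))  # Patient [NAME], SSN [SSN]
```

Substituting the label rather than a fixed mask keeps the redacted text readable for downstream review while still removing the identifier.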

Or with the `aare-core` package:

```python
from aare import HIPAAGuardrail

guardrail = HIPAAGuardrail()
result = guardrail.check("Patient John Smith, SSN 123-45-6789")
if result.blocked:
    print(f"PHI detected: {result.violations}")
```
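
To show the blocking pattern above in isolation, here is a minimal sketch of a guardrail that wraps any detector returning pipeline-style results. This is not the `aare` implementation: `PipelineGuardrail`, `CheckResult`, and `regex_ssn_detector` are hypothetical names, the 0.5 threshold is illustrative, and the regex detector is only an offline stand-in for `pipe` that finds SSN-shaped strings (it misses names, which is exactly what the model is for).

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    blocked: bool
    violations: List[Dict] = field(default_factory=list)


class PipelineGuardrail:
    """Hypothetical guardrail: block text when the detector reports confident PHI."""

    def __init__(self, detector: Callable[[str], List[Dict]], min_score: float = 0.5):
        self.detector = detector  # e.g. the `pipe` object from the Usage example
        self.min_score = min_score  # illustrative threshold, not a tuned value

    def check(self, text: str) -> CheckResult:
        violations = [e for e in self.detector(text) if e["score"] >= self.min_score]
        return CheckResult(blocked=bool(violations), violations=violations)


# Regex stand-in for the model so the sketch runs offline; it only finds SSN-shaped strings.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_ssn_detector(text: str) -> List[Dict]:
    return [
        {"entity_group": "SSN", "score": 1.0, "word": m.group(), "start": m.start(), "end": m.end()}
        for m in SSN_PATTERN.finditer(text)
    ]


guardrail = PipelineGuardrail(regex_ssn_detector)
result = guardrail.check("Patient John Smith, SSN 123-45-6789")
if result.blocked:
    print(f"PHI detected: {result.violations}")
```

Injecting the detector as a callable keeps the blocking logic independent of the model, so the same wrapper could take `pipe` directly.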
70
+
71
+ ## License
72
+
73
+ Apache 2.0