siberiancat committed on
Commit ac53ca6 · verified · 1 Parent(s): 624bd5d

Update README.md

Files changed (1)
  1. README.md +71 -3
README.md CHANGED
@@ -4,15 +4,83 @@ tags:
  - ner
  - pii
  - security
+ - pii-detection
+ - data-privacy
+ - compliance
  language:
  - en
  base_model:
  - microsoft/deberta-v3-large
  pipeline_tag: token-classification
+ license: apache-2.0
  ---
 
+ # Dymium PII Named Entity Recognition
 
- Dymium Named Entity Recognition (NER) for PII detection.
+ Fine-tuned [DeBERTa-v3-large](https://huggingface.co/microsoft/deberta-v3-large)
+ for high-accuracy PII detection in enterprise and AI pipeline contexts.
 
- 13 entities:
- ADDRESS, AUTH, DATE, EMAIL, ID_DOC, ID_FIN, ID_GOV, ID_REF, ORG, PERSON, PHONE, TIME, URL
+ Developed by [Dymium](https://dymium.io) — AI data security platform enabling
+ zero-copy data access with built-in governance and compliance.
+
+ ## Entities
+
+ 13 PII entity types:
+
+ | Entity | Description |
+ |--------|-------------|
+ | ADDRESS | Physical addresses |
+ | AUTH | Credentials, passwords, API keys |
+ | DATE | Dates of birth and other sensitive dates |
+ | EMAIL | Email addresses |
+ | ID_DOC | Passport, driver's license numbers |
+ | ID_FIN | Financial identifiers (account, card numbers) |
+ | ID_GOV | Government identifiers (SSN, tax IDs) |
+ | ID_REF | Internal reference identifiers |
+ | ORG | Organization names |
+ | PERSON | Personal names |
+ | PHONE | Phone numbers |
+ | TIME | Timestamps |
+ | URL | URLs |
+
+ ## Intended Use
+
+ - PII detection and redaction in AI pipelines
+ - Data governance and compliance enforcement (GDPR, HIPAA, FedRAMP)
+ - Sensitive data discovery before feeding to LLMs
+ - Real-time data access control
+
+ ## Usage
+ ```python
+ from transformers import pipeline
+
+ ner = pipeline("token-classification",
+ model="dymium/dymium-pii-ner",
+ aggregation_strategy="simple")
+
+ result = ner("Contact John Smith at john@example.com or call 555-123-4567")
+ print(result)
+ ```
+
+ ## Performance
+
+ | Metric | Score |
+ |--------|-------|
+ | F1 | [add] |
+ | Precision | [add] |
+ | Recall | [add] |
+
+ ## Limitations
+
+ - English language only
+ - Performance may vary on highly domain-specific text
+ - AUTH entity detection depends on context availability
+
+ ## About Dymium
+
+ Dymium eliminates data movement risk by enabling AI agents and analytics
+ tools to query sensitive data in place — no copying, full governance,
+ FedRAMP-ready.
+
+ 🔗 [dymium.io](https://dymium.io) | [Blog](https://dymium.io/blog) |
+ [Resources](https://dymium.io/resources)
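The updated model card lists redaction under Intended Use, but its usage example only prints the raw pipeline output. The following is a minimal sketch of the redaction step. It assumes the usual output shape of a transformers token-classification pipeline run with `aggregation_strategy="simple"`: a list of dicts carrying `entity_group`, `score`, and character offsets `start`/`end`. The helper `redact` and its `min_score` threshold are hypothetical names introduced here, not part of the model card or of transformers. The sample predictions are made up to show the shape and are not real output from `dymium/dymium-pii-ner`.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each detected PII span with a [LABEL] placeholder.

    Hypothetical helper: assumes pipeline-style dicts with "entity_group",
    "score", "start" and "end" (character offsets into `text`), and assumes
    the spans do not overlap.
    """
    # Drop low-confidence predictions before masking.
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from the end of the string backwards so that earlier
    # character offsets remain valid after each substitution.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


if __name__ == "__main__":
    text = "Contact John Smith at john@example.com or call 555-123-4567"
    # Illustrative predictions only (offsets match `text`, scores are invented).
    sample = [
        {"entity_group": "PERSON", "score": 0.99, "start": 8, "end": 18},
        {"entity_group": "EMAIL", "score": 0.98, "start": 22, "end": 38},
        {"entity_group": "PHONE", "score": 0.97, "start": 47, "end": 59},
    ]
    print(redact(text, sample))  # Contact [PERSON] at [EMAIL] or call [PHONE]
```

With the `ner` pipeline from the model card, the call would presumably be `redact(text, ner(text))`. Grouped entities from the "simple" strategy normally do not overlap, but the sketch does not guard against overlapping spans. The right `min_score` depends on the precision/recall trade-off, which the card has not yet published.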