MaziyarPanahi committed on
Commit
ade4412
·
verified ·
1 Parent(s): 0abc70a

Upload PII detection model OpenMed-PII-ClinicalLongformer-Base-149M-v1

README.md ADDED
@@ -0,0 +1,326 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ base_model: yikuan8/Clinical-Longformer
+ tags:
+ - token-classification
+ - ner
+ - pii
+ - pii-detection
+ - de-identification
+ - privacy
+ - healthcare
+ - medical
+ - clinical
+ - phi
+ - hipaa
+ - pytorch
+ - transformers
+ - openmed
+ datasets:
+ - nvidia/Nemotron-PII
+ pipeline_tag: token-classification
+ library_name: transformers
+ metrics:
+ - f1
+ - precision
+ - recall
+ model-index:
+ - name: OpenMed-PII-ClinicalLongformer-149M-v1
+   results:
+   - task:
+       type: token-classification
+       name: Named Entity Recognition
+     dataset:
+       name: nvidia/Nemotron-PII (test_strat)
+       type: nvidia/Nemotron-PII
+       split: test
+     metrics:
+     - type: f1
+       value: 0.9533
+       name: F1 (micro)
+     - type: precision
+       value: 0.9572
+       name: Precision
+     - type: recall
+       value: 0.9494
+       name: Recall
+ widget:
+ - text: "Dr. Sarah Johnson (SSN: 123-45-6789) can be reached at sarah.johnson@hospital.org or 555-123-4567. She lives at 123 Oak Street, Boston, MA 02108."
+   example_title: Clinical Note with PII
+ ---
+
+ # OpenMed-PII-ClinicalLongformer-149M-v1
+
+ **PII Detection Model** | 149M Parameters | Open Source
+
+ [![F1 Score](https://img.shields.io/badge/F1-95.33%25-brightgreen)]() [![Precision](https://img.shields.io/badge/Precision-95.72%25-blue)]() [![Recall](https://img.shields.io/badge/Recall-94.94%25-orange)]()
+
+ ## Model Description
+
+ **OpenMed-PII-ClinicalLongformer-149M-v1** is a transformer-based token classification model fine-tuned for **Personally Identifiable Information (PII) detection** in text. It identifies and classifies **54 types of sensitive information**, including names, addresses, SSNs, and medical record numbers.
+
+ ### Key Features
+
+ - **High Accuracy**: Achieves a micro F1 of 0.9533 on the stratified Nemotron-PII test set
+ - **Comprehensive Coverage**: Detects 54 entity types spanning personal, financial, medical, and contact information
+ - **Privacy-Focused**: Designed for de-identification workflows that support compliance with HIPAA, GDPR, and other privacy regulations
+ - **Production-Ready**: Optimized for real-world text processing pipelines
+
+ ## Performance
+
+ Evaluated on a stratified 2,000-sample test set from NVIDIA Nemotron-PII:
+
+ | Metric | Score |
+ |:---|:---:|
+ | **Micro F1** | **0.9533** |
+ | Precision | 0.9572 |
+ | Recall | 0.9494 |
+ | Macro F1 | 0.9528 |
+ | Weighted F1 | 0.9515 |
+ | Accuracy | 0.9943 |
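
As a reading aid for the metrics table, the sketch below shows how the three F1 variants are conventionally defined. It assumes the reported precision and recall are micro-averaged; the helper names and the two-row `per_label` sample are illustrative and are not part of the model's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_label):
    """Unweighted mean of the per-label F1 scores."""
    return sum(m["f1"] for m in per_label.values()) / len(per_label)


def weighted_f1(per_label):
    """Mean of the per-label F1 scores, weighted by label support."""
    total = sum(m["support"] for m in per_label.values())
    return sum(m["f1"] * m["support"] for m in per_label.values()) / total


# The reported precision and recall reproduce the reported micro F1.
print(round(f1_score(0.9572, 0.9494), 4))  # 0.9533

# Two rows copied from the per-entity tables, for illustration only.
per_label = {
    "coordinate": {"f1": 0.997, "support": 174},
    "occupation": {"f1": 0.646, "support": 772},
}
print(macro_f1(per_label), weighted_f1(per_label))
```

Macro F1 treats every entity type equally, while weighted F1 lets high-support types dominate, which is why a weak but frequent type such as `occupation` pulls the weighted score down more than the macro score.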
+
+ ### Top 10 PII Models
+
+ | Rank | Model | F1 | Precision | Recall |
+ |:---:|:---|:---:|:---:|:---:|
+ | 1 | [OpenMed-PII-SuperClinical-Large-434M-v1](https://huggingface.co/openmed/OpenMed-PII-SuperClinical-Large-434M-v1) | 0.9608 | 0.9685 | 0.9532 |
+ | 2 | [OpenMed-PII-BigMed-Large-560M-v1](https://huggingface.co/openmed/OpenMed-PII-BigMed-Large-560M-v1) | 0.9604 | 0.9644 | 0.9565 |
+ | 3 | [OpenMed-PII-EuroMed-210M-v1](https://huggingface.co/openmed/OpenMed-PII-EuroMed-210M-v1) | 0.9600 | 0.9681 | 0.9521 |
+ | 4 | [OpenMed-PII-SnowflakeMed-568M-v1](https://huggingface.co/openmed/OpenMed-PII-SnowflakeMed-568M-v1) | 0.9594 | 0.9640 | 0.9548 |
+ | 5 | [OpenMed-PII-SuperMedical-Large-355M-v1](https://huggingface.co/openmed/OpenMed-PII-SuperMedical-Large-355M-v1) | 0.9592 | 0.9632 | 0.9553 |
+ | 6 | [OpenMed-PII-ClinicalBGE-568M-v1](https://huggingface.co/openmed/OpenMed-PII-ClinicalBGE-568M-v1) | 0.9587 | 0.9636 | 0.9538 |
+ | 7 | [OpenMed-PII-mClinicalE5-Large-560M-v1](https://huggingface.co/openmed/OpenMed-PII-mClinicalE5-Large-560M-v1) | 0.9582 | 0.9631 | 0.9533 |
+ | 8 | [OpenMed-PII-ModernMed-Large-395M-v1](https://huggingface.co/openmed/OpenMed-PII-ModernMed-Large-395M-v1) | 0.9579 | 0.9639 | 0.9520 |
+ | 9 | [OpenMed-PII-BioClinicalModern-Large-395M-v1](https://huggingface.co/openmed/OpenMed-PII-BioClinicalModern-Large-395M-v1) | 0.9579 | 0.9656 | 0.9502 |
+ | 10 | [OpenMed-PII-ClinicalE5-Large-335M-v1](https://huggingface.co/openmed/OpenMed-PII-ClinicalE5-Large-335M-v1) | 0.9577 | 0.9604 | 0.9550 |
+
+ ### Best Performing Entities
+
+ | Entity | F1 | Precision | Recall | Support |
+ |:---|:---:|:---:|:---:|:---:|
+ | `coordinate` | 0.997 | 0.994 | 1.000 | 174 |
+ | `license_plate` | 0.995 | 0.990 | 1.000 | 97 |
+ | `email` | 0.994 | 0.995 | 0.993 | 825 |
+ | `employee_id` | 0.994 | 0.994 | 0.994 | 157 |
+ | `ipv4` | 0.993 | 0.993 | 0.993 | 135 |
+
+ ### Challenging Entities
+
+ These entity types have lower performance and may benefit from additional post-processing:
+
+ | Entity | F1 | Precision | Recall | Support |
+ |:---|:---:|:---:|:---:|:---:|
+ | `education_level` | 0.870 | 0.886 | 0.854 | 192 |
+ | `fax_number` | 0.853 | 0.818 | 0.892 | 111 |
+ | `time` | 0.826 | 0.881 | 0.777 | 488 |
+ | `unique_id` | 0.769 | 0.882 | 0.682 | 44 |
+ | `occupation` | 0.646 | 0.734 | 0.576 | 772 |
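
One possible form of post-processing for a low-recall type such as `time` is to supplement the model output with a rule-based detector. The sketch below is only an illustration under stated assumptions: the `TIME_PATTERN` regex, the `add_regex_times` helper, and the fixed `score` of 1.0 for rule hits are all hypothetical, and whether clock-time matches line up with what the dataset labels as `time` has not been checked.

```python
import re

# Hypothetical pattern for clock times such as "14:30", "09:05:10", or "9:05 PM".
TIME_PATTERN = re.compile(
    r"\b(?:[01]?\d|2[0-3]):[0-5]\d(?::[0-5]\d)?(?:\s?[AaPp][Mm])?\b"
)


def add_regex_times(text, entities):
    """Add regex-detected times that no existing entity span overlaps.

    `entities` uses the same dict shape as the aggregated pipeline output:
    'entity_group', 'word', 'start', 'end', and 'score'.
    """
    covered = [(e["start"], e["end"]) for e in entities]
    extra = []
    for match in TIME_PATTERN.finditer(text):
        overlaps = any(s < match.end() and match.start() < e for s, e in covered)
        if not overlaps:
            extra.append({
                "entity_group": "time",
                "word": match.group(),
                "start": match.start(),
                "end": match.end(),
                "score": 1.0,  # placeholder score for rule-based hits
            })
    return sorted(entities + extra, key=lambda e: e["start"])


sample = "Seen at 14:30, discharged 9:05 PM."
print([e["word"] for e in add_regex_times(sample, [])])  # ['14:30', '9:05 PM']
```

Rule hits are only added where the model found nothing, so the model's own spans and labels are never overwritten.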
+
+ ## Supported Entity Types
+
+ This model detects **54 PII entity types** organized into categories:
+
+ <details>
+ <summary><strong>Identifiers</strong> (16 types)</summary>
+
+ | Entity | Description |
+ |:---|:---|
+ | `account_number` | Account Number |
+ | `api_key` | API Key |
+ | `bank_routing_number` | Bank Routing Number |
+ | `certificate_license_number` | Certificate/License Number |
+ | `credit_debit_card` | Credit/Debit Card |
+ | `cvv` | CVV |
+ | `employee_id` | Employee ID |
+ | `health_plan_beneficiary_number` | Health Plan Beneficiary Number |
+ | `mac_address` | MAC Address |
+ | `medical_record_number` | Medical Record Number |
+ | ... | *and 6 more* |
+
+ </details>
+
+ <details>
+ <summary><strong>Personal Info</strong> (14 types)</summary>
+
+ | Entity | Description |
+ |:---|:---|
+ | `age` | Age |
+ | `biometric_identifier` | Biometric Identifier |
+ | `blood_type` | Blood Type |
+ | `date_of_birth` | Date of Birth |
+ | `education_level` | Education Level |
+ | `first_name` | First Name |
+ | `last_name` | Last Name |
+ | `gender` | Gender |
+ | `language` | Language |
+ | `occupation` | Occupation |
+ | ... | *and 4 more* |
+
+ </details>
+
+ <details>
+ <summary><strong>Contact Info</strong> (4 types)</summary>
+
+ | Entity | Description |
+ |:---|:---|
+ | `email` | Email |
+ | `phone_number` | Phone Number |
+ | `fax_number` | Fax Number |
+ | `url` | URL |
+
+ </details>
+
+ <details>
+ <summary><strong>Location</strong> (6 types)</summary>
+
+ | Entity | Description |
+ |:---|:---|
+ | `city` | City |
+ | `coordinate` | Coordinate |
+ | `country` | Country |
+ | `county` | County |
+ | `state` | State |
+ | `street_address` | Street Address |
+
+ </details>
+
+ <details>
+ <summary><strong>Network Info</strong> (3 types)</summary>
+
+ | Entity | Description |
+ |:---|:---|
+ | `device_identifier` | Device Identifier |
+ | `ipv4` | IPv4 |
+ | `ipv6` | IPv6 |
+
+ </details>
+
+ <details>
+ <summary><strong>Temporal</strong> (3 types)</summary>
+
+ | Entity | Description |
+ |:---|:---|
+ | `date` | Date |
+ | `date_time` | Date Time |
+ | `time` | Time |
+
+ </details>
+
+ <details>
+ <summary><strong>Organization</strong> (1 type)</summary>
+
+ | Entity | Description |
+ |:---|:---|
+ | `company_name` | Company Name |
+
+ </details>
+
+ ## Usage
+
+ ### Quick Start
+
+ ```python
+ from transformers import pipeline
+
+ # Load the PII detection pipeline
+ ner = pipeline("ner", model="openmed/OpenMed-PII-ClinicalLongformer-149M-v1", aggregation_strategy="simple")
+
+ text = """
+ Patient John Smith (DOB: 03/15/1985, SSN: 123-45-6789) was seen today.
+ Contact: john.smith@email.com, Phone: (555) 123-4567.
+ Address: 456 Oak Street, Boston, MA 02108.
+ """
+
+ entities = ner(text)
+ for entity in entities:
+     print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.3f})")
+ ```
+
+ ### De-identification Example
+
+ ```python
+ def redact_pii(text, entities, placeholder=None):
+     """Replace detected PII spans with placeholders.
+
+     If `placeholder` is None, each span is replaced by its entity label
+     (for example "[email]"); otherwise every span is replaced by `placeholder`.
+     """
+     # Process spans from the end of the text so earlier offsets stay valid
+     sorted_entities = sorted(entities, key=lambda x: x['start'], reverse=True)
+     redacted = text
+     for ent in sorted_entities:
+         replacement = placeholder if placeholder is not None else f"[{ent['entity_group']}]"
+         redacted = redacted[:ent['start']] + replacement + redacted[ent['end']:]
+     return redacted
+
+ # Apply de-identification (reuses `text` and `entities` from the Quick Start)
+ redacted_text = redact_pii(text, entities)
+ print(redacted_text)
+ ```
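
Offset-based redaction like `redact_pii` assumes the spans do not overlap. That should normally hold for aggregated pipeline output, but it may not hold if spans from several detectors are combined. The sketch below shows one way to merge overlapping or touching spans first. The `merge_spans` helper is hypothetical, and keeping the label of the earliest span is an arbitrary choice made for this illustration.

```python
def merge_spans(entities):
    """Merge overlapping or touching character spans.

    Each entity is a dict with 'start', 'end', and 'entity_group' keys.
    A merged span keeps the label of its earliest member.
    """
    merged = []
    for ent in sorted(entities, key=lambda e: (e["start"], e["end"])):
        if merged and ent["start"] <= merged[-1]["end"]:
            # Extend the previous span instead of starting a new one
            merged[-1]["end"] = max(merged[-1]["end"], ent["end"])
        else:
            merged.append({
                "start": ent["start"],
                "end": ent["end"],
                "entity_group": ent["entity_group"],
            })
    return merged


spans = [
    {"start": 3, "end": 9, "entity_group": "last_name"},
    {"start": 0, "end": 5, "entity_group": "first_name"},
    {"start": 12, "end": 15, "entity_group": "age"},
]
print(merge_spans(spans))
# [{'start': 0, 'end': 9, 'entity_group': 'first_name'},
#  {'start': 12, 'end': 15, 'entity_group': 'age'}]
```

The merged list can then be passed to the redaction function in place of the raw entity list.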
+
+ ### Batch Processing
+
+ ```python
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+ import torch
+
+ model_name = "openmed/OpenMed-PII-ClinicalLongformer-149M-v1"
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ texts = [
+     "Contact Dr. Jane Doe at jane.doe@hospital.org",
+     "Patient SSN: 987-65-4321, MRN: 12345678",
+ ]
+
+ # Pad to the longest text in the batch and truncate overly long inputs
+ inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
+ with torch.no_grad():
+     outputs = model(**inputs)
+     # One predicted label id per token, shape (batch, sequence_length)
+     predictions = torch.argmax(outputs.logits, dim=-1)
+ ```
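
The batch example stops at per-token label ids. After mapping the ids to label strings (for instance through `model.config.id2label`), the BIO tags still have to be grouped into entity spans. The sketch below shows one simple grouping rule on plain label strings. The `bio_to_spans` helper is hypothetical, and treating a stray `I-` tag as the start of a new span is an assumption of this sketch, not necessarily what the pipeline's aggregation does.

```python
def bio_to_spans(labels):
    """Group a BIO label sequence into (entity_type, start, end) spans.

    Indices are token positions and `end` is exclusive.
    """
    spans = []
    current_type, start = None, None
    for i, label in enumerate(labels):
        if label == "O":
            # Close any open span
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = None, None
        elif label.startswith("B-") or label[2:] != current_type:
            # A B- tag, or an I- tag of a different type, starts a new span
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = label[2:], i
        # Otherwise: an I- tag that continues the current span
    if current_type is not None:
        spans.append((current_type, start, len(labels)))
    return spans


labels = ["O", "B-first_name", "B-last_name", "O", "B-email", "I-email", "I-email"]
print(bio_to_spans(labels))
# [('first_name', 1, 2), ('last_name', 2, 3), ('email', 4, 7)]
```

Token-level spans would still need to be mapped back to character offsets (for example with the tokenizer's offset mapping) before they can be used for redaction.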
+
+ ## Training Details
+
+ ### Dataset
+
+ - **Source**: [NVIDIA Nemotron-PII](https://huggingface.co/datasets/nvidia/Nemotron-PII)
+ - **Format**: BIO-tagged token classification
+ - **Labels**: 106 total (`B-`/`I-` BIO tags for the entity types, plus `O`)
+ - **Splits**: 50K train / 5K validation / 45K test
+
+ ### Training Configuration
+
+ - **Max Sequence Length**: 384 tokens
+ - **Label Strategy**: First token only (`label_all_tokens=False`)
+ - **Framework**: Hugging Face Transformers + Trainer API
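
The "first token only" label strategy can be sketched as follows. This assumes the common convention in which only the first sub-token of each word keeps the word's label and all other positions receive `-100`, the default `ignore_index` of `torch.nn.CrossEntropyLoss`. The `align_labels` helper and the sample inputs are illustrative and are not taken from the actual training script.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids, word_labels, label_all_tokens=False):
    """Expand word-level label ids to sub-token level.

    word_ids: one entry per sub-token giving the index of its source word,
              or None for special tokens (the shape returned by a fast
              tokenizer's `word_ids()`).
    word_labels: one label id per word.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special tokens never contribute to the loss
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First sub-token of a word carries the word's label
            aligned.append(word_labels[wid])
        else:
            # Later sub-tokens are masked unless label_all_tokens is set
            aligned.append(word_labels[wid] if label_all_tokens else IGNORE_INDEX)
        previous = wid
    return aligned


# Three words; the second is split into two sub-tokens. Label id 25 is
# "B-first_name" in this model's id2label mapping.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 25, 0]
print(align_labels(word_ids, word_labels))  # [-100, 0, 25, -100, 0, -100]
```

With this strategy the loss and the token-level metrics are computed once per word, so long words split into many sub-tokens do not carry extra weight.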
+
+ ## Intended Use & Limitations
+
+ ### Intended Use
+
+ - **De-identification**: Automated redaction of PII in clinical notes, medical records, and documents
+ - **Compliance**: Supporting HIPAA, GDPR, and privacy regulation compliance
+ - **Data Preprocessing**: Preparing datasets for research by removing sensitive information
+ - **Audit Support**: Identifying PII in document collections
+
+ ### Limitations
+
+ ⚠️ **Important**: This model is intended as an **assistive tool**, not a replacement for human review.
+
+ - **False Negatives**: Some PII may not be detected; always verify the output in critical applications
+ - **Context Sensitivity**: Performance may vary with domain-specific terminology
+ - **Challenging Categories**: `occupation`, `unique_id`, and `time` have the lowest F1 scores
+ - **Language**: Primarily trained on English text
+
+ ## Citation
+
+ ```bibtex
+ @misc{openmed-pii-2026,
+   title = {OpenMed-PII-ClinicalLongformer-149M-v1: PII Detection Model},
+   author = {OpenMed Science},
+   year = {2026},
+   publisher = {Hugging Face},
+   url = {https://huggingface.co/openmed/OpenMed-PII-ClinicalLongformer-149M-v1}
+ }
+ ```
+
+ ## Links
+
+ - **Organization**: [OpenMed](https://huggingface.co/OpenMed)
all_results.json ADDED
@@ -0,0 +1,680 @@
1
+ {
2
+ "epoch": 2.0,
3
+ "eval_accuracy": 0.994309148903853,
4
+ "eval_f1": 0.954933898181287,
5
+ "eval_loss": 0.02355710230767727,
6
+ "eval_macro_f1": 0.9547195008322086,
7
+ "eval_per_label": {
8
+ "account_number": {
9
+ "f1": 0.9739644970414201,
10
+ "precision": 0.9716646989374262,
11
+ "recall": 0.9762752075919335,
12
+ "support": 843
13
+ },
14
+ "age": {
15
+ "f1": 0.96,
16
+ "precision": 0.9466357308584686,
17
+ "recall": 0.9737470167064439,
18
+ "support": 419
19
+ },
20
+ "api_key": {
21
+ "f1": 0.9707317073170731,
22
+ "precision": 0.9660194174757282,
23
+ "recall": 0.9754901960784313,
24
+ "support": 204
25
+ },
26
+ "bank_routing_number": {
27
+ "f1": 0.9893992932862192,
28
+ "precision": 0.9905660377358491,
29
+ "recall": 0.9882352941176471,
30
+ "support": 425
31
+ },
32
+ "biometric_identifier": {
33
+ "f1": 0.9944341372912802,
34
+ "precision": 0.9907578558225508,
35
+ "recall": 0.9981378026070763,
36
+ "support": 537
37
+ },
38
+ "blood_type": {
39
+ "f1": 0.9438943894389439,
40
+ "precision": 0.9315960912052117,
41
+ "recall": 0.9565217391304348,
42
+ "support": 299
43
+ },
44
+ "certificate_license_number": {
45
+ "f1": 0.9888475836431226,
46
+ "precision": 0.9851851851851852,
47
+ "recall": 0.9925373134328358,
48
+ "support": 268
49
+ },
50
+ "city": {
51
+ "f1": 0.9654754694124772,
52
+ "precision": 0.9660606060606061,
53
+ "recall": 0.9648910411622276,
54
+ "support": 826
55
+ },
56
+ "company_name": {
57
+ "f1": 0.9595003785011356,
58
+ "precision": 0.9395848776871757,
59
+ "recall": 0.9802784222737819,
60
+ "support": 2586
61
+ },
62
+ "coordinate": {
63
+ "f1": 0.9845360824742269,
64
+ "precision": 0.9769820971867008,
65
+ "recall": 0.9922077922077922,
66
+ "support": 385
67
+ },
68
+ "country": {
69
+ "f1": 0.981132075471698,
70
+ "precision": 0.9774436090225563,
71
+ "recall": 0.9848484848484849,
72
+ "support": 924
73
+ },
74
+ "county": {
75
+ "f1": 0.9592233009708737,
76
+ "precision": 0.9610894941634242,
77
+ "recall": 0.9573643410852714,
78
+ "support": 774
79
+ },
80
+ "credit_debit_card": {
81
+ "f1": 0.9914529914529914,
82
+ "precision": 0.9953198127925117,
83
+ "recall": 0.9876160990712074,
84
+ "support": 646
85
+ },
86
+ "customer_id": {
87
+ "f1": 0.9698349459305634,
88
+ "precision": 0.9583802024746907,
89
+ "recall": 0.9815668202764977,
90
+ "support": 868
91
+ },
92
+ "cvv": {
93
+ "f1": 0.9593810444874276,
94
+ "precision": 0.9612403100775194,
95
+ "recall": 0.9575289575289575,
96
+ "support": 259
97
+ },
98
+ "date": {
99
+ "f1": 0.9579960287154422,
100
+ "precision": 0.9655172413793104,
101
+ "recall": 0.9505910882085481,
102
+ "support": 3299
103
+ },
104
+ "date_of_birth": {
105
+ "f1": 0.9861212563915267,
106
+ "precision": 0.9796806966618288,
107
+ "recall": 0.9926470588235294,
108
+ "support": 680
109
+ },
110
+ "date_time": {
111
+ "f1": 0.9541984732824427,
112
+ "precision": 0.9523809523809523,
113
+ "recall": 0.9560229445506692,
114
+ "support": 523
115
+ },
116
+ "device_identifier": {
117
+ "f1": 0.9603174603174602,
118
+ "precision": 0.952755905511811,
119
+ "recall": 0.968,
120
+ "support": 125
121
+ },
122
+ "education_level": {
123
+ "f1": 0.8952007835455434,
124
+ "precision": 0.904950495049505,
125
+ "recall": 0.8856589147286822,
126
+ "support": 516
127
+ },
128
+ "email": {
129
+ "f1": 0.994049035943823,
130
+ "precision": 0.9938124702522608,
131
+ "recall": 0.9942857142857143,
132
+ "support": 2100
133
+ },
134
+ "employee_id": {
135
+ "f1": 0.9850107066381156,
136
+ "precision": 0.9745762711864406,
137
+ "recall": 0.9956709956709957,
138
+ "support": 462
139
+ },
140
+ "employment_status": {
141
+ "f1": 0.9467140319715808,
142
+ "precision": 0.9467140319715808,
143
+ "recall": 0.9467140319715808,
144
+ "support": 563
145
+ },
146
+ "fax_number": {
147
+ "f1": 0.864406779661017,
148
+ "precision": 0.8557046979865772,
149
+ "recall": 0.8732876712328768,
150
+ "support": 292
151
+ },
152
+ "first_name": {
153
+ "f1": 0.9934272300469483,
154
+ "precision": 0.9931940858953298,
155
+ "recall": 0.9936604836816154,
156
+ "support": 4259
157
+ },
158
+ "gender": {
159
+ "f1": 0.9658536585365853,
160
+ "precision": 0.9519230769230769,
161
+ "recall": 0.9801980198019802,
162
+ "support": 404
163
+ },
164
+ "health_plan_beneficiary_number": {
165
+ "f1": 0.995159728944821,
166
+ "precision": 0.9980582524271845,
167
+ "recall": 0.9922779922779923,
168
+ "support": 518
169
+ },
170
+ "http_cookie": {
171
+ "f1": 0.8944099378881988,
172
+ "precision": 0.8571428571428571,
173
+ "recall": 0.935064935064935,
174
+ "support": 231
175
+ },
176
+ "ipv4": {
177
+ "f1": 0.9928571428571428,
178
+ "precision": 0.9893238434163701,
179
+ "recall": 0.996415770609319,
180
+ "support": 279
181
+ },
182
+ "ipv6": {
183
+ "f1": 0.9829351535836177,
184
+ "precision": 0.9795918367346939,
185
+ "recall": 0.9863013698630136,
186
+ "support": 146
187
+ },
188
+ "language": {
189
+ "f1": 0.9665211062590976,
190
+ "precision": 0.9880952380952381,
191
+ "recall": 0.9458689458689459,
192
+ "support": 351
193
+ },
194
+ "last_name": {
195
+ "f1": 0.9916794022754287,
196
+ "precision": 0.9901661580196677,
197
+ "recall": 0.9931972789115646,
198
+ "support": 2940
199
+ },
200
+ "license_plate": {
201
+ "f1": 0.9957081545064378,
202
+ "precision": 0.9957081545064378,
203
+ "recall": 0.9957081545064378,
204
+ "support": 233
205
+ },
206
+ "mac_address": {
207
+ "f1": 0.9932279909706545,
208
+ "precision": 0.990990990990991,
209
+ "recall": 0.995475113122172,
210
+ "support": 221
211
+ },
212
+ "medical_record_number": {
213
+ "f1": 0.9863672814755413,
214
+ "precision": 0.9887459807073955,
215
+ "recall": 0.984,
216
+ "support": 625
217
+ },
218
+ "occupation": {
219
+ "f1": 0.6761051719156312,
220
+ "precision": 0.7490396927016645,
221
+ "recall": 0.6161137440758294,
222
+ "support": 1899
223
+ },
224
+ "password": {
225
+ "f1": 0.9901269393511989,
226
+ "precision": 0.9887323943661972,
227
+ "recall": 0.9915254237288136,
228
+ "support": 354
229
+ },
230
+ "phone_number": {
231
+ "f1": 0.9637488947833776,
232
+ "precision": 0.9569798068481123,
233
+ "recall": 0.9706144256455922,
234
+ "support": 1123
235
+ },
236
+ "pin": {
237
+ "f1": 0.9561128526645768,
238
+ "precision": 0.9744408945686901,
239
+ "recall": 0.9384615384615385,
240
+ "support": 325
241
+ },
242
+ "political_view": {
243
+ "f1": 0.8720720720720719,
244
+ "precision": 0.8491228070175438,
245
+ "recall": 0.8962962962962963,
246
+ "support": 270
247
+ },
248
+ "postcode": {
249
+ "f1": 0.9807162534435261,
250
+ "precision": 0.9834254143646409,
251
+ "recall": 0.978021978021978,
252
+ "support": 364
253
+ },
254
+ "race_ethnicity": {
255
+ "f1": 0.9590062111801243,
256
+ "precision": 0.9278846153846154,
257
+ "recall": 0.9922879177377892,
258
+ "support": 389
259
+ },
260
+ "religious_belief": {
261
+ "f1": 0.920684292379471,
262
+ "precision": 0.9135802469135802,
263
+ "recall": 0.9278996865203761,
264
+ "support": 319
265
+ },
266
+ "sexuality": {
267
+ "f1": 0.907563025210084,
268
+ "precision": 0.8901098901098901,
269
+ "recall": 0.9257142857142857,
270
+ "support": 175
271
+ },
272
+ "ssn": {
273
+ "f1": 0.9908972691807542,
274
+ "precision": 0.9896103896103896,
275
+ "recall": 0.9921875,
276
+ "support": 384
277
+ },
278
+ "state": {
279
+ "f1": 0.9619732785200411,
280
+ "precision": 0.9659442724458205,
281
+ "recall": 0.9580348004094166,
282
+ "support": 977
283
+ },
284
+ "street_address": {
285
+ "f1": 0.9828282828282829,
286
+ "precision": 0.9808467741935484,
287
+ "recall": 0.9848178137651822,
288
+ "support": 988
289
+ },
290
+ "swift_bic": {
291
+ "f1": 0.9788732394366197,
292
+ "precision": 0.9652777777777778,
293
+ "recall": 0.9928571428571429,
294
+ "support": 280
295
+ },
296
+ "tax_id": {
297
+ "f1": 0.9710144927536231,
298
+ "precision": 0.9436619718309859,
299
+ "recall": 1.0,
300
+ "support": 67
301
+ },
302
+ "time": {
303
+ "f1": 0.8011272141706924,
304
+ "precision": 0.8584987057808455,
305
+ "recall": 0.7509433962264151,
306
+ "support": 1325
307
+ },
308
+ "unique_id": {
309
+ "f1": 0.8139534883720929,
310
+ "precision": 0.8860759493670886,
311
+ "recall": 0.7526881720430108,
312
+ "support": 93
313
+ },
314
+ "url": {
315
+ "f1": 0.9817797729073144,
316
+ "precision": 0.9779063650710152,
317
+ "recall": 0.9856839872746553,
318
+ "support": 1886
319
+ },
320
+ "user_name": {
321
+ "f1": 0.9780521262002744,
322
+ "precision": 0.9861687413554634,
323
+ "recall": 0.9700680272108844,
324
+ "support": 735
325
+ },
326
+ "vehicle_identifier": {
327
+ "f1": 0.9742489270386265,
328
+ "precision": 0.9659574468085106,
329
+ "recall": 0.9826839826839827,
330
+ "support": 231
331
+ }
332
+ },
333
+ "eval_precision": 0.9582233948988567,
334
+ "eval_recall": 0.9516669093026642,
335
+ "eval_runtime": 34.2316,
336
+ "eval_samples_per_second": 146.064,
337
+ "eval_steps_per_second": 2.308,
338
+ "eval_weighted_f1": 0.9534616997905814,
339
+ "test_accuracy": 0.9943192201114947,
340
+ "test_f1": 0.955778042558636,
341
+ "test_loss": 0.023028379306197166,
342
+ "test_macro_f1": 0.9571182705816211,
343
+ "test_per_label": {
344
+ "account_number": {
345
+ "f1": 0.9722450527913006,
346
+ "precision": 0.9734143562476263,
347
+ "recall": 0.9710785551907047,
348
+ "support": 7918
349
+ },
350
+ "age": {
351
+ "f1": 0.9595101224693826,
352
+ "precision": 0.9490729295426452,
353
+ "recall": 0.9701794288602477,
354
+ "support": 3957
355
+ },
356
+ "api_key": {
357
+ "f1": 0.9694142042509072,
358
+ "precision": 0.9709241952232607,
359
+ "recall": 0.9679089026915114,
360
+ "support": 1932
361
+ },
362
+ "bank_routing_number": {
363
+ "f1": 0.9829792848660772,
364
+ "precision": 0.9872780280943546,
365
+ "recall": 0.9787178139779296,
366
+ "support": 3806
367
+ },
368
+ "biometric_identifier": {
369
+ "f1": 0.9908826770864643,
370
+ "precision": 0.9864352683024137,
371
+ "recall": 0.9953703703703703,
372
+ "support": 4968
373
+ },
374
+ "blood_type": {
375
+ "f1": 0.9675413022351798,
376
+ "precision": 0.9551036070606294,
377
+ "recall": 0.9803072075620323,
378
+ "support": 2539
379
+ },
380
+ "certificate_license_number": {
381
+ "f1": 0.9729729729729729,
382
+ "precision": 0.9673258813413586,
383
+ "recall": 0.97868638538495,
384
+ "support": 2299
385
+ },
386
+ "city": {
387
+ "f1": 0.9711004306847263,
388
+ "precision": 0.974445697106351,
389
+ "recall": 0.9677780542423489,
390
+ "support": 8038
391
+ },
392
+ "company_name": {
393
+ "f1": 0.9649876747889443,
394
+ "precision": 0.9479278275403934,
395
+ "recall": 0.9826728274391328,
396
+ "support": 22508
397
+ },
398
+ "coordinate": {
399
+ "f1": 0.9934754240974337,
400
+ "precision": 0.9910326873011281,
401
+ "recall": 0.9959302325581395,
402
+ "support": 3440
403
+ },
404
+ "country": {
405
+ "f1": 0.983566710700132,
406
+ "precision": 0.9812335266209805,
407
+ "recall": 0.9859110169491525,
408
+ "support": 9440
409
+ },
410
+ "county": {
411
+ "f1": 0.9633140972794724,
412
+ "precision": 0.9630494505494506,
413
+ "recall": 0.9635788894997251,
414
+ "support": 7276
415
+ },
416
+ "credit_debit_card": {
417
+ "f1": 0.9889901943918803,
418
+ "precision": 0.9844178082191781,
419
+ "recall": 0.9936052540615278,
420
+ "support": 5786
421
+ },
422
+ "customer_id": {
423
+ "f1": 0.9751981590386091,
424
+ "precision": 0.9645928174001012,
425
+ "recall": 0.9860392967942089,
426
+ "support": 7736
427
+ },
428
+ "cvv": {
429
+ "f1": 0.9650256181777679,
430
+ "precision": 0.9462647444298821,
431
+ "recall": 0.9845454545454545,
432
+ "support": 2200
433
+ },
434
+ "date": {
435
+ "f1": 0.9536449949078485,
436
+ "precision": 0.9578301326470006,
437
+ "recall": 0.9494962710977365,
438
+ "support": 30572
439
+ },
440
+ "date_of_birth": {
441
+ "f1": 0.9888132676210591,
442
+ "precision": 0.9816713264989128,
443
+ "recall": 0.996059889676911,
444
+ "support": 6345
445
+ },
446
+ "date_time": {
447
+ "f1": 0.9482934804823216,
448
+ "precision": 0.9519901518260155,
449
+ "recall": 0.9446254071661238,
450
+ "support": 4912
451
+ },
452
+ "device_identifier": {
453
+ "f1": 0.9511228533685602,
454
+ "precision": 0.9440559440559441,
455
+ "recall": 0.9582963620230701,
456
+ "support": 1127
457
+ },
458
+ "education_level": {
459
+ "f1": 0.8892338396718041,
460
+ "precision": 0.9126081019572144,
461
+ "recall": 0.867027027027027,
462
+ "support": 4625
463
+ },
464
+ "email": {
465
+ "f1": 0.9941851327989941,
466
+ "precision": 0.995697796432319,
467
+ "recall": 0.9926770582696934,
468
+ "support": 19118
469
+ },
470
+ "employee_id": {
471
+ "f1": 0.9811648079306072,
472
+ "precision": 0.9770483711747285,
473
+ "recall": 0.9853160776505724,
474
+ "support": 4018
475
+ },
476
+ "employment_status": {
477
+ "f1": 0.9560728547774423,
478
+ "precision": 0.9473074696004632,
479
+ "recall": 0.9650019661816752,
480
+ "support": 5086
481
+ },
482
+ "fax_number": {
483
+ "f1": 0.8656716417910448,
484
+ "precision": 0.8517041334300217,
485
+ "recall": 0.8801049082053204,
486
+ "support": 2669
487
+ },
488
+ "first_name": {
489
+ "f1": 0.99127445107181,
490
+ "precision": 0.9907067551737603,
491
+ "recall": 0.9918427979463658,
492
+ "support": 38371
493
+ },
494
+ "gender": {
495
+ "f1": 0.9452845751993402,
496
+ "precision": 0.9416598192276089,
497
+ "recall": 0.9489373447419266,
498
+ "support": 3623
499
+ },
500
+ "health_plan_beneficiary_number": {
501
+ "f1": 0.9898369680287953,
502
+ "precision": 0.9915164369034994,
503
+ "recall": 0.9881631790319172,
504
+ "support": 4731
505
+ },
506
+ "http_cookie": {
507
+ "f1": 0.9451417945141795,
508
+ "precision": 0.9364348226623675,
509
+ "recall": 0.9540122008446739,
510
+ "support": 2131
511
+ },
512
+ "ipv4": {
513
+ "f1": 0.9872049017841051,
514
+ "precision": 0.9820724273933309,
515
+ "recall": 0.9923913043478261,
516
+ "support": 2760
517
+ },
518
+ "ipv6": {
519
+ "f1": 0.9886685552407931,
520
+ "precision": 0.9837914023960536,
521
+ "recall": 0.993594306049822,
522
+ "support": 1405
523
+ },
524
+ "language": {
525
+ "f1": 0.9697331146563292,
526
+ "precision": 0.9801084990958409,
527
+ "recall": 0.9595750958984951,
528
+ "support": 3389
529
+ },
530
+ "last_name": {
531
+ "f1": 0.9907705032049475,
532
+ "precision": 0.9910126725078028,
533
+ "recall": 0.9905284522288206,
534
+ "support": 26606
535
+ },
536
+ "license_plate": {
537
+ "f1": 0.9879454926624738,
538
+ "precision": 0.985363303711448,
539
+ "recall": 0.9905412506568576,
540
+ "support": 1903
541
+ },
542
+ "mac_address": {
543
+ "f1": 0.9933444259567388,
544
+ "precision": 0.9927937915742794,
545
+ "recall": 0.9938956714761377,
546
+ "support": 1802
547
+ },
548
+ "medical_record_number": {
549
+ "f1": 0.988765943069516,
550
+ "precision": 0.9863498483316482,
551
+ "recall": 0.9911939034716342,
552
+ "support": 5905
553
+ },
554
+ "occupation": {
555
+ "f1": 0.6626936089642115,
556
+ "precision": 0.7378862238197277,
557
+ "recall": 0.6014084507042253,
558
+ "support": 17750
559
+ },
560
+ "password": {
561
+ "f1": 0.9699709396189861,
562
+ "precision": 0.9807378387202089,
563
+ "recall": 0.9594378792717981,
564
+ "support": 3131
565
+ },
566
+ "phone_number": {
567
+ "f1": 0.9617778642287492,
568
+ "precision": 0.9554589371980676,
569
+ "recall": 0.9681809281378501,
570
+ "support": 10214
571
+ },
572
+ "pin": {
573
+ "f1": 0.9492007104795738,
574
+ "precision": 0.9604601006470166,
575
+ "recall": 0.9382022471910112,
576
+ "support": 2848
577
+ },
578
+ "political_view": {
579
+ "f1": 0.9084634943697784,
580
+ "precision": 0.8687044112539076,
581
+ "recall": 0.9520365435858393,
582
+ "support": 2627
583
+ },
584
+ "postcode": {
585
+ "f1": 0.9792821709950393,
586
+ "precision": 0.9741654571843251,
587
+ "recall": 0.9844529187444998,
588
+ "support": 3409
589
+ },
590
+ "race_ethnicity": {
591
+ "f1": 0.9791855203619909,
592
+ "precision": 0.9725218284540318,
593
+ "recall": 0.985941161155949,
594
+ "support": 3841
595
+ },
596
+ "religious_belief": {
597
+ "f1": 0.9436468054558507,
598
+ "precision": 0.9312787814381863,
599
+ "recall": 0.9563477628228446,
600
+ "support": 2749
601
+ },
602
+ "sexuality": {
603
+ "f1": 0.9013282732447818,
604
+ "precision": 0.847205707491082,
605
+ "recall": 0.9628378378378378,
606
+ "support": 1480
607
+ },
608
+ "ssn": {
609
+ "f1": 0.9908412005072565,
610
+ "precision": 0.9884734326679786,
611
+ "recall": 0.9932203389830508,
612
+ "support": 3540
613
+ },
614
+ "state": {
615
+ "f1": 0.9617844170460941,
616
+ "precision": 0.9640036919290329,
617
+ "recall": 0.9595753368721928,
618
+ "support": 9796
619
+ },
620
+ "street_address": {
621
+ "f1": 0.9856871295985687,
622
+ "precision": 0.9870115328630612,
623
+ "recall": 0.9843662758235623,
624
+ "support": 8955
625
+ },
626
+ "swift_bic": {
627
+ "f1": 0.9711882229232387,
628
+ "precision": 0.9447626841243862,
629
+ "recall": 0.9991345737775854,
630
+ "support": 2311
631
+ },
632
+ "tax_id": {
633
+ "f1": 0.9627659574468085,
634
+ "precision": 0.9443478260869566,
635
+ "recall": 0.9819168173598554,
636
+ "support": 553
637
+ },
638
+ "time": {
639
+ "f1": 0.8379063869033895,
640
+ "precision": 0.8678830722200993,
641
+ "recall": 0.8099313541945262,
642
+ "support": 11217
643
+ },
644
+ "unique_id": {
645
+ "f1": 0.8542329726288989,
646
+ "precision": 0.8440251572327044,
647
+ "recall": 0.8646907216494846,
648
+ "support": 776
649
+ },
650
+ "url": {
651
+ "f1": 0.9818543799772469,
652
+ "precision": 0.9775172726243062,
653
+ "recall": 0.9862301451262713,
654
+ "support": 17502
655
+ },
656
+ "user_name": {
657
+ "f1": 0.9719441916894146,
658
+ "precision": 0.9811696264543784,
659
+ "recall": 0.962890625,
660
+ "support": 6656
661
+ },
662
+ "vehicle_identifier": {
663
+ "f1": 0.9832548403976975,
664
+ "precision": 0.9766112266112266,
665
+ "recall": 0.9899894625922023,
666
+ "support": 1898
667
+ }
668
+ },
669
+ "test_precision": 0.9584375801898897,
670
+ "test_recall": 0.9531332238153719,
671
+ "test_runtime": 533.5026,
672
+ "test_samples_per_second": 84.348,
673
+ "test_steps_per_second": 1.32,
674
+ "test_weighted_f1": 0.9543696825710145,
675
+ "total_flos": 1.5335158946125824e+16,
676
+ "train_loss": 0.1346272580018119,
677
+ "train_runtime": 1333.6636,
678
+ "train_samples_per_second": 75.581,
679
+ "train_steps_per_second": 2.362
680
+ }
config.json ADDED
@@ -0,0 +1,260 @@
1
+ {
2
+ "architectures": [
3
+ "LongformerForTokenClassification"
4
+ ],
5
+ "attention_mode": "longformer",
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "attention_window": [
8
+ 512,
9
+ 512,
10
+ 512,
11
+ 512,
12
+ 512,
13
+ 512,
14
+ 512,
15
+ 512,
16
+ 512,
17
+ 512,
18
+ 512,
19
+ 512
20
+ ],
21
+ "bos_token_id": 0,
22
+ "dtype": "float32",
23
+ "eos_token_id": 2,
24
+ "gradient_checkpointing": false,
25
+ "hidden_act": "gelu",
26
+ "hidden_dropout_prob": 0.1,
27
+ "hidden_size": 768,
28
+ "id2label": {
29
+ "0": "O",
30
+ "1": "B-account_number",
31
+ "2": "B-age",
32
+ "3": "B-api_key",
33
+ "4": "B-bank_routing_number",
34
+ "5": "B-biometric_identifier",
35
+ "6": "B-blood_type",
36
+ "7": "B-certificate_license_number",
37
+ "8": "B-city",
38
+ "9": "B-company_name",
39
+ "10": "B-coordinate",
40
+ "11": "B-country",
41
+ "12": "B-county",
42
+ "13": "B-credit_debit_card",
43
+ "14": "B-customer_id",
44
+ "15": "B-cvv",
45
+ "16": "B-date",
46
+ "17": "B-date_of_birth",
47
+ "18": "B-date_time",
48
+ "19": "B-device_identifier",
49
+ "20": "B-education_level",
50
+ "21": "B-email",
51
+ "22": "B-employee_id",
52
+ "23": "B-employment_status",
53
+ "24": "B-fax_number",
54
+ "25": "B-first_name",
55
+ "26": "B-gender",
56
+ "27": "B-health_plan_beneficiary_number",
57
+ "28": "B-http_cookie",
58
+ "29": "B-ipv4",
59
+ "30": "B-ipv6",
60
+ "31": "B-language",
61
+ "32": "B-last_name",
62
+ "33": "B-license_plate",
63
+ "34": "B-mac_address",
64
+ "35": "B-medical_record_number",
65
+ "36": "B-occupation",
66
+ "37": "B-password",
67
+ "38": "B-phone_number",
68
+ "39": "B-pin",
69
+ "40": "B-political_view",
70
+ "41": "B-postcode",
71
+ "42": "B-race_ethnicity",
72
+ "43": "B-religious_belief",
73
+ "44": "B-sexuality",
74
+ "45": "B-ssn",
75
+ "46": "B-state",
76
+ "47": "B-street_address",
77
+ "48": "B-swift_bic",
78
+ "49": "B-tax_id",
79
+ "50": "B-time",
80
+ "51": "B-unique_id",
81
+ "52": "B-url",
82
+ "53": "B-user_name",
83
+ "54": "B-vehicle_identifier",
84
+ "55": "I-account_number",
85
+ "56": "I-api_key",
86
+ "57": "I-biometric_identifier",
87
+ "58": "I-blood_type",
88
+ "59": "I-certificate_license_number",
89
+ "60": "I-city",
90
+ "61": "I-company_name",
91
+ "62": "I-coordinate",
92
+ "63": "I-country",
93
+ "64": "I-county",
94
+ "65": "I-credit_debit_card",
95
+ "66": "I-customer_id",
96
+ "67": "I-date",
97
+ "68": "I-date_of_birth",
98
+ "69": "I-date_time",
99
+ "70": "I-device_identifier",
100
+ "71": "I-education_level",
101
+ "72": "I-email",
102
+ "73": "I-employee_id",
103
+ "74": "I-employment_status",
104
+ "75": "I-fax_number",
105
+ "76": "I-first_name",
106
+ "77": "I-gender",
107
+ "78": "I-health_plan_beneficiary_number",
108
+ "79": "I-http_cookie",
109
+ "80": "I-ipv4",
110
+ "81": "I-ipv6",
111
+ "82": "I-language",
112
+ "83": "I-last_name",
113
+ "84": "I-license_plate",
114
+ "85": "I-mac_address",
115
+ "86": "I-medical_record_number",
116
+ "87": "I-occupation",
117
+ "88": "I-password",
118
+ "89": "I-phone_number",
119
+ "90": "I-pin",
120
+ "91": "I-political_view",
121
+ "92": "I-postcode",
122
+ "93": "I-race_ethnicity",
123
+ "94": "I-religious_belief",
124
+ "95": "I-sexuality",
125
+ "96": "I-ssn",
126
+ "97": "I-state",
127
+ "98": "I-street_address",
128
+ "99": "I-swift_bic",
129
+ "100": "I-tax_id",
130
+ "101": "I-time",
131
+ "102": "I-unique_id",
132
+ "103": "I-url",
133
+ "104": "I-user_name",
134
+ "105": "I-vehicle_identifier"
135
+ },
136
+ "ignore_attention_mask": false,
137
+ "initializer_range": 0.02,
138
+ "intermediate_size": 3072,
139
+ "label2id": {
140
+ "B-account_number": 1,
141
+ "B-age": 2,
142
+ "B-api_key": 3,
143
+ "B-bank_routing_number": 4,
144
+ "B-biometric_identifier": 5,
145
+ "B-blood_type": 6,
146
+ "B-certificate_license_number": 7,
147
+ "B-city": 8,
148
+ "B-company_name": 9,
149
+ "B-coordinate": 10,
150
+ "B-country": 11,
151
+ "B-county": 12,
152
+ "B-credit_debit_card": 13,
153
+ "B-customer_id": 14,
154
+ "B-cvv": 15,
155
+ "B-date": 16,
156
+ "B-date_of_birth": 17,
157
+ "B-date_time": 18,
158
+ "B-device_identifier": 19,
159
+ "B-education_level": 20,
160
+ "B-email": 21,
161
+ "B-employee_id": 22,
162
+ "B-employment_status": 23,
163
+ "B-fax_number": 24,
164
+ "B-first_name": 25,
165
+ "B-gender": 26,
166
+ "B-health_plan_beneficiary_number": 27,
167
+ "B-http_cookie": 28,
168
+ "B-ipv4": 29,
169
+ "B-ipv6": 30,
170
+ "B-language": 31,
171
+ "B-last_name": 32,
172
+ "B-license_plate": 33,
173
+ "B-mac_address": 34,
174
+ "B-medical_record_number": 35,
175
+ "B-occupation": 36,
176
+ "B-password": 37,
177
+ "B-phone_number": 38,
178
+ "B-pin": 39,
179
+ "B-political_view": 40,
180
+ "B-postcode": 41,
181
+ "B-race_ethnicity": 42,
182
+ "B-religious_belief": 43,
183
+ "B-sexuality": 44,
184
+ "B-ssn": 45,
185
+ "B-state": 46,
186
+ "B-street_address": 47,
187
+ "B-swift_bic": 48,
188
+ "B-tax_id": 49,
189
+ "B-time": 50,
190
+ "B-unique_id": 51,
191
+ "B-url": 52,
192
+ "B-user_name": 53,
193
+ "B-vehicle_identifier": 54,
194
+ "I-account_number": 55,
195
+ "I-api_key": 56,
196
+ "I-biometric_identifier": 57,
197
+ "I-blood_type": 58,
198
+ "I-certificate_license_number": 59,
199
+ "I-city": 60,
200
+ "I-company_name": 61,
201
+ "I-coordinate": 62,
202
+ "I-country": 63,
203
+ "I-county": 64,
204
+ "I-credit_debit_card": 65,
205
+ "I-customer_id": 66,
206
+ "I-date": 67,
207
+ "I-date_of_birth": 68,
208
+ "I-date_time": 69,
209
+ "I-device_identifier": 70,
210
+ "I-education_level": 71,
211
+ "I-email": 72,
212
+ "I-employee_id": 73,
213
+ "I-employment_status": 74,
214
+ "I-fax_number": 75,
215
+ "I-first_name": 76,
216
+ "I-gender": 77,
217
+ "I-health_plan_beneficiary_number": 78,
218
+ "I-http_cookie": 79,
219
+ "I-ipv4": 80,
220
+ "I-ipv6": 81,
221
+ "I-language": 82,
222
+ "I-last_name": 83,
223
+ "I-license_plate": 84,
224
+ "I-mac_address": 85,
225
+ "I-medical_record_number": 86,
226
+ "I-occupation": 87,
227
+ "I-password": 88,
228
+ "I-phone_number": 89,
229
+ "I-pin": 90,
230
+ "I-political_view": 91,
231
+ "I-postcode": 92,
232
+ "I-race_ethnicity": 93,
233
+ "I-religious_belief": 94,
234
+ "I-sexuality": 95,
235
+ "I-ssn": 96,
236
+ "I-state": 97,
237
+ "I-street_address": 98,
238
+ "I-swift_bic": 99,
239
+ "I-tax_id": 100,
240
+ "I-time": 101,
241
+ "I-unique_id": 102,
242
+ "I-url": 103,
243
+ "I-user_name": 104,
244
+ "I-vehicle_identifier": 105,
245
+ "O": 0
246
+ },
247
+ "layer_norm_eps": 1e-05,
248
+ "max_position_embeddings": 4098,
249
+ "model_type": "longformer",
250
+ "num_attention_heads": 12,
251
+ "num_hidden_layers": 12,
252
+ "onnx_export": false,
253
+ "pad_token_id": 1,
254
+ "position_embedding_type": "absolute",
255
+ "sep_token_id": 2,
256
+ "transformers_version": "4.57.3",
257
+ "type_vocab_size": 1,
258
+ "use_cache": true,
259
+ "vocab_size": 50265
260
+ }
eval_results.json ADDED
@@ -0,0 +1,339 @@
1
+ {
2
+ "epoch": 2.0,
3
+ "eval_accuracy": 0.994309148903853,
4
+ "eval_f1": 0.954933898181287,
5
+ "eval_loss": 0.02355710230767727,
6
+ "eval_macro_f1": 0.9547195008322086,
7
+ "eval_per_label": {
8
+ "account_number": {
9
+ "f1": 0.9739644970414201,
10
+ "precision": 0.9716646989374262,
11
+ "recall": 0.9762752075919335,
12
+ "support": 843
13
+ },
14
+ "age": {
15
+ "f1": 0.96,
16
+ "precision": 0.9466357308584686,
17
+ "recall": 0.9737470167064439,
18
+ "support": 419
19
+ },
20
+ "api_key": {
21
+ "f1": 0.9707317073170731,
22
+ "precision": 0.9660194174757282,
23
+ "recall": 0.9754901960784313,
24
+ "support": 204
25
+ },
26
+ "bank_routing_number": {
27
+ "f1": 0.9893992932862192,
28
+ "precision": 0.9905660377358491,
29
+ "recall": 0.9882352941176471,
30
+ "support": 425
31
+ },
32
+ "biometric_identifier": {
33
+ "f1": 0.9944341372912802,
34
+ "precision": 0.9907578558225508,
35
+ "recall": 0.9981378026070763,
36
+ "support": 537
37
+ },
38
+ "blood_type": {
39
+ "f1": 0.9438943894389439,
40
+ "precision": 0.9315960912052117,
41
+ "recall": 0.9565217391304348,
42
+ "support": 299
43
+ },
44
+ "certificate_license_number": {
45
+ "f1": 0.9888475836431226,
46
+ "precision": 0.9851851851851852,
47
+ "recall": 0.9925373134328358,
48
+ "support": 268
49
+ },
50
+ "city": {
51
+ "f1": 0.9654754694124772,
52
+ "precision": 0.9660606060606061,
53
+ "recall": 0.9648910411622276,
54
+ "support": 826
55
+ },
56
+ "company_name": {
57
+ "f1": 0.9595003785011356,
58
+ "precision": 0.9395848776871757,
59
+ "recall": 0.9802784222737819,
60
+ "support": 2586
61
+ },
62
+ "coordinate": {
63
+ "f1": 0.9845360824742269,
64
+ "precision": 0.9769820971867008,
65
+ "recall": 0.9922077922077922,
66
+ "support": 385
67
+ },
68
+ "country": {
69
+ "f1": 0.981132075471698,
70
+ "precision": 0.9774436090225563,
71
+ "recall": 0.9848484848484849,
72
+ "support": 924
73
+ },
74
+ "county": {
75
+ "f1": 0.9592233009708737,
76
+ "precision": 0.9610894941634242,
77
+ "recall": 0.9573643410852714,
78
+ "support": 774
79
+ },
80
+ "credit_debit_card": {
81
+ "f1": 0.9914529914529914,
82
+ "precision": 0.9953198127925117,
83
+ "recall": 0.9876160990712074,
84
+ "support": 646
85
+ },
86
+ "customer_id": {
87
+ "f1": 0.9698349459305634,
88
+ "precision": 0.9583802024746907,
89
+ "recall": 0.9815668202764977,
90
+ "support": 868
91
+ },
92
+ "cvv": {
93
+ "f1": 0.9593810444874276,
94
+ "precision": 0.9612403100775194,
95
+ "recall": 0.9575289575289575,
96
+ "support": 259
97
+ },
98
+ "date": {
99
+ "f1": 0.9579960287154422,
100
+ "precision": 0.9655172413793104,
101
+ "recall": 0.9505910882085481,
102
+ "support": 3299
103
+ },
104
+ "date_of_birth": {
105
+ "f1": 0.9861212563915267,
106
+ "precision": 0.9796806966618288,
107
+ "recall": 0.9926470588235294,
108
+ "support": 680
109
+ },
110
+ "date_time": {
111
+ "f1": 0.9541984732824427,
112
+ "precision": 0.9523809523809523,
113
+ "recall": 0.9560229445506692,
114
+ "support": 523
115
+ },
116
+ "device_identifier": {
117
+ "f1": 0.9603174603174602,
118
+ "precision": 0.952755905511811,
119
+ "recall": 0.968,
120
+ "support": 125
121
+ },
122
+ "education_level": {
123
+ "f1": 0.8952007835455434,
124
+ "precision": 0.904950495049505,
125
+ "recall": 0.8856589147286822,
126
+ "support": 516
127
+ },
128
+ "email": {
129
+ "f1": 0.994049035943823,
130
+ "precision": 0.9938124702522608,
131
+ "recall": 0.9942857142857143,
132
+ "support": 2100
133
+ },
134
+ "employee_id": {
135
+ "f1": 0.9850107066381156,
136
+ "precision": 0.9745762711864406,
137
+ "recall": 0.9956709956709957,
138
+ "support": 462
139
+ },
140
+ "employment_status": {
141
+ "f1": 0.9467140319715808,
142
+ "precision": 0.9467140319715808,
143
+ "recall": 0.9467140319715808,
144
+ "support": 563
145
+ },
146
+ "fax_number": {
147
+ "f1": 0.864406779661017,
148
+ "precision": 0.8557046979865772,
149
+ "recall": 0.8732876712328768,
150
+ "support": 292
151
+ },
152
+ "first_name": {
153
+ "f1": 0.9934272300469483,
154
+ "precision": 0.9931940858953298,
155
+ "recall": 0.9936604836816154,
156
+ "support": 4259
157
+ },
158
+ "gender": {
159
+ "f1": 0.9658536585365853,
160
+ "precision": 0.9519230769230769,
161
+ "recall": 0.9801980198019802,
162
+ "support": 404
163
+ },
164
+ "health_plan_beneficiary_number": {
165
+ "f1": 0.995159728944821,
166
+ "precision": 0.9980582524271845,
167
+ "recall": 0.9922779922779923,
168
+ "support": 518
169
+ },
170
+ "http_cookie": {
171
+ "f1": 0.8944099378881988,
172
+ "precision": 0.8571428571428571,
173
+ "recall": 0.935064935064935,
174
+ "support": 231
175
+ },
176
+ "ipv4": {
177
+ "f1": 0.9928571428571428,
178
+ "precision": 0.9893238434163701,
179
+ "recall": 0.996415770609319,
180
+ "support": 279
181
+ },
182
+ "ipv6": {
183
+ "f1": 0.9829351535836177,
184
+ "precision": 0.9795918367346939,
185
+ "recall": 0.9863013698630136,
186
+ "support": 146
187
+ },
188
+ "language": {
189
+ "f1": 0.9665211062590976,
190
+ "precision": 0.9880952380952381,
191
+ "recall": 0.9458689458689459,
192
+ "support": 351
193
+ },
194
+ "last_name": {
195
+ "f1": 0.9916794022754287,
196
+ "precision": 0.9901661580196677,
197
+ "recall": 0.9931972789115646,
198
+ "support": 2940
199
+ },
200
+ "license_plate": {
201
+ "f1": 0.9957081545064378,
202
+ "precision": 0.9957081545064378,
203
+ "recall": 0.9957081545064378,
204
+ "support": 233
205
+ },
206
+ "mac_address": {
207
+ "f1": 0.9932279909706545,
208
+ "precision": 0.990990990990991,
209
+ "recall": 0.995475113122172,
210
+ "support": 221
211
+ },
212
+ "medical_record_number": {
213
+ "f1": 0.9863672814755413,
214
+ "precision": 0.9887459807073955,
215
+ "recall": 0.984,
216
+ "support": 625
217
+ },
218
+ "occupation": {
219
+ "f1": 0.6761051719156312,
220
+ "precision": 0.7490396927016645,
221
+ "recall": 0.6161137440758294,
222
+ "support": 1899
223
+ },
224
+ "password": {
225
+ "f1": 0.9901269393511989,
226
+ "precision": 0.9887323943661972,
227
+ "recall": 0.9915254237288136,
228
+ "support": 354
229
+ },
230
+ "phone_number": {
231
+ "f1": 0.9637488947833776,
232
+ "precision": 0.9569798068481123,
233
+ "recall": 0.9706144256455922,
234
+ "support": 1123
235
+ },
236
+ "pin": {
237
+ "f1": 0.9561128526645768,
238
+ "precision": 0.9744408945686901,
239
+ "recall": 0.9384615384615385,
240
+ "support": 325
241
+ },
242
+ "political_view": {
243
+ "f1": 0.8720720720720719,
244
+ "precision": 0.8491228070175438,
245
+ "recall": 0.8962962962962963,
246
+ "support": 270
247
+ },
248
+ "postcode": {
249
+ "f1": 0.9807162534435261,
250
+ "precision": 0.9834254143646409,
251
+ "recall": 0.978021978021978,
252
+ "support": 364
253
+ },
254
+ "race_ethnicity": {
255
+ "f1": 0.9590062111801243,
256
+ "precision": 0.9278846153846154,
257
+ "recall": 0.9922879177377892,
258
+ "support": 389
259
+ },
260
+ "religious_belief": {
261
+ "f1": 0.920684292379471,
262
+ "precision": 0.9135802469135802,
263
+ "recall": 0.9278996865203761,
264
+ "support": 319
265
+ },
266
+ "sexuality": {
267
+ "f1": 0.907563025210084,
268
+ "precision": 0.8901098901098901,
269
+ "recall": 0.9257142857142857,
270
+ "support": 175
271
+ },
272
+ "ssn": {
273
+ "f1": 0.9908972691807542,
274
+ "precision": 0.9896103896103896,
275
+ "recall": 0.9921875,
276
+ "support": 384
277
+ },
278
+ "state": {
279
+ "f1": 0.9619732785200411,
280
+ "precision": 0.9659442724458205,
281
+ "recall": 0.9580348004094166,
282
+ "support": 977
283
+ },
284
+ "street_address": {
285
+ "f1": 0.9828282828282829,
286
+ "precision": 0.9808467741935484,
287
+ "recall": 0.9848178137651822,
288
+ "support": 988
289
+ },
290
+ "swift_bic": {
291
+ "f1": 0.9788732394366197,
292
+ "precision": 0.9652777777777778,
293
+ "recall": 0.9928571428571429,
294
+ "support": 280
295
+ },
296
+ "tax_id": {
297
+ "f1": 0.9710144927536231,
298
+ "precision": 0.9436619718309859,
299
+ "recall": 1.0,
300
+ "support": 67
301
+ },
302
+ "time": {
303
+ "f1": 0.8011272141706924,
304
+ "precision": 0.8584987057808455,
305
+ "recall": 0.7509433962264151,
306
+ "support": 1325
307
+ },
308
+ "unique_id": {
309
+ "f1": 0.8139534883720929,
310
+ "precision": 0.8860759493670886,
311
+ "recall": 0.7526881720430108,
312
+ "support": 93
313
+ },
314
+ "url": {
315
+ "f1": 0.9817797729073144,
316
+ "precision": 0.9779063650710152,
317
+ "recall": 0.9856839872746553,
318
+ "support": 1886
319
+ },
320
+ "user_name": {
321
+ "f1": 0.9780521262002744,
322
+ "precision": 0.9861687413554634,
323
+ "recall": 0.9700680272108844,
324
+ "support": 735
325
+ },
326
+ "vehicle_identifier": {
327
+ "f1": 0.9742489270386265,
328
+ "precision": 0.9659574468085106,
329
+ "recall": 0.9826839826839827,
330
+ "support": 231
331
+ }
332
+ },
333
+ "eval_precision": 0.9582233948988567,
334
+ "eval_recall": 0.9516669093026642,
335
+ "eval_runtime": 34.2316,
336
+ "eval_samples_per_second": 146.064,
337
+ "eval_steps_per_second": 2.308,
338
+ "eval_weighted_f1": 0.9534616997905814
339
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65d1523f25381e6fc4cc3233770a37772eb2e1ecf6025494f93853ff04aca9c7
+ size 592635520
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
test_results.json ADDED
@@ -0,0 +1,338 @@
1
+ {
2
+ "test_accuracy": 0.9943192201114947,
3
+ "test_f1": 0.955778042558636,
4
+ "test_loss": 0.023028379306197166,
5
+ "test_macro_f1": 0.9571182705816211,
6
+ "test_per_label": {
7
+ "account_number": {
8
+ "f1": 0.9722450527913006,
9
+ "precision": 0.9734143562476263,
10
+ "recall": 0.9710785551907047,
11
+ "support": 7918
12
+ },
13
+ "age": {
14
+ "f1": 0.9595101224693826,
15
+ "precision": 0.9490729295426452,
16
+ "recall": 0.9701794288602477,
17
+ "support": 3957
18
+ },
19
+ "api_key": {
20
+ "f1": 0.9694142042509072,
21
+ "precision": 0.9709241952232607,
22
+ "recall": 0.9679089026915114,
23
+ "support": 1932
24
+ },
25
+ "bank_routing_number": {
26
+ "f1": 0.9829792848660772,
27
+ "precision": 0.9872780280943546,
28
+ "recall": 0.9787178139779296,
29
+ "support": 3806
30
+ },
31
+ "biometric_identifier": {
32
+ "f1": 0.9908826770864643,
33
+ "precision": 0.9864352683024137,
34
+ "recall": 0.9953703703703703,
35
+ "support": 4968
36
+ },
37
+ "blood_type": {
38
+ "f1": 0.9675413022351798,
39
+ "precision": 0.9551036070606294,
40
+ "recall": 0.9803072075620323,
41
+ "support": 2539
42
+ },
43
+ "certificate_license_number": {
44
+ "f1": 0.9729729729729729,
45
+ "precision": 0.9673258813413586,
46
+ "recall": 0.97868638538495,
47
+ "support": 2299
48
+ },
49
+ "city": {
50
+ "f1": 0.9711004306847263,
51
+ "precision": 0.974445697106351,
52
+ "recall": 0.9677780542423489,
53
+ "support": 8038
54
+ },
55
+ "company_name": {
56
+ "f1": 0.9649876747889443,
57
+ "precision": 0.9479278275403934,
58
+ "recall": 0.9826728274391328,
59
+ "support": 22508
60
+ },
61
+ "coordinate": {
62
+ "f1": 0.9934754240974337,
63
+ "precision": 0.9910326873011281,
64
+ "recall": 0.9959302325581395,
65
+ "support": 3440
66
+ },
67
+ "country": {
68
+ "f1": 0.983566710700132,
69
+ "precision": 0.9812335266209805,
70
+ "recall": 0.9859110169491525,
71
+ "support": 9440
72
+ },
73
+ "county": {
74
+ "f1": 0.9633140972794724,
75
+ "precision": 0.9630494505494506,
76
+ "recall": 0.9635788894997251,
77
+ "support": 7276
78
+ },
79
+ "credit_debit_card": {
80
+ "f1": 0.9889901943918803,
81
+ "precision": 0.9844178082191781,
82
+ "recall": 0.9936052540615278,
83
+ "support": 5786
84
+ },
85
+ "customer_id": {
86
+ "f1": 0.9751981590386091,
87
+ "precision": 0.9645928174001012,
88
+ "recall": 0.9860392967942089,
89
+ "support": 7736
90
+ },
91
+ "cvv": {
92
+ "f1": 0.9650256181777679,
93
+ "precision": 0.9462647444298821,
94
+ "recall": 0.9845454545454545,
95
+ "support": 2200
96
+ },
97
+ "date": {
98
+ "f1": 0.9536449949078485,
99
+ "precision": 0.9578301326470006,
100
+ "recall": 0.9494962710977365,
101
+ "support": 30572
102
+ },
103
+ "date_of_birth": {
104
+ "f1": 0.9888132676210591,
105
+ "precision": 0.9816713264989128,
106
+ "recall": 0.996059889676911,
107
+ "support": 6345
108
+ },
109
+ "date_time": {
110
+ "f1": 0.9482934804823216,
111
+ "precision": 0.9519901518260155,
112
+ "recall": 0.9446254071661238,
113
+ "support": 4912
114
+ },
115
+ "device_identifier": {
116
+ "f1": 0.9511228533685602,
117
+ "precision": 0.9440559440559441,
118
+ "recall": 0.9582963620230701,
119
+ "support": 1127
120
+ },
121
+ "education_level": {
122
+ "f1": 0.8892338396718041,
123
+ "precision": 0.9126081019572144,
124
+ "recall": 0.867027027027027,
125
+ "support": 4625
126
+ },
127
+ "email": {
128
+ "f1": 0.9941851327989941,
129
+ "precision": 0.995697796432319,
130
+ "recall": 0.9926770582696934,
131
+ "support": 19118
132
+ },
133
+ "employee_id": {
134
+ "f1": 0.9811648079306072,
135
+ "precision": 0.9770483711747285,
136
+ "recall": 0.9853160776505724,
137
+ "support": 4018
138
+ },
139
+ "employment_status": {
140
+ "f1": 0.9560728547774423,
141
+ "precision": 0.9473074696004632,
142
+ "recall": 0.9650019661816752,
143
+ "support": 5086
144
+ },
145
+ "fax_number": {
146
+ "f1": 0.8656716417910448,
147
+ "precision": 0.8517041334300217,
148
+ "recall": 0.8801049082053204,
149
+ "support": 2669
150
+ },
151
+ "first_name": {
152
+ "f1": 0.99127445107181,
153
+ "precision": 0.9907067551737603,
154
+ "recall": 0.9918427979463658,
155
+ "support": 38371
156
+ },
157
+ "gender": {
158
+ "f1": 0.9452845751993402,
159
+ "precision": 0.9416598192276089,
160
+ "recall": 0.9489373447419266,
161
+ "support": 3623
162
+ },
163
+ "health_plan_beneficiary_number": {
164
+ "f1": 0.9898369680287953,
165
+ "precision": 0.9915164369034994,
166
+ "recall": 0.9881631790319172,
167
+ "support": 4731
168
+ },
169
+ "http_cookie": {
170
+ "f1": 0.9451417945141795,
171
+ "precision": 0.9364348226623675,
172
+ "recall": 0.9540122008446739,
173
+ "support": 2131
174
+ },
175
+ "ipv4": {
176
+ "f1": 0.9872049017841051,
177
+ "precision": 0.9820724273933309,
178
+ "recall": 0.9923913043478261,
179
+ "support": 2760
180
+ },
181
+ "ipv6": {
182
+ "f1": 0.9886685552407931,
183
+ "precision": 0.9837914023960536,
184
+ "recall": 0.993594306049822,
185
+ "support": 1405
186
+ },
187
+ "language": {
188
+ "f1": 0.9697331146563292,
189
+ "precision": 0.9801084990958409,
190
+ "recall": 0.9595750958984951,
191
+ "support": 3389
192
+ },
193
+ "last_name": {
194
+ "f1": 0.9907705032049475,
195
+ "precision": 0.9910126725078028,
196
+ "recall": 0.9905284522288206,
197
+ "support": 26606
198
+ },
199
+ "license_plate": {
200
+ "f1": 0.9879454926624738,
201
+ "precision": 0.985363303711448,
202
+ "recall": 0.9905412506568576,
203
+ "support": 1903
204
+ },
205
+ "mac_address": {
206
+ "f1": 0.9933444259567388,
207
+ "precision": 0.9927937915742794,
208
+ "recall": 0.9938956714761377,
209
+ "support": 1802
210
+ },
211
+ "medical_record_number": {
212
+ "f1": 0.988765943069516,
213
+ "precision": 0.9863498483316482,
214
+ "recall": 0.9911939034716342,
215
+ "support": 5905
216
+ },
217
+ "occupation": {
218
+ "f1": 0.6626936089642115,
219
+ "precision": 0.7378862238197277,
220
+ "recall": 0.6014084507042253,
221
+ "support": 17750
222
+ },
223
+ "password": {
224
+ "f1": 0.9699709396189861,
225
+ "precision": 0.9807378387202089,
226
+ "recall": 0.9594378792717981,
227
+ "support": 3131
228
+ },
229
+ "phone_number": {
230
+ "f1": 0.9617778642287492,
231
+ "precision": 0.9554589371980676,
232
+ "recall": 0.9681809281378501,
233
+ "support": 10214
234
+ },
235
+ "pin": {
236
+ "f1": 0.9492007104795738,
237
+ "precision": 0.9604601006470166,
238
+ "recall": 0.9382022471910112,
239
+ "support": 2848
240
+ },
241
+ "political_view": {
242
+ "f1": 0.9084634943697784,
243
+ "precision": 0.8687044112539076,
244
+ "recall": 0.9520365435858393,
245
+ "support": 2627
246
+ },
247
+ "postcode": {
248
+ "f1": 0.9792821709950393,
249
+ "precision": 0.9741654571843251,
250
+ "recall": 0.9844529187444998,
251
+ "support": 3409
252
+ },
253
+ "race_ethnicity": {
254
+ "f1": 0.9791855203619909,
255
+ "precision": 0.9725218284540318,
256
+ "recall": 0.985941161155949,
257
+ "support": 3841
258
+ },
259
+ "religious_belief": {
260
+ "f1": 0.9436468054558507,
261
+ "precision": 0.9312787814381863,
262
+ "recall": 0.9563477628228446,
263
+ "support": 2749
264
+ },
265
+ "sexuality": {
266
+ "f1": 0.9013282732447818,
267
+ "precision": 0.847205707491082,
268
+ "recall": 0.9628378378378378,
269
+ "support": 1480
270
+ },
271
+ "ssn": {
272
+ "f1": 0.9908412005072565,
273
+ "precision": 0.9884734326679786,
274
+ "recall": 0.9932203389830508,
275
+ "support": 3540
276
+ },
277
+ "state": {
278
+ "f1": 0.9617844170460941,
279
+ "precision": 0.9640036919290329,
280
+ "recall": 0.9595753368721928,
281
+ "support": 9796
282
+ },
283
+ "street_address": {
284
+ "f1": 0.9856871295985687,
285
+ "precision": 0.9870115328630612,
286
+ "recall": 0.9843662758235623,
287
+ "support": 8955
288
+ },
289
+ "swift_bic": {
290
+ "f1": 0.9711882229232387,
291
+ "precision": 0.9447626841243862,
292
+ "recall": 0.9991345737775854,
293
+ "support": 2311
294
+ },
295
+ "tax_id": {
296
+ "f1": 0.9627659574468085,
297
+ "precision": 0.9443478260869566,
298
+ "recall": 0.9819168173598554,
299
+ "support": 553
300
+ },
301
+ "time": {
302
+ "f1": 0.8379063869033895,
303
+ "precision": 0.8678830722200993,
304
+ "recall": 0.8099313541945262,
305
+ "support": 11217
306
+ },
307
+ "unique_id": {
308
+ "f1": 0.8542329726288989,
309
+ "precision": 0.8440251572327044,
310
+ "recall": 0.8646907216494846,
311
+ "support": 776
312
+ },
313
+ "url": {
314
+ "f1": 0.9818543799772469,
315
+ "precision": 0.9775172726243062,
316
+ "recall": 0.9862301451262713,
317
+ "support": 17502
318
+ },
319
+ "user_name": {
320
+ "f1": 0.9719441916894146,
321
+ "precision": 0.9811696264543784,
322
+ "recall": 0.962890625,
323
+ "support": 6656
324
+ },
325
+ "vehicle_identifier": {
326
+ "f1": 0.9832548403976975,
327
+ "precision": 0.9766112266112266,
328
+ "recall": 0.9899894625922023,
329
+ "support": 1898
330
+ }
331
+ },
332
+ "test_precision": 0.9584375801898897,
333
+ "test_recall": 0.9531332238153719,
334
+ "test_runtime": 533.5026,
335
+ "test_samples_per_second": 84.348,
336
+ "test_steps_per_second": 1.32,
337
+ "test_weighted_f1": 0.9543696825710145
338
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "add_prefix_space": true,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "50264": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "model_max_length": 4096,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "RobertaTokenizer",
+ "trim_offsets": true,
+ "unk_token": "<unk>"
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 2.0,
+ "total_flos": 1.5335158946125824e+16,
+ "train_loss": 0.1346272580018119,
+ "train_runtime": 1333.6636,
+ "train_samples_per_second": 75.581,
+ "train_steps_per_second": 2.362
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff