Minibase committed
Commit 2374a43 · verified · 1 Parent(s): 697cf5d

Upload benchmarks.txt with huggingface_hub

Files changed (1)
  1. benchmarks.txt +67 -0
benchmarks.txt ADDED
@@ -0,0 +1,67 @@
+ # De-identification Benchmark Results
+ **Model:** Minibase-DeId-Small
+ **Dataset:** Personal_De-identifier_Benchmark_SFT.jsonl
+ **Sample Size:** 100
+ **Date:** 2025-09-25T12:35:05.897062
+
+ ## Overall Performance
+
+ | Metric | Score | Description |
+ |--------|-------|-------------|
+ | PII Detection Rate | 0.203 | How well personal identifiers are detected |
+ | Completeness Score | 0.640 | Fraction of texts fully de-identified |
+ | Semantic Preservation | 0.109 | How well meaning is preserved |
+ | Average Latency | 492.4 ms | Mean response time |
+
+ ## Domain Performance
+
+ ### Medical Domain (33 samples)
+ - PII Detection: 0.214
+ - Completeness: 0.606
+ - Semantic Preservation: 0.110
+
+ ### Legal Domain (6 samples)
+ - PII Detection: 0.113
+ - Completeness: 0.500
+ - Semantic Preservation: 0.056
+
+ ### HR Domain (11 samples)
+ - PII Detection: 0.202
+ - Completeness: 0.273
+ - Semantic Preservation: 0.108
+
+ ### General Domain (40 samples)
+ - PII Detection: 0.218
+ - Completeness: 0.750
+ - Semantic Preservation: 0.120
+
+ ### Research Domain (4 samples)
+ - PII Detection: 0.192
+ - Completeness: 0.500
+ - Semantic Preservation: 0.108
+
+ ### Customer Service Domain (6 samples)
+ - PII Detection: 0.140
+ - Completeness: 1.000
+ - Semantic Preservation: 0.083
+
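The per-domain figures can be cross-checked against the overall table. The sketch below is not part of the commit or of the benchmark tooling; it assumes the overall scores are sample-weighted means of the domain scores, and the names `DOMAINS` and `weighted_means` are hypothetical.

```python
# Per-domain rows copied from the Domain Performance section:
# (samples, pii_detection, completeness, semantic_preservation)
DOMAINS = {
    "medical": (33, 0.214, 0.606, 0.110),
    "legal": (6, 0.113, 0.500, 0.056),
    "hr": (11, 0.202, 0.273, 0.108),
    "general": (40, 0.218, 0.750, 0.120),
    "research": (4, 0.192, 0.500, 0.108),
    "customer_service": (6, 0.140, 1.000, 0.083),
}


def weighted_means(domains):
    """Return (total_samples, (pii, completeness, semantic)) weighted by sample count."""
    total = sum(row[0] for row in domains.values())
    means = tuple(
        sum(row[0] * row[i] for row in domains.values()) / total
        for i in (1, 2, 3)
    )
    return total, means


total, (pii, completeness, semantic) = weighted_means(DOMAINS)
# Assumption: overall scores are sample-weighted averages of the domain scores.
print(total, round(pii, 3), round(completeness, 3), round(semantic, 3))
# prints: 100 0.203 0.64 0.109
```

Under that assumption the domain rows reproduce the reported sample size of 100 and the overall scores of 0.203, 0.640, and 0.109, so the two sections are numerically consistent. Note that several domains have very few samples (4 to 6), so their individual scores are coarse.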
+ ## Example Results
+
+ ### Example 1 (medical domain)
+ **Input:** Patient Sarah Johnson, DOB 05/12/1980, visited Dr. Lee at St. Jude Hospital on 2023-10-26. Her conta...
+ **Expected:** Patient [NAME_1], DOB [DOB_1], visited [NAME_2] at [HOSPITAL_1] on [DATE_1]. Her contact is [PHONE_1...
+ **Predicted:** Patient [FIRSTNAME_1] [MIDDLENAME_1], DOB [DOB_1], visited Dr. [LASTNAME_1] at [CITY_1] Hospital on ...
+ **PII Detection:** 0.286
+
+ ### Example 2 (legal domain)
+ **Input:** Deponent Mr. Robert Davis, CEO of GlobalCorp Inc., stated under oath on December 1, 2022, that his a...
+ **Expected:** Deponent [NAME_1], CEO of [ORGANIZATION_1], stated under oath on [DATE_1], that his attorney, [NAME_...
+ **Predicted:** Deponent [PREFIX_1] [FIRSTNAME_1] [LASTNAME_1], CEO of [COMPANYNAME_1], stated under oath on [DATE_1...
+ **PII Detection:** 0.167
+
+ ### Example 3 (HR domain)
+ **Input:** Employee ID: EMP-001-XYZ. Name: John Doe. Salary: $85,000. Email: john.doe@example.com. Marital Stat...
+ **Expected:** Employee ID: [EMPLOYEE_ID_1]. Name: [NAME_1]. Salary: [SALARY_1]. Email: [EMAIL_1]. Marital Status: ...
+ **Predicted:** Employee ID: EMP-[BUILDINGNUMBER_1]. Name: [FIRSTNAME_1] Doe. Salary: [CURRENCYSYMBOL_1][AMOUNT_1]. ...
+ **PII Detection:** 0.167
+
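In the examples, the predictions use finer-grained labels (for instance `[FIRSTNAME_1]` and `[LASTNAME_1]`) than the references (`[NAME_1]`). One way to make that mismatch visible is to extract the complete placeholder tags from both strings and compare them as sets. This is only an illustrative sketch: it is not the benchmark's scoring method, which the file does not describe, and `PLACEHOLDER` and `placeholders` are hypothetical names. The strings are the truncated Example 1 text exactly as shown, so tags cut off by the "..." are not counted.

```python
import re

# Matches complete tags such as [NAME_1] or [EMPLOYEE_ID_1]; truncated tags are skipped.
PLACEHOLDER = re.compile(r"\[([A-Z_]+_\d+)\]")


def placeholders(text):
    """Return the set of complete placeholder labels found in text."""
    return set(PLACEHOLDER.findall(text))


# Truncated strings copied verbatim from Example 1 (medical domain).
expected = (
    "Patient [NAME_1], DOB [DOB_1], visited [NAME_2] at [HOSPITAL_1] "
    "on [DATE_1]. Her contact is [PHONE_1..."
)
predicted = (
    "Patient [FIRSTNAME_1] [MIDDLENAME_1], DOB [DOB_1], visited Dr. "
    "[LASTNAME_1] at [CITY_1] Hospital on ..."
)

exp, pred = placeholders(expected), placeholders(predicted)
print("shared:", sorted(exp & pred))
print("expected only:", sorted(exp - pred))
print("predicted only:", sorted(pred - exp))
```

On the visible text only `DOB_1` is shared. `DATE_1` shows up as expected-only here simply because the predicted string is cut off before the date, so the comparison is meaningful only on untruncated outputs.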