Commit 47b3421 by anhphamduy (verified) · 1 Parent(s): c6a1de5

Create README.md

  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ### Overview
+
+ `pii-phi` is a fine-tuned version of `Phi-3.5-mini-instruct` designed to extract Personally Identifiable Information (PII) from unstructured text. The model returns the extracted entities as JSON that follows the schema given in the training prompt.
+
+ ### Training Prompt Format
+
+ ```text
+ # GUIDELINES
+ - Extract all instances of the following Personally Identifiable Information (PII) entities from the provided text and return them in JSON format.
+ - Each item in the JSON list should include an 'entity' key specifying the type of PII and a 'value' key containing the extracted information.
+ - The supported entities are: PERSON_NAME, BUSINESS_NAME, API_KEY, USERNAME, API_ENDPOINT, WEBSITE_ADDRESS, PHONE_NUMBER, EMAIL_ADDRESS, ID, PASSWORD, ADDRESS.
+
+ # EXPECTED OUTPUT
+ - The json output must be in the format below:
+ {
+ "result": [
+ {"entity": "ENTITY_TYPE", "value": "EXTRACTED_VALUE"},
+ ...
+ ]
+ }
+ ```
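
The prompt format above can be exercised with a small helper pair: one function that assembles the prompt and one that pulls the `{"result": [...]}` object out of the raw model output. This is only a sketch. The names `build_prompt` and `parse_result` are hypothetical, and the `# INPUT` header used to attach the text is an assumption, since the README does not say how the input text is appended to the guidelines.

```python
import json

# Prompt copied from the training format above (whitespace as shown in the README).
TRAINING_PROMPT = """# GUIDELINES
- Extract all instances of the following Personally Identifiable Information (PII) entities from the provided text and return them in JSON format.
- Each item in the JSON list should include an 'entity' key specifying the type of PII and a 'value' key containing the extracted information.
- The supported entities are: PERSON_NAME, BUSINESS_NAME, API_KEY, USERNAME, API_ENDPOINT, WEBSITE_ADDRESS, PHONE_NUMBER, EMAIL_ADDRESS, ID, PASSWORD, ADDRESS.

# EXPECTED OUTPUT
- The json output must be in the format below:
{
"result": [
{"entity": "ENTITY_TYPE", "value": "EXTRACTED_VALUE"},
...
]
}"""


def build_prompt(text: str) -> str:
    # Assumption: the input text follows the guidelines under an "# INPUT"
    # header. The README does not specify this part of the format.
    return f"{TRAINING_PROMPT}\n\n# INPUT\n{text}"


def parse_result(raw_output: str) -> list[dict]:
    """Extract the list under "result" from raw model output, or [] on failure."""
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end < start:
        return []
    try:
        payload = json.loads(raw_output[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    items = payload.get("result", [])
    if not isinstance(items, list):
        return []
    # Keep only well-formed items that carry both expected keys.
    return [
        item for item in items
        if isinstance(item, dict) and "entity" in item and "value" in item
    ]


# Hand-written stand-in for a model response, not real model output.
sample_output = 'Sure: {"result": [{"entity": "EMAIL_ADDRESS", "value": "jane@example.com"}]}'
print(parse_result(sample_output))
```

Returning an empty list on malformed output is a design choice for the sketch; a stricter pipeline might raise instead so that failures are not mistaken for "no PII found".
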
+
+ ### Supported Entities
+
+ * PERSON\_NAME
+ * BUSINESS\_NAME
+ * API\_KEY
+ * USERNAME
+ * API\_ENDPOINT
+ * WEBSITE\_ADDRESS
+ * PHONE\_NUMBER
+ * EMAIL\_ADDRESS
+ * ID
+ * PASSWORD
+ * ADDRESS
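
Because the entity set is closed, downstream code can check the labels it receives against the list above. The sketch below groups extracted values by entity type and sets aside any item whose label is not in the list. The function name `group_by_entity` and the sample items are hypothetical and are not part of the model or its tooling.

```python
from collections import defaultdict

# The eleven entity types listed in the README.
SUPPORTED_ENTITIES = frozenset({
    "PERSON_NAME", "BUSINESS_NAME", "API_KEY", "USERNAME", "API_ENDPOINT",
    "WEBSITE_ADDRESS", "PHONE_NUMBER", "EMAIL_ADDRESS", "ID", "PASSWORD",
    "ADDRESS",
})


def group_by_entity(items: list[dict]) -> tuple[dict, list]:
    """Group values by supported entity type; collect unsupported items separately."""
    grouped = defaultdict(list)
    rejected = []
    for item in items:
        entity = item.get("entity")
        if entity in SUPPORTED_ENTITIES:
            grouped[entity].append(item.get("value"))
        else:
            rejected.append(item)
    return dict(grouped), rejected


# Hand-written sample items; CREDIT_CARD is deliberately outside the supported set.
items = [
    {"entity": "PERSON_NAME", "value": "Jane Doe"},
    {"entity": "CREDIT_CARD", "value": "4111"},
    {"entity": "PERSON_NAME", "value": "John Roe"},
]
grouped, rejected = group_by_entity(items)
print(grouped)
print(rejected)
```
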
+
+ ### Intended Use
+
+ The model is intended for PII detection in text documents to support tasks such as data anonymization, compliance, and security auditing.
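
For the anonymization use case, one simple approach is to replace each extracted value in the source text with a placeholder naming its entity type. The sketch below assumes the model output has already been parsed into a list of `{"entity", "value"}` items. The `redact` function is hypothetical, and plain string replacement is a simplification: it redacts every occurrence of a value and does not handle values that happen to appear inside an earlier placeholder.

```python
def redact(text: str, items: list[dict]) -> str:
    """Replace each extracted value in text with a [ENTITY_TYPE] placeholder."""
    # Process longer values first so a short value that is a substring of a
    # longer one does not split the longer match.
    ordered = sorted(items, key=lambda item: len(str(item["value"])), reverse=True)
    for item in ordered:
        value = str(item["value"])
        if value:  # skip empty values, which would match everywhere
            text = text.replace(value, f"[{item['entity']}]")
    return text


# Hand-written sample data, not real model output.
source = "Contact Jane Doe at jane@example.com."
extracted = [
    {"entity": "PERSON_NAME", "value": "Jane Doe"},
    {"entity": "EMAIL_ADDRESS", "value": "jane@example.com"},
]
redacted = redact(source, extracted)
print(redacted)  # Contact [PERSON_NAME] at [EMAIL_ADDRESS].
```
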
+
+ ### Limitations
+
+ * Not guaranteed to detect all forms of PII in every context.
+ * May return false positives or omit contextually relevant information.