twelcone committed on
Commit b0b221a · verified · 1 parent: bc74b08

Update README.md

Files changed (1): README.md +97 -1

Removed from the front matter: `base_model: twelcone/pii-phi`

Updated README.md:

---
base_model: twelcone/pii-phi-mlx
library_name: mlx
pipeline_tag: text-generation
tags:
- mlx
---

# Overview

`pii-phi-mlx` is an MLX-format version of `Phi-3.5-mini-instruct`, fine-tuned to extract Personally Identifiable Information (PII) from unstructured text and intended to run on Apple silicon Macs. The model returns the PII entities it finds as JSON that follows the schema given in the training prompt.

# Training Prompt Format

```text
# GUIDELINES
- Extract all instances of the following Personally Identifiable Information (PII) entities from the provided text and return them in JSON format.
- Each item in the JSON list should include an 'entity' key specifying the type of PII and a 'value' key containing the extracted information.
- The supported entities are: PERSON_NAME, BUSINESS_NAME, API_KEY, USERNAME, API_ENDPOINT, WEBSITE_ADDRESS, PHONE_NUMBER, EMAIL_ADDRESS, ID, PASSWORD, ADDRESS.

# EXPECTED OUTPUT
- The json output must be in the format below:
{
"result": [
{"entity": "ENTITY_TYPE", "value": "EXTRACTED_VALUE"},
...
]
}
```

# Supported Entities

* PERSON\_NAME
* BUSINESS\_NAME
* API\_KEY
* USERNAME
* API\_ENDPOINT
* WEBSITE\_ADDRESS
* PHONE\_NUMBER
* EMAIL\_ADDRESS
* ID
* PASSWORD
* ADDRESS

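Because both the entity set and the output schema are fixed, a reply can be checked before anything downstream relies on it. The following is a minimal sketch, assuming the reply is exactly one JSON object in the EXPECTED OUTPUT format; `parse_pii_output` is a hypothetical helper, not part of the model release, and rejecting unknown entity types or non-string values is a design choice rather than documented model behavior.

```python
import json

# Entity types copied from the Supported Entities list above.
SUPPORTED_ENTITIES = frozenset({
    "PERSON_NAME", "BUSINESS_NAME", "API_KEY", "USERNAME", "API_ENDPOINT",
    "WEBSITE_ADDRESS", "PHONE_NUMBER", "EMAIL_ADDRESS", "ID", "PASSWORD", "ADDRESS",
})


def parse_pii_output(text):
    """Hypothetical helper: parse a model reply into a list of (entity, value) pairs.

    Expects {"result": [{"entity": ..., "value": ...}, ...]} and raises ValueError
    on malformed JSON, a wrong shape, or an unsupported entity type.
    """
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("result"), list):
        raise ValueError("expected a JSON object with a 'result' list")
    pairs = []
    for item in data["result"]:
        if not isinstance(item, dict) or "entity" not in item or "value" not in item:
            raise ValueError(f"malformed item: {item!r}")
        entity, value = item["entity"], item["value"]
        if not isinstance(entity, str) or entity not in SUPPORTED_ENTITIES:
            raise ValueError(f"unsupported entity type: {entity!r}")
        if not isinstance(value, str):
            raise ValueError(f"value is not a string: {value!r}")
        pairs.append((entity, value))
    return pairs


if __name__ == "__main__":
    reply = '{"result": [{"entity": "PERSON_NAME", "value": "James Jake"}]}'
    print(parse_pii_output(reply))  # [('PERSON_NAME', 'James Jake')]
```

Raising instead of silently skipping bad items makes schema drift visible; a more lenient variant could coerce numeric values (for example an `ID` emitted as a number) to strings.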
# Intended Use

The model is intended for PII detection in text documents to support tasks such as data anonymization, compliance, and security auditing.

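For the anonymization use case, the extracted pairs can be used to mask the original text. This is a sketch under the assumption that each `value` is copied verbatim from the input; `redact` and the `[ENTITY_TYPE]` placeholder style are hypothetical choices, not something the model card prescribes.

```python
import re


def redact(text, entities):
    """Hypothetical helper: mask each extracted PII value with an [ENTITY_TYPE] placeholder.

    `entities` is a list of (entity_type, value) pairs. All values are matched in a
    single regex pass, longest first, so inserted placeholders are never re-matched
    and a value nested inside a longer one (e.g. a name inside an email) does not
    split the longer match.
    """
    label = {}
    for entity, value in entities:
        if value:
            label.setdefault(value, entity)  # first label wins for duplicate values
    if not label:
        return text
    pattern = re.compile(
        "|".join(re.escape(v) for v in sorted(label, key=len, reverse=True))
    )
    return pattern.sub(lambda m: f"[{label[m.group(0)]}]", text)


if __name__ == "__main__":
    pairs = [("PERSON_NAME", "James Jake"), ("ID", "123123123")]
    print(redact("I am James Jake and my employee number is 123123123", pairs))
    # I am [PERSON_NAME] and my employee number is [ID]
```

Values that the model normalizes or paraphrases instead of copying will not be found by a literal match and are left unmasked.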
# Limitations

* Not guaranteed to detect all forms of PII in every context.
* May return false positives or omit contextually relevant information.

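One inexpensive guard against false positives is to drop any extracted value that never occurs in the input, since it cannot be literal PII from that text. A minimal sketch, assuming values are meant to be copied verbatim from the input; `keep_grounded` is a hypothetical helper, and it cannot catch values that do occur in the text but are mislabeled or are not actually PII.

```python
def keep_grounded(text, entities, case_sensitive=True):
    """Hypothetical helper: split (entity, value) pairs by whether the value occurs in `text`.

    Returns (kept, dropped). With case_sensitive=False both sides are compared
    after str.casefold(), which tolerates capitalization differences.
    """
    haystack = text if case_sensitive else text.casefold()
    kept, dropped = [], []
    for entity, value in entities:
        needle = value if case_sensitive else value.casefold()
        # Empty values are treated as ungrounded rather than trivially matching.
        target = kept if needle and needle in haystack else dropped
        target.append((entity, value))
    return kept, dropped
```

This only addresses the false-positive limitation; it does nothing to recover PII the model failed to extract.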
---

# Installation

The example below runs the original `Fsoft-AIC/pii-phi` checkpoint with vLLM rather than the MLX weights in this repository. Install the `vllm` package:

```bash
pip install vllm
```

---

# Example

```python
from vllm import LLM, SamplingParams

# Load the original pii-phi checkpoint (not the MLX weights) with vLLM.
llm = LLM("Fsoft-AIC/pii-phi")

# System prompt: the same GUIDELINES / EXPECTED OUTPUT text as the training prompt format.
system_prompt = """
# GUIDELINES
- Extract all instances of the following Personally Identifiable Information (PII) entities from the provided text and return them in JSON format.
- Each item in the JSON list should include an 'entity' key specifying the type of PII and a 'value' key containing the extracted information.
- The supported entities are: PERSON_NAME, BUSINESS_NAME, API_KEY, USERNAME, API_ENDPOINT, WEBSITE_ADDRESS, PHONE_NUMBER, EMAIL_ADDRESS, ID, PASSWORD, ADDRESS.

# EXPECTED OUTPUT
- The json output must be in the format below:
{
"result": [
{"entity": "ENTITY_TYPE", "value": "EXTRACTED_VALUE"},
...
]
}
"""
pii_message = "I am James Jake and my employee number is 123123123"

# temperature=0 selects greedy decoding; max_tokens caps the length of the JSON reply.
sampling_params = SamplingParams(temperature=0, max_tokens=1000)
outputs = llm.chat(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": pii_message},
    ],
    sampling_params,
)

# llm.chat returns one RequestOutput per conversation; print its first completion.
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
```