### Overview

`pii-phi` is a fine-tuned version of `Phi-3.5-mini-instruct` that extracts Personally Identifiable Information (PII) from unstructured text. It returns the extracted entities as JSON, following the schema defined in the training prompt.

### Training Prompt Format

```text
# GUIDELINES
- Extract all instances of the following Personally Identifiable Information (PII) entities from the provided text and return them in JSON format.
- Each item in the JSON list should include an 'entity' key specifying the type of PII and a 'value' key containing the extracted information.
- The supported entities are: PERSON_NAME, BUSINESS_NAME, API_KEY, USERNAME, API_ENDPOINT, WEBSITE_ADDRESS, PHONE_NUMBER, EMAIL_ADDRESS, ID, PASSWORD, ADDRESS.

# EXPECTED OUTPUT
- The json output must be in the format below:
{
    "result": [
        {"entity": "ENTITY_TYPE", "value": "EXTRACTED_VALUE"},
        ...
    ]
}
```
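
Because the prompt fixes the output shape, the response can be checked before it is used downstream. The following is a minimal sketch, not part of the model's own tooling: the helper name `parse_pii_output` and the constant `SUPPORTED_ENTITIES` are hypothetical, and the sample string is hand-written for illustration rather than real model output.

```python
import json

# Entity types listed in the prompt guidelines.
SUPPORTED_ENTITIES = {
    "PERSON_NAME", "BUSINESS_NAME", "API_KEY", "USERNAME", "API_ENDPOINT",
    "WEBSITE_ADDRESS", "PHONE_NUMBER", "EMAIL_ADDRESS", "ID", "PASSWORD", "ADDRESS",
}


def parse_pii_output(raw: str) -> list[dict[str, str]]:
    """Hypothetical helper: return only the items that match the expected schema.

    Returns an empty list when the text is not valid JSON or has the wrong shape.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    items = payload.get("result", [])
    if not isinstance(items, list):
        return []

    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        entity, value = item.get("entity"), item.get("value")
        # Keep items whose entity is a supported type and whose value is a string.
        if isinstance(entity, str) and entity in SUPPORTED_ENTITIES and isinstance(value, str):
            valid.append({"entity": entity, "value": value})
    return valid


# Hand-written sample in the documented format (not actual model output).
sample = '{"result": [{"entity": "PERSON_NAME", "value": "James Jake"}]}'
print(parse_pii_output(sample))
```

Dropping malformed items silently is one possible design choice; a stricter pipeline might raise an error instead so that schema violations are visible.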

### Supported Entities

* PERSON\_NAME
* BUSINESS\_NAME
* API\_KEY
* USERNAME
* API\_ENDPOINT
* WEBSITE\_ADDRESS
* PHONE\_NUMBER
* EMAIL\_ADDRESS
* ID
* PASSWORD
* ADDRESS

### Intended Use

The model is intended for PII detection in text documents to support tasks such as data anonymization, compliance, and security auditing.
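
As an illustration of the anonymization use case, extracted entities can be used to mask the original text. This is a rough sketch under stated assumptions: the function name `redact` and the `[ENTITY_TYPE]` placeholder style are hypothetical, and the entity list is hand-written in the documented output format rather than produced by the model.

```python
def redact(text: str, entities: list[dict[str, str]]) -> str:
    """Hypothetical helper: replace each extracted value with a [ENTITY_TYPE] tag."""
    # Replace longer values first so that a short value contained in a longer
    # one does not split the longer match.
    for item in sorted(entities, key=lambda e: len(e["value"]), reverse=True):
        if item["value"]:  # skip empty strings, which would match everywhere
            text = text.replace(item["value"], f"[{item['entity']}]")
    return text


# Hand-written entities in the documented format (not actual model output).
entities = [
    {"entity": "PERSON_NAME", "value": "James Jake"},
    {"entity": "ID", "value": "123123123"},
]
message = "I am James Jake and my employee number is 123123123"
print(redact(message, entities))
# I am [PERSON_NAME] and my employee number is [ID]
```

Plain string replacement masks every occurrence of a value, including ones inside longer words, so a production pipeline would likely need offset-aware matching.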

### Limitations

* Not guaranteed to detect all forms of PII in every context.
* May return false positives or omit contextually relevant information.
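
One inexpensive guard against a subset of false positives is to keep only values that appear verbatim in the input text. The sketch below assumes entities in the documented `entity`/`value` format; the function name `keep_grounded` is hypothetical. It cannot recover PII that the model missed, and it does not judge whether a grounded value is labeled with the right type.

```python
def keep_grounded(text: str, entities: list[dict[str, str]]) -> list[dict[str, str]]:
    """Hypothetical helper: drop items whose value does not occur in the source text."""
    return [item for item in entities if item["value"] and item["value"] in text]


message = "I am James Jake and my employee number is 123123123"
# Hand-written candidates; the second value does not occur in the message.
candidates = [
    {"entity": "PERSON_NAME", "value": "James Jake"},
    {"entity": "EMAIL_ADDRESS", "value": "james@example.com"},
]
print(keep_grounded(message, candidates))
# [{'entity': 'PERSON_NAME', 'value': 'James Jake'}]
```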

---

### Installation

Install the `vllm` package to run the model efficiently:

```bash
pip install vllm
```

---

### Example

```python
from vllm import LLM, SamplingParams

llm = LLM("Fsoft-AIC/pii-phi")

system_prompt = """
# GUIDELINES
- Extract all instances of the following Personally Identifiable Information (PII) entities from the provided text and return them in JSON format.
- Each item in the JSON list should include an 'entity' key specifying the type of PII and a 'value' key containing the extracted information.
- The supported entities are: PERSON_NAME, BUSINESS_NAME, API_KEY, USERNAME, API_ENDPOINT, WEBSITE_ADDRESS, PHONE_NUMBER, EMAIL_ADDRESS, ID, PASSWORD, ADDRESS.

# EXPECTED OUTPUT
- The json output must be in the format below:
{
    "result": [
        {"entity": "ENTITY_TYPE", "value": "EXTRACTED_VALUE"},
        ...
    ]
}
"""
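# Example input containing a person name and an ID.
pii_message = "I am James Jake and my employee number is 123123123"

# temperature=0 selects greedy decoding; max_tokens caps the response length.
sampling_params = SamplingParams(temperature=0, max_tokens=1000)
outputs = llm.chat(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": pii_message},
    ],
    sampling_params,
)

# Print the first completion generated for each conversation.
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)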
```