Update README.md
Browse files
README.md
CHANGED
|
@@ -108,4 +108,30 @@ The model is capable of detecting the following PII entities:
|
|
| 108 |
To use this model, you'll need to have the `transformers` library installed:

```bash
pip install transformers
```
To use this model, you'll need to have the `transformers` library installed:

```bash
pip install transformers
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
# NOTE(review): this is a generative (instruction-tuned) Phi-3 fine-tune, so it must be
# loaded with AutoModelForCausalLM — a token-classification head has no `.generate()`.
tokenizer = AutoTokenizer.from_pretrained("ab-ai/PII-Model-Phi3-Mini")
model = AutoModelForCausalLM.from_pretrained("ab-ai/PII-Model-Phi3-Mini")

# Pick a device and move the model onto it (previously `device` was used but never defined)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

input_text = "Hi Abner, just a reminder that your next primary care appointment is on 23/03/1926. Please confirm by replying to this email Nathen15@hotmail.com."

model_prompt = f"""### Instruction:
Identify and extract the following PII entities from the text, if present: companyname, pin, currencyname, email, phoneimei, litecoinaddress, currency, eyecolor, street, mac, state, time, vehiclevin, jobarea, date, bic, currencysymbol, currencycode, age, nearbygpscoordinate, amount, ssn, ethereumaddress, zipcode, buildingnumber, dob, firstname, middlename, ordinaldirection, jobtitle, bitcoinaddress, jobtype, phonenumber, height, password, ip, useragent, accountname, city, gender, secondaryaddress, iban, sex, prefix, ipv4, maskednumber, url, username, lastname, creditcardcvv, county, vehiclevrm, ipv6, creditcardissuer, accountnumber, creditcardnumber. Return the output in JSON format.

### Input:
{input_text}

### Output: """

inputs = tokenizer(model_prompt, return_tensors="pt").to(device)
# adjust max_new_tokens according to your need
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=120)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```