PII detection and redaction using an NER model
Here we provide code to:
- fine-tune an encoder model (like StarEncoder) for the task of PII detection (NER): see folder pii_train_ner
- run inference with our fine-tuned StarPII for PII detection on multiple GPUs: see folder pii_inference
- redact/mask PII detected with the model: see folder pii_redaction
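As a rough illustration of how the masking step consumes the detector's output, here is a minimal sketch. It assumes the detections arrive as character-offset spans shaped like the output of a Hugging Face token-classification pipeline run with an aggregation strategy (dicts with "entity_group", "start", "end" and "score"). The `redact` and `placeholder` helpers, the 0.5 score threshold, and the `<LABEL>` placeholder format are hypothetical choices for this example, not the code in pii_redaction, which may, for instance, substitute realistic fake values instead.

```python
from typing import Dict, List


def placeholder(label: str) -> str:
    # Hypothetical placeholder format; the real redaction scheme may differ.
    return f"<{label.upper()}>"


def redact(text: str, entities: List[Dict], threshold: float = 0.5) -> str:
    """Replace detected PII spans in `text` with placeholders.

    `entities` is assumed to look like aggregated token-classification
    output: dicts with "entity_group", "start"/"end" character offsets,
    and a confidence "score".
    """
    # Keep only confident detections, ordered by where they start.
    kept = sorted(
        (e for e in entities if e["score"] >= threshold),
        key=lambda e: e["start"],
    )
    pieces: List[str] = []
    cursor = 0  # end of the text already copied or redacted
    for ent in kept:
        start, end = ent["start"], ent["end"]
        if start < cursor:
            # Overlaps the previous span: absorb it into that placeholder
            # so no part of the detected PII leaks through.
            cursor = max(cursor, end)
            continue
        pieces.append(text[cursor:start])
        pieces.append(placeholder(ent["entity_group"]))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    sample = "contact jane@example.com or 10.0.0.1"
    detections = [
        {"entity_group": "EMAIL", "start": 8, "end": 24, "score": 0.98},
        {"entity_group": "IP_ADDRESS", "start": 28, "end": 36, "score": 0.91},
        {"entity_group": "NAME", "start": 0, "end": 7, "score": 0.2},  # below threshold
    ]
    print(redact(sample, detections))  # prints "contact <EMAIL> or <IP_ADDRESS>"
```

Filtering by score before masking trades recall for precision; for anonymization, a lower threshold errs on the side of redacting more.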
This is the code we used to anonymize PII in StarCoderData, an 800GB dataset.