pii-classifier-tab-dataset / README.md

gpadres

GPJ: updating readme

31bc30f about 1 year ago

metadata

tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
datasets: mattmdjaga/text-anonymization-benchmark-train
license: apache-2.0
base_model: allenai/longformer-base-4096
base_model_relation: finetune
model_id: pii-classifier-tab-dataset

Model Card for pii-classifier-tab-dataset

The model is a Longformer with a token classification head, finetuned on the Text Anonymization Benchmark (TAB) dataset to indicate whether each token is part of Personally Identifiable Information (PII) and should therefore be masked out. The model outputs logits for each token of the input sequence, with two classes: 1 (MASK) and 0 (NO-MASK). In other words, no IOB format is used.
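
As an illustration of the binary labelling scheme, the following minimal sketch turns per-token logits into MASK / NO-MASK decisions and applies them to a token list. It is not taken from the model's code: the helper names `labels_from_logits` and `apply_mask`, the `[MASK]` placeholder, the example tokens, and the logit values are all hypothetical, and it assumes the model emits two logits per token ordered as (NO-MASK, MASK).

```python
from typing import List, Sequence

# Class ids as described in the model card.
NO_MASK, MASK = 0, 1


def labels_from_logits(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the higher-scoring class for each token (argmax over two logits).

    Assumption: each row holds [NO-MASK logit, MASK logit] for one token.
    """
    return [MASK if row[MASK] > row[NO_MASK] else NO_MASK for row in logits]


def apply_mask(
    tokens: Sequence[str],
    labels: Sequence[int],
    placeholder: str = "[MASK]",  # hypothetical placeholder string
) -> List[str]:
    """Replace every token labelled MASK with the placeholder."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [placeholder if label == MASK else tok for tok, label in zip(tokens, labels)]


# Made-up example values, not real model output.
tokens = ["Mr", "Smith", "lives", "in", "Oslo"]
logits = [[2.1, -1.0], [-0.5, 3.2], [4.0, -2.0], [3.1, -1.5], [-1.2, 2.7]]

labels = labels_from_logits(logits)
masked = apply_mask(tokens, labels)

print(labels)            # [0, 1, 0, 0, 1]
print(" ".join(masked))  # Mr [MASK] lives in [MASK]
```

Because there are only two classes and no IOB prefixes, adjacent masked tokens are not grouped into entity spans; each token is decided independently.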

The model is used as an example in the LeakPro repository. For further details, see the example notebook there.