OvermindLab/nerpa

@@ -1,3 +1,36 @@
 # NERPA — Fine-Tuned GLiNER2 for PII Anonymisation
 A fine-tuned [GLiNER2 Large](https://huggingface.co/fastino/gliner2-large-v1) (340M params) model trained to detect Personally Identifiable Information (PII) in text. Built as a flexible, self-hosted replacement for AWS Comprehend at [Overmind](https://overmindai.com).
@@ -143,8 +176,33 @@ The inference pipeline in `anonymise.py`:
 - **GLiNER2 version:** Requires `gliner2>=1.2.4`. Earlier versions had a bug where entity character offsets mapped to token positions instead of character positions; this is fixed in 1.2.4+.
 - **Device:** Automatically uses CUDA > MPS > CPU.
 ## Citation
 Built by [Akhat Rakishev](https://github.com/workhat) at [Overmind](https://overmindai.com).
-Base model: [GLiNER2](https://huggingface.co/fastino/gliner2-large-v1) by Fastino AI.

+---
+language:
+  - en
+license: apache-2.0
+library_name: gliner2
+tags:
+  - named-entity-recognition
+  - ner
+  - pii
+  - anonymisation
+  - gliner
+  - gliner2
+  - token-classification
+  - privacy
+datasets:
+  - synthetic
+base_model: fastino/gliner2-large-v1
+model-index:
+  - name: NERPA
+    results:
+      - task:
+          type: token-classification
+          name: Named Entity Recognition
+        metrics:
+          - type: precision
+            value: 0.93
+            name: Micro-Precision
+          - type: recall
+            value: 0.90
+            name: Micro-Recall
+pipeline_tag: token-classification
+---
 # NERPA — Fine-Tuned GLiNER2 for PII Anonymisation
 A fine-tuned [GLiNER2 Large](https://huggingface.co/fastino/gliner2-large-v1) (340M params) model trained to detect Personally Identifiable Information (PII) in text. Built as a flexible, self-hosted replacement for AWS Comprehend at [Overmind](https://overmindai.com).
 - **GLiNER2 version:** Requires `gliner2>=1.2.4`. Earlier versions had a bug where entity character offsets mapped to token positions instead of character positions; this is fixed in 1.2.4+.
 - **Device:** Automatically uses CUDA > MPS > CPU.
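
The device priority noted above can be sketched as follows. This is a minimal illustration, assuming PyTorch is the backend; the function names `pick_device` and `detect_device` are hypothetical and are not taken from `anonymise.py`, whose actual selection logic may differ.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name in the order CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends (hypothetical helper).

    Falls back to "cpu" when PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        cuda_available=torch.cuda.is_available(),
        mps_available=mps is not None and mps.is_available(),
    )


if __name__ == "__main__":
    print(detect_device())
```

Keeping the priority rule in a pure function separate from the hardware probe makes the ordering easy to check without a GPU.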
+## Acknowledgements
+This model is a fine-tuned version of [GLiNER2 Large](https://huggingface.co/fastino/gliner2-large-v1) by [Fastino AI](https://fastino.ai). We thank the GLiNER2 authors for making their model and library openly available.
 ## Citation
+If you use NERPA, please cite both this model and the original GLiNER2 paper:
+```bibtex
+@misc{nerpa2025,
+  title={NERPA: Fine-Tuned GLiNER2 for PII Anonymisation},
+  author={Akhat Rakishev},
+  year={2025},
+  url={https://huggingface.co/OvermindLab/nerpa},
+}
+@misc{zaratiana2025gliner2efficientmultitaskinformation,
+  title={GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface},
+  author={Urchade Zaratiana and Gil Pasternak and Oliver Boyd and George Hurn-Maloney and Ash Lewis},
+  year={2025},
+  eprint={2507.18546},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2507.18546},
+}
+```
 Built by [Akhat Rakishev](https://github.com/workhat) at [Overmind](https://overmindai.com).
+Overmind is infrastructure to make agents more reliable. Learn more at [overmindai.com](https://overmindai.com).