hugging-hat committed
Commit aea1b07 · verified · 1 Parent(s): a05a9b8

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +59 -1
README.md CHANGED
@@ -1,3 +1,36 @@
  # NERPA — Fine-Tuned GLiNER2 for PII Anonymisation
 
  A fine-tuned [GLiNER2 Large](https://huggingface.co/fastino/gliner2-large-v1) (340M params) model trained to detect Personally Identifiable Information (PII) in text. Built as a flexible, self-hosted replacement for AWS Comprehend at [Overmind](https://overmindai.com).
@@ -143,8 +176,33 @@ The inference pipeline in `anonymise.py`:
  - **GLiNER2 version:** Requires `gliner2>=1.2.4`. Earlier versions had a bug where entity character offsets mapped to token positions instead of character positions; this is fixed in 1.2.4+.
  - **Device:** Automatically uses CUDA > MPS > CPU.
 
  ## Citation
 
  Built by [Akhat Rakishev](https://github.com/workhat) at [Overmind](https://overmindai.com).
 
- Base model: [GLiNER2](https://huggingface.co/fastino/gliner2-large-v1) by Fastino AI.
 
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: gliner2
+ tags:
+ - named-entity-recognition
+ - ner
+ - pii
+ - anonymisation
+ - gliner
+ - gliner2
+ - token-classification
+ - privacy
+ datasets:
+ - synthetic
+ base_model: fastino/gliner2-large-v1
+ model-index:
+ - name: NERPA
+   results:
+   - task:
+       type: token-classification
+       name: Named Entity Recognition
+     metrics:
+     - type: precision
+       value: 0.93
+       name: Micro-Precision
+     - type: recall
+       value: 0.90
+       name: Micro-Recall
+ pipeline_tag: token-classification
+ ---
+
  # NERPA — Fine-Tuned GLiNER2 for PII Anonymisation
 
  A fine-tuned [GLiNER2 Large](https://huggingface.co/fastino/gliner2-large-v1) (340M params) model trained to detect Personally Identifiable Information (PII) in text. Built as a flexible, self-hosted replacement for AWS Comprehend at [Overmind](https://overmindai.com).
 
  - **GLiNER2 version:** Requires `gliner2>=1.2.4`. Earlier versions had a bug where entity character offsets mapped to token positions instead of character positions; this is fixed in 1.2.4+.
  - **Device:** Automatically uses CUDA > MPS > CPU.
 
+ ## Acknowledgements
+
+ This model is a fine-tuned version of [GLiNER2 Large](https://huggingface.co/fastino/gliner2-large-v1) by [Fastino AI](https://fastino.ai). We thank the GLiNER2 authors for making their model and library openly available.
+
  ## Citation
 
+ If you use NERPA, please cite both this model and the original GLiNER2 paper:
+
+ ```bibtex
+ @misc{nerpa2025,
+   title={NERPA: Fine-Tuned GLiNER2 for PII Anonymisation},
+   author={Akhat Rakishev},
+   year={2025},
+   url={https://huggingface.co/OvermindLab/nerpa},
+ }
+
+ @misc{zaratiana2025gliner2efficientmultitaskinformation,
+   title={GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface},
+   author={Urchade Zaratiana and Gil Pasternak and Oliver Boyd and George Hurn-Maloney and Ash Lewis},
+   year={2025},
+   eprint={2507.18546},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2507.18546},
+ }
+ ```
+
  Built by [Akhat Rakishev](https://github.com/workhat) at [Overmind](https://overmindai.com).
 
+ Overmind is infrastructure to make agents more reliable. Learn more at [overmindai.com](https://overmindai.com).
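
Note on the **Device** bullet in this README: the diff does not show how `anonymise.py` chooses a device, so the following is only a minimal sketch of a CUDA > MPS > CPU priority order. It assumes PyTorch is the backend; the function names `pick_device` and `detect_device` are hypothetical and do not come from the repository.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name using the priority CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends (assumption: PyTorch is in use).

    Falls back to "cpu" when PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )


if __name__ == "__main__":
    print(detect_device())
```

Keeping the priority logic in a pure function separates it from the hardware probe, so the ordering can be checked on any machine regardless of which accelerators are present.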