Upload Spanish PII detection model OpenMed-PII-Spanish-SuperMedical-Base-125M-v1

Files changed (1)

README.md CHANGED

@@ -219,6 +219,21 @@ for entity in entities:
     print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.3f})")
 ```
+> **Important: accent handling.** This model was trained on text without diacritical marks (accents). For best results, strip accents from your input before inference. Character offsets are preserved when every accented letter in the input is a single precomposed code point (NFC form, the usual case for Spanish text), so you can map entities back to the original text.
+>
+> ```python
+> import unicodedata
+>
+> def strip_accents(text: str) -> str:
+>     nfc = unicodedata.normalize("NFC", text)
+>     nfd = unicodedata.normalize("NFD", nfc)
+>     stripped = "".join(ch for ch in nfd if unicodedata.category(ch) != "Mn")
+>     return unicodedata.normalize("NFC", stripped)
+>
+> clean_text = strip_accents(text)  # call before passing to the pipeline; keep `text` for mapping offsets back
+> entities = ner(clean_text)
+> ```
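
The offset-mapping claim in the note above can be sketched as follows. This is a minimal illustration, not part of the uploaded README: `prepare` is a hypothetical helper name, the entity dictionary is hand-written to stand in for pipeline output, and the `start`/`end` keys are assumed to be character offsets as returned by a Hugging Face token-classification pipeline.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks (Unicode category Mn) after canonical decomposition."""
    nfd = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in nfd if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def prepare(text: str) -> tuple[str, str]:
    """Hypothetical helper: return (NFC original, accent-free copy) of equal length.

    Raises ValueError if stripping changed the length, because entity offsets
    computed on the stripped copy would then not line up with the original.
    """
    original = unicodedata.normalize("NFC", text)
    stripped = strip_accents(original)
    if len(stripped) != len(original):
        raise ValueError("accent stripping changed the text length; offsets would shift")
    return original, stripped


original, stripped = prepare("El paciente José Pérez vive en Málaga.")

# Hand-written stand-in for one pipeline result on `stripped` (assumed shape).
entities = [{"entity_group": "NAME", "word": "Jose Perez", "start": 12, "end": 22}]

for entity in entities:
    # Same offsets, applied to the accented original.
    accented = original[entity["start"]:entity["end"]]
    print(f"{entity['entity_group']}: {accented}")
```

Normalizing to NFC first matters because decomposed input (a base letter followed by a combining accent) is one character longer per accent than its stripped form. The length check also catches marks that have no precomposed equivalent.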
 ### De-identification Example
 ```python