---
tags:
- eds
- ner
- medical
- french
language:
- fr
---
BRIGHT NER: EDS-NLP (CamemBERT + CRF) fine-tuned for the molecular semantic group
Description
This model is an EDS-NLP (CamemBERT + CRF) architecture fine-tuned to extract clinical neuro-oncology entities belonging to the molecular semantic group. It was trained on a synthetic dataset generated for the de-identified BRIGHT project dataset (see the generated_data folder in the primary repository).
This model repository is designed to fit within the overarching bright_db namespace.
Fields
It extracts the following fields:
- mol_idh1: IDH1 mutation status
- mol_idh2: IDH2 mutation status
- mol_mgmt: MGMT promoter methylation
- mol_h3f3a: H3F3A mutation
- mol_hist1h3b: HIST1H3B mutation
- mol_tert: TERT promoter mutation
- mol_CDKN2A: CDKN2A homozygous deletion
- mol_atrx: ATRX mutation
- mol_cic: CIC mutation
- mol_fubp1: FUBP1 mutation
- mol_fgfr1: FGFR1 mutation
- mol_egfr_mut: EGFR mutation
- mol_prkca: PRKCA mutation
- mol_pten: PTEN mutation
- mol_p53: p53 mutation
- mol_braf: BRAF mutation
Performance on Validation Set
Aggregates:
- Macro F1: 0.5636 (Precision: 0.5866, Recall: 0.5486)
- Micro F1: 0.8612 (Precision: 0.8680, Recall: 0.8544)
Per-Label Breakdowns:
| Label | Precision | Recall | F1 |
|---|---|---|---|
| mol_idh1 | 0.9481 | 0.9309 | 0.9394 |
| mol_idh2 | 0.7929 | 0.7929 | 0.7929 |
| mol_mgmt | 0.8317 | 0.8984 | 0.8638 |
| mol_h3f3a | 0.8571 | 0.9231 | 0.8889 |
| mol_hist1h3b | 0.0000 | 0.0000 | 0.0000 |
| mol_tert | 0.8283 | 0.8542 | 0.8410 |
| mol_CDKN2A | 0.8148 | 0.7333 | 0.7719 |
| mol_atrx | 0.9714 | 0.7556 | 0.8500 |
| mol_cic | 0.9130 | 0.7000 | 0.7925 |
| mol_fubp1 | 0.8276 | 0.9600 | 0.8889 |
| mol_fgfr1 | 0.0000 | 0.0000 | 0.0000 |
| mol_egfr_mut | 0.0000 | 0.0000 | 0.0000 |
| mol_prkca | 0.0000 | 0.0000 | 0.0000 |
| mol_pten | 0.6000 | 0.4286 | 0.5000 |
| mol_p53 | 0.0000 | 0.0000 | 0.0000 |
| mol_braf | 1.0000 | 0.8000 | 0.8889 |
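The gap between the macro and micro aggregates follows from the table itself. As a minimal sketch, assuming the macro scores are plain unweighted means over the 16 labels, the snippet below recomputes them from the per-label rows. The names `per_label` and `macro_average` are illustrative and not part of the repository. The micro scores cannot be recomputed this way, since they depend on per-label entity counts that are not reported here.

```python
# Per-label (precision, recall, F1), copied from the validation table.
per_label = {
    "mol_idh1": (0.9481, 0.9309, 0.9394),
    "mol_idh2": (0.7929, 0.7929, 0.7929),
    "mol_mgmt": (0.8317, 0.8984, 0.8638),
    "mol_h3f3a": (0.8571, 0.9231, 0.8889),
    "mol_hist1h3b": (0.0, 0.0, 0.0),
    "mol_tert": (0.8283, 0.8542, 0.8410),
    "mol_CDKN2A": (0.8148, 0.7333, 0.7719),
    "mol_atrx": (0.9714, 0.7556, 0.8500),
    "mol_cic": (0.9130, 0.7000, 0.7925),
    "mol_fubp1": (0.8276, 0.9600, 0.8889),
    "mol_fgfr1": (0.0, 0.0, 0.0),
    "mol_egfr_mut": (0.0, 0.0, 0.0),
    "mol_prkca": (0.0, 0.0, 0.0),
    "mol_pten": (0.6000, 0.4286, 0.5000),
    "mol_p53": (0.0, 0.0, 0.0),
    "mol_braf": (1.0000, 0.8000, 0.8889),
}


def macro_average(scores):
    """Return the unweighted mean of precision, recall and F1 over all labels."""
    n = len(scores)
    precision = sum(p for p, _, _ in scores.values()) / n
    recall = sum(r for _, r, _ in scores.values()) / n
    f1 = sum(f for _, _, f in scores.values()) / n
    return precision, recall, f1


macro_p, macro_r, macro_f1 = macro_average(per_label)
print(f"Macro P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f}")

# Labels with an F1 of zero count as much as any other label in the macro mean.
zero_labels = [label for label, (_, _, f1) in per_label.items() if f1 == 0.0]
print(zero_labels)
```

Five of the 16 labels (mol_hist1h3b, mol_fgfr1, mol_egfr_mut, mol_prkca, mol_p53) score zero, and each of them weighs as much as mol_idh1 in the macro mean, which is why Macro F1 sits well below Micro F1.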
Usage
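```python
# Inference code
import edsnlp

# Load the fine-tuned pipeline
nlp = edsnlp.load("raphael-r/bright-eds-molecular")

# Run it on a clinical note
doc = nlp("Patient presenting with epileptic seizures...")

# Print each extracted entity with its label
for ent in doc.ents:
    print(ent.text, "=>", ent.label_)
```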
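For downstream use it can be handy to collect the extracted spans per field rather than printing them one by one. The following is a minimal sketch, not part of the repository: `group_entities_by_label` is a hypothetical helper, and the stand-in entities (with made-up text) only mimic the `text` and `label_` attributes used in the loop above so that the example runs without loading the model.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities_by_label(ents):
    """Map each label to the list of entity texts found for it, in order."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in entities (hypothetical, illustrative text only).
# With the real pipeline, pass doc.ents instead.
fake_ents = [
    SimpleNamespace(text="IDH1 muté", label_="mol_idh1"),
    SimpleNamespace(text="MGMT méthylé", label_="mol_mgmt"),
    SimpleNamespace(text="IDH1 R132H", label_="mol_idh1"),
]

grouped = group_entities_by_label(fake_ents)
print(grouped)
```

With a loaded pipeline, the same helper would be called as `group_entities_by_label(doc.ents)`.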