BRIGHT NER: EDS-NLP (CamemBERT + CRF) fine-tuned for molecular entities

Description

This is an EDS-NLP (CamemBERT + CRF) model fine-tuned to extract clinical neuro-oncology entities belonging to the molecular semantic group. It was trained on a synthetic dataset generated for the de-identified BRIGHT project dataset (see the generated_data folder in the primary repository).

This model repository is designed to fit within the overarching bright_db namespace.

Fields

It extracts the following fields:

  • mol_idh1: IDH1 mutation status
  • mol_idh2: IDH2 mutation status
  • mol_mgmt: MGMT promoter methylation
  • mol_h3f3a: H3F3A mutation
  • mol_hist1h3b: HIST1H3B mutation
  • mol_tert: TERT promoter mutation
  • mol_CDKN2A: CDKN2A homozygous deletion
  • mol_atrx: ATRX mutation
  • mol_cic: CIC mutation
  • mol_fubp1: FUBP1 mutation
  • mol_fgfr1: FGFR1 mutation
  • mol_egfr_mut: EGFR mutation
  • mol_prkca: PRKCA mutation
  • mol_pten: PTEN mutation
  • mol_p53: p53 mutation
  • mol_braf: BRAF mutation

Performance on Validation Set

Aggregates:

  • Macro F1: 0.5636 (Precision: 0.5866, Recall: 0.5486)
  • Micro F1: 0.8612 (Precision: 0.8680, Recall: 0.8544)

Per-Label Breakdowns:

Label         Precision  Recall  F1
mol_idh1      0.9481     0.9309  0.9394
mol_idh2      0.7929     0.7929  0.7929
mol_mgmt      0.8317     0.8984  0.8638
mol_h3f3a     0.8571     0.9231  0.8889
mol_hist1h3b  0.0000     0.0000  0.0000
mol_tert      0.8283     0.8542  0.8410
mol_CDKN2A    0.8148     0.7333  0.7719
mol_atrx      0.9714     0.7556  0.8500
mol_cic       0.9130     0.7000  0.7925
mol_fubp1     0.8276     0.9600  0.8889
mol_fgfr1     0.0000     0.0000  0.0000
mol_egfr_mut  0.0000     0.0000  0.0000
mol_prkca     0.0000     0.0000  0.0000
mol_pten      0.6000     0.4286  0.5000
mol_p53       0.0000     0.0000  0.0000
mol_braf      1.0000     0.8000  0.8889
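
The macro aggregates are unweighted means over the 16 labels, so the five labels scoring 0.0000 pull them well below the micro aggregates. A minimal sketch that recomputes the macro figures from the per-label values listed above; the names per_label and macro_average are illustrative and not part of the model repository:

```python
# Per-label (precision, recall, F1), copied from the validation breakdown.
per_label = {
    "mol_idh1": (0.9481, 0.9309, 0.9394),
    "mol_idh2": (0.7929, 0.7929, 0.7929),
    "mol_mgmt": (0.8317, 0.8984, 0.8638),
    "mol_h3f3a": (0.8571, 0.9231, 0.8889),
    "mol_hist1h3b": (0.0000, 0.0000, 0.0000),
    "mol_tert": (0.8283, 0.8542, 0.8410),
    "mol_CDKN2A": (0.8148, 0.7333, 0.7719),
    "mol_atrx": (0.9714, 0.7556, 0.8500),
    "mol_cic": (0.9130, 0.7000, 0.7925),
    "mol_fubp1": (0.8276, 0.9600, 0.8889),
    "mol_fgfr1": (0.0000, 0.0000, 0.0000),
    "mol_egfr_mut": (0.0000, 0.0000, 0.0000),
    "mol_prkca": (0.0000, 0.0000, 0.0000),
    "mol_pten": (0.6000, 0.4286, 0.5000),
    "mol_p53": (0.0000, 0.0000, 0.0000),
    "mol_braf": (1.0000, 0.8000, 0.8889),
}


def macro_average(scores, index):
    """Unweighted mean of one metric column across all labels."""
    return sum(row[index] for row in scores.values()) / len(scores)


macro_precision = macro_average(per_label, 0)
macro_recall = macro_average(per_label, 1)
macro_f1 = macro_average(per_label, 2)

# Labels the model never predicted correctly on the validation set.
zero_labels = sorted(label for label, row in per_label.items() if row[2] == 0.0)

print(f"Macro P/R/F1: {macro_precision:.4f} / {macro_recall:.4f} / {macro_f1:.4f}")
print("Labels with F1 = 0:", zero_labels)
```

The recomputed values can be compared against the reported macro aggregates as a consistency check on the table.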

Usage

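# Inference code
import edsnlp

# Load the fine-tuned pipeline by its repository name
nlp = edsnlp.load("raphael-r/bright-eds-molecular")
doc = nlp("Patient presenting with epileptic seizures...")

# Print each extracted entity with its molecular label
for ent in doc.ents:
    print(ent.text, "=>", ent.label_)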
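
The loop above prints one line per entity. To collect mentions per field instead, a minimal sketch follows, assuming only that each entity exposes .text and .label_ as in that loop. The helper group_by_label is hypothetical, and the stand-in entities are made up for illustration (they are not model output) so the snippet runs without downloading the model:

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_label(ents):
    """Collect entity texts under their label, preserving document order."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in entities for illustration only; with the real pipeline,
# pass doc.ents instead.
fake_ents = [
    SimpleNamespace(text="IDH1 muté", label_="mol_idh1"),
    SimpleNamespace(text="MGMT méthylé", label_="mol_mgmt"),
    SimpleNamespace(text="IDH1 R132H", label_="mol_idh1"),
]

fields = group_by_label(fake_ents)
for label, mentions in fields.items():
    print(label, "=>", mentions)
```

With a loaded pipeline, the call would be group_by_label(doc.ents).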