BRIGHT NER: EDS-NLP (CamemBERT + CRF) fine-tuned for diagnosis

Description

This is an EDS-NLP (CamemBERT + CRF) model fine-tuned to extract clinical neuro-oncology entities belonging to the diagnosis semantic group. It was trained on a synthetic dataset generated for the de-identified BRIGHT project dataset (see the generated_data folder in the primary repository).

This model repository is designed to fit within the overarching bright_db namespace.

Fields

It extracts the following fields:

  • diag_histologique: Histopathological diagnosis
  • diag_integre: WHO 2021 integrated diagnosis
  • classification_oms: WHO classification used (2007, 2016, or 2021)
  • grade: WHO grade (1, 2, 3, or 4)
  • num_labo: Pathology laboratory sample number

Performance on Validation Set

Aggregates:

  • Macro F1: 0.8299 (Precision: 0.8327, Recall: 0.8400)
  • Micro F1: 0.8554 (Precision: 0.8532, Recall: 0.8576)

Per-Label Breakdowns:

Label               Precision  Recall  F1
diag_histologique   0.9869     0.9467  0.9664
diag_integre        0.7157     0.9145  0.8030
classification_oms  0.8700     1.0000  0.9305
grade               0.9615     0.8814  0.9197
num_labo            0.6296     0.4574  0.5299
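
The macro aggregates can be cross-checked against the per-label table. The sketch below assumes that the macro scores are unweighted means of the per-label values and that each F1 is the harmonic mean of its precision and recall. The names per_label, f1_score, and macro_mean are illustrative and not part of the model repository.

```python
# Per-label validation scores copied from the table: (precision, recall, F1).
per_label = {
    "diag_histologique": (0.9869, 0.9467, 0.9664),
    "diag_integre": (0.7157, 0.9145, 0.8030),
    "classification_oms": (0.8700, 1.0000, 0.9305),
    "grade": (0.9615, 0.8814, 0.9197),
    "num_labo": (0.6296, 0.4574, 0.5299),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_mean(column: int) -> float:
    """Unweighted mean of one score column across all labels."""
    return sum(scores[column] for scores in per_label.values()) / len(per_label)


macro_precision = macro_mean(0)
macro_recall = macro_mean(1)
macro_f1 = macro_mean(2)  # assumption: mean of per-label F1, not F1 of the means

print(f"Macro precision: {macro_precision:.4f}")
print(f"Macro recall:    {macro_recall:.4f}")
print(f"Macro F1:        {macro_f1:.4f}")

# Recompute each F1 from the rounded precision and recall for comparison.
for label, (p, r, reported) in per_label.items():
    print(f"{label}: recomputed F1 {f1_score(p, r):.4f}, reported {reported:.4f}")
```

Under these assumptions the recomputed values should agree with the reported aggregates up to rounding. The micro scores cannot be recovered this way, since they depend on per-label entity counts that the table does not list.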

Usage

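# Inference code
import edsnlp

# Load the fine-tuned pipeline from the Hugging Face Hub
nlp = edsnlp.load("raphael-r/bright-eds-diagnosis")

# Run the pipeline on a clinical note
doc = nlp("Patient presenting with epileptic seizures...")

# Each extracted entity carries its text span and one of the field labels listed above
for ent in doc.ents:
    print(ent.text, "=>", ent.label_)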
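
For downstream use, it can be convenient to collect the extracted entities into one record keyed by field name. The sketch below is one possible way to do that, not part of the model repository. It relies only on the text and label_ attributes used in the loop above. FakeEnt and its example strings are hypothetical stand-ins for real entities, so the snippet runs without downloading the model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class FakeEnt(NamedTuple):
    """Hypothetical stand-in for an extracted entity (same two attributes)."""

    text: str
    label_: str


def group_by_label(ents: Iterable) -> Dict[str, List[str]]:
    """Collect entity texts under their label, preserving document order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Illustrative entities only; with the real pipeline, pass doc.ents instead.
example_ents = [
    FakeEnt("glioblastome", "diag_histologique"),
    FakeEnt("OMS 2021", "classification_oms"),
    FakeEnt("4", "grade"),
    FakeEnt("grade 4", "grade"),
]

fields = group_by_label(example_ents)
for label, texts in fields.items():
    print(label, "=>", texts)
```

With the real pipeline, the call would be group_by_label(doc.ents). A label that matches several spans keeps all of them, leaving any deduplication or conflict resolution to the caller.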