BRIGHT NER: EDS-NLP (CamemBERT + CRF) fine-tuned for diagnosis

Description

This is an EDS-NLP (CamemBERT + CRF) model fine-tuned to extract clinical neuro-oncology entities belonging to the diagnosis semantic group. It was trained on a synthetic dataset generated for the de-identified BRIGHT project dataset (see the generated_data folder in the primary repository).

This model repository is designed to fit within the bright_db namespace.

Fields

It extracts the following fields:

diag_histologique: Histological (anatomopathological) diagnosis
diag_integre: WHO 2021 integrated diagnosis
classification_oms: WHO classification used (2007, 2016, or 2021)
grade: WHO grade (1, 2, 3, or 4)
num_labo: Pathology laboratory sample number
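
Two of the fields listed above have a closed set of values, so extracted spans can be sanity-checked after inference. The sketch below is only an illustration: the helper name is_valid_field is hypothetical, and it assumes values have already been normalized to plain digit strings, which the raw spans returned by the model may not be.

```python
# Field names as listed in this model card.
FIELDS = (
    "diag_histologique",
    "diag_integre",
    "classification_oms",
    "grade",
    "num_labo",
)

# Closed value sets taken from the field descriptions.
ALLOWED_VALUES = {
    "classification_oms": {"2007", "2016", "2021"},
    "grade": {"1", "2", "3", "4"},
}


def is_valid_field(label: str, value: str) -> bool:
    """Hypothetical helper: True if `value` is acceptable for `label`."""
    if label not in FIELDS:
        return False
    cleaned = value.strip()
    allowed = ALLOWED_VALUES.get(label)
    if allowed is None:
        # Free-text fields only need to be non-empty.
        return bool(cleaned)
    return cleaned in allowed
```

A span that fails this check is not necessarily a model error; it may simply need normalization (for example, a grade written as a Roman numeral).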

Performance on Validation Set

Aggregates:

Macro F1: 0.8299 (Precision: 0.8327, Recall: 0.8400)
Micro F1: 0.8554 (Precision: 0.8532, Recall: 0.8576)

Per-Label Breakdowns:

Label	Precision	Recall	F1
diag_histologique	0.9869	0.9467	0.9664
diag_integre	0.7157	0.9145	0.8030
classification_oms	0.8700	1.0000	0.9305
grade	0.9615	0.8814	0.9197
num_labo	0.6296	0.4574	0.5299
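
The aggregate figures can be cross-checked against the per-label table. The sketch below assumes that macro precision, recall, and F1 are each the unweighted mean of the corresponding per-label values; the reported numbers are consistent with that reading, but the evaluation script itself is not shown here. The function names are illustrative.

```python
# (precision, recall) per label, copied from the table above.
PER_LABEL = {
    "diag_histologique": (0.9869, 0.9467),
    "diag_integre": (0.7157, 0.9145),
    "classification_oms": (0.8700, 1.0000),
    "grade": (0.9615, 0.8814),
    "num_labo": (0.6296, 0.4574),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(per_label: dict) -> tuple:
    """Unweighted means of per-label precision, recall, and F1."""
    n = len(per_label)
    macro_p = sum(p for p, _ in per_label.values()) / n
    macro_r = sum(r for _, r in per_label.values()) / n
    macro_f1 = sum(f1(p, r) for p, r in per_label.values()) / n
    return macro_p, macro_r, macro_f1


macro_p, macro_r, macro_f1 = macro_average(PER_LABEL)
print(f"Macro P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f}")
```

Micro-averaged scores cannot be recomputed this way, because they depend on per-label entity counts that the table does not report.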

Usage

# Inference Code
import edsnlp

nlp = edsnlp.load("raphael-r/bright-eds-diagnosis")
doc = nlp("Patient presenting with epileptic seizures...")

for ent in doc.ents:
    print(ent.text, "=>", ent.label_)
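
For downstream use it is often more convenient to group the extracted spans by label than to print them one by one. The following is a minimal sketch, not part of the model repository: group_entities is a hypothetical helper, and the example pairs are made up for illustration rather than real model output.

```python
from collections import defaultdict


def group_entities(pairs):
    """Group (text, label) pairs into {label: [texts]}, keeping input order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# With a loaded pipeline, the pairs could be built as:
#   pairs = [(ent.text, ent.label_) for ent in doc.ents]
# Stand-in pairs (illustrative only, not real model output):
example_pairs = [
    ("glioblastome", "diag_histologique"),
    ("4", "grade"),
]
print(group_entities(example_pairs))
```

Keeping a list per label, rather than a single value, avoids silently dropping spans when a report mentions the same field more than once.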
