Part of the bright_db collection of models for the BRIGHT clinical database project.
This is an EDS-NLP model (CamemBERT + CRF) fine-tuned to extract clinical neuro-oncology entities belonging to the molecular semantic group. It was trained on a synthetic dataset generated for the properly de-identified BRIGHT project dataset (see the generated_data folder in the primary repository).
This model repository is designed to fit within the overarching bright_db namespace.
It extracts the following fields (described in French):
Aggregates:
Per-Label Breakdowns:
| Label | Precision | Recall | F1 |
|---|---|---|---|
| mol_idh1 | 0.9481 | 0.9309 | 0.9394 |
| mol_idh2 | 0.7929 | 0.7929 | 0.7929 |
| mol_mgmt | 0.8317 | 0.8984 | 0.8638 |
| mol_h3f3a | 0.8571 | 0.9231 | 0.8889 |
| mol_hist1h3b | 0.0000 | 0.0000 | 0.0000 |
| mol_tert | 0.8283 | 0.8542 | 0.8410 |
| mol_CDKN2A | 0.8148 | 0.7333 | 0.7719 |
| mol_atrx | 0.9714 | 0.7556 | 0.8500 |
| mol_cic | 0.9130 | 0.7000 | 0.7925 |
| mol_fubp1 | 0.8276 | 0.9600 | 0.8889 |
| mol_fgfr1 | 0.0000 | 0.0000 | 0.0000 |
| mol_egfr_mut | 0.0000 | 0.0000 | 0.0000 |
| mol_prkca | 0.0000 | 0.0000 | 0.0000 |
| mol_pten | 0.6000 | 0.4286 | 0.5000 |
| mol_p53 | 0.0000 | 0.0000 | 0.0000 |
| mol_braf | 1.0000 | 0.8000 | 0.8889 |
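The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the sketch below recomputes F1 for a few rows copied from the table. The `f1_score` helper is written here for illustration only and is not part of EDS-NLP. Labels whose precision and recall are both 0 get an F1 of 0 by convention.

```python
# F1 is the harmonic mean of precision (P) and recall (R): F1 = 2PR / (P + R).
def f1_score(precision: float, recall: float) -> float:
    """Return the F1 score, defined as 0.0 when precision and recall are both 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A few rows copied from the per-label table: label -> (precision, recall, reported F1).
rows = {
    "mol_idh1": (0.9481, 0.9309, 0.9394),
    "mol_mgmt": (0.8317, 0.8984, 0.8638),
    "mol_braf": (1.0000, 0.8000, 0.8889),
    "mol_fgfr1": (0.0000, 0.0000, 0.0000),
}

for label, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    # Precision and recall are rounded to 4 decimals, so small differences are expected.
    print(f"{label}: computed={computed:.4f} reported={reported:.4f}")
```

Because the table values are already rounded, recomputed scores should match the reported F1 to within about 5e-4 rather than exactly.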
# Inference Code
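```python
import edsnlp
# Load the fine-tuned pipeline from the Hugging Face Hub.
nlp = edsnlp.load("raphael-r/bright-eds-molecular")
# CamemBERT is a French language model, so the pipeline is presumably best suited to French clinical text.
doc = nlp("Patient presenting with epileptic seizures...")
# Print each extracted molecular entity with its label.
for ent in doc.ents:
    print(ent.text, "=>", ent.label_)
```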
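To feed results into a structured database, it can help to collect the extracted entities per label. The sketch below is a hypothetical post-processing step, not part of this model or of EDS-NLP: `group_entities` takes `(text, label)` pairs, which can be built from a processed document with `[(ent.text, ent.label_) for ent in doc.ents]`. The example pairs are made up for illustration and are not model output.

```python
from collections import defaultdict


def group_entities(ents):
    """Group (text, label) pairs into a dict mapping each label to its mentions.

    Hypothetical helper: `ents` is any iterable of (text, label) pairs.
    Mentions are kept in the order they appear in the input.
    """
    grouped = defaultdict(list)
    for text, label in ents:
        grouped[label].append(text)
    return dict(grouped)


# Hypothetical (text, label) pairs standing in for real model output.
example = [
    ("IDH1 muté", "mol_idh1"),
    ("MGMT méthylé", "mol_mgmt"),
    ("TERT", "mol_tert"),
    ("IDH1", "mol_idh1"),
]
print(group_entities(example))
```

Keeping a list per label, rather than a single value, preserves repeated mentions of the same marker so that a downstream step can decide how to reconcile them.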