This is an EDS-NLP (CamemBERT + CRF) model fine-tuned to extract clinical neuro-oncology entities belonging to the chromosomal semantic group. It was trained on a synthetic dataset generated for the properly de-identified BRIGHT project dataset (see the generated_data folder in the primary repository).
This model repository is designed to fit within the overarching bright_db namespace.
It extracts the following fields (described in French):
Aggregates:
Per-Label Breakdowns:
| Label | Precision | Recall | F1 |
|---|---|---|---|
| ch1p | 0.8261 | 0.7037 | 0.7600 |
| ch19q | 1.0000 | 0.3077 | 0.4706 |
| ch1p19q_codel | 0.8364 | 0.6216 | 0.7132 |
| ch7p | 0.8804 | 0.9529 | 0.9153 |
| ch7q | 0.8889 | 0.6667 | 0.7619 |
| ch10p | 0.8571 | 0.4000 | 0.5455 |
| ch10q | 0.8636 | 0.9870 | 0.9212 |
| ch9p | 1.0000 | 0.7333 | 0.8462 |
| ch9q | 0.0000 | 0.0000 | 0.0000 |
| ampli_egfr | 0.6667 | 0.4000 | 0.5000 |
| ampli_cdk4 | 0.0000 | 0.0000 | 0.0000 |
| ampli_mdm2 | 0.0000 | 0.0000 | 0.0000 |
| ampli_mdm4 | 0.0000 | 0.0000 | 0.0000 |
| ampli_met | 0.0000 | 0.0000 | 0.0000 |
| fusion_fgfr | 0.0000 | 0.0000 | 0.0000 |
| fusion_ntrk | 0.0000 | 0.0000 | 0.0000 |
| fusion_autre | 1.0000 | 1.0000 | 1.0000 |
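
The F1 column is the harmonic mean of precision and recall, which is why a label such as ch19q, with perfect precision but low recall, still scores below 0.5. The sketch below recomputes two rows of the table as a sanity check. The `f1` helper is illustrative and not part of EDS-NLP, and returning 0.0 when both inputs are zero is an assumed convention that happens to match the 0.0000 rows.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Recompute two rows from the per-label table
print(f"ch1p:  {f1(0.8261, 0.7037):.4f}")   # ch1p:  0.7600
print(f"ch19q: {f1(1.0000, 0.3077):.4f}")   # ch19q: 0.4706
```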
# Inference Code
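```python
import edsnlp

# Load the fine-tuned pipeline from the Hugging Face Hub
nlp = edsnlp.load("raphael-r/bright-eds-chromosomal")

doc = nlp("Patient presenting with epileptic seizures...")

# Print each extracted entity with its label
for ent in doc.ents:
    print(ent.text, "=>", ent.label_)
```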
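
Downstream code often needs the mentions grouped by label rather than printed one by one. The following is a minimal sketch, assuming only that each entity exposes `.text` and `.label_` as in the snippet above. `group_by_label` is a hypothetical helper, not an EDS-NLP function, and the stand-in entities are invented so the example runs without downloading the model.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_label(ents):
    """Map each label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Invented stand-in entities (not real model output);
# with the real pipeline, pass doc.ents instead.
fake_ents = [
    SimpleNamespace(text="codélétion 1p/19q", label_="ch1p19q_codel"),
    SimpleNamespace(text="gain du 7p", label_="ch7p"),
    SimpleNamespace(text="gain 7p", label_="ch7p"),
]

print(group_by_label(fake_ents))
```

With the real pipeline, the call would be `group_by_label(doc.ents)`.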