CTA / backend /seeder_v2.log
Fix: empty Neo4j startup + auto-seed on first boot
6fa287a
============================================================
ClinicalMatch AI Graph Seeder v2
100 k synthetic patients · 20 oncology conditions
============================================================
[1/5] Seeding clinical trials from ClinicalTrials.gov...
breast cancer: 50 trials fetched
prostate cancer: 50 trials fetched
non-small cell lung cancer: 50 trials fetched
colorectal cancer: 50 trials fetched
ovarian cancer: 50 trials fetched
melanoma: 50 trials fetched
leukemia: 50 trials fetched
lymphoma: 50 trials fetched
glioblastoma: 50 trials fetched
pancreatic cancer: 50 trials fetched
bladder cancer: 50 trials fetched
renal cell carcinoma: 50 trials fetched
thyroid cancer: 50 trials fetched
multiple myeloma: 50 trials fetched
endometrial cancer: 50 trials fetched
cervical cancer: 50 trials fetched
gastric cancer: 50 trials fetched
hepatocellular carcinoma: 50 trials fetched
head and neck cancer: 50 trials fetched
sarcoma: 50 trials fetched
Total trials seeded: 1000
[2/5] Seeding medications from RxNorm...
trastuzumab: 5 RxCUI concepts
pembrolizumab: 5 RxCUI concepts
nivolumab: 5 RxCUI concepts
osimertinib: 4 RxCUI concepts
olaparib: 5 RxCUI concepts
enzalutamide: 5 RxCUI concepts
bevacizumab: 5 RxCUI concepts
rituximab: 5 RxCUI concepts
imatinib: 5 RxCUI concepts
dabrafenib: 5 RxCUI concepts
vemurafenib: 2 RxCUI concepts
atezolizumab: 5 RxCUI concepts
durvalumab: 4 RxCUI concepts
cetuximab: 4 RxCUI concepts
erlotinib: 5 RxCUI concepts
capecitabine: 4 RxCUI concepts
Total medications seeded: 16
[3/5] Seeding diagnoses from ICD-10 CM...
ICD-10 C50: 20 codes
ICD-10 C61: 1 codes
ICD-10 C34: 16 codes
ICD-10 C18: 10 codes
ICD-10 C56: 4 codes
ICD-10 C43: 20 codes
ICD-10 C91: 20 codes
ICD-10 C85: 20 codes
ICD-10 C71: 10 codes
ICD-10 C25: 8 codes
Total diagnoses seeded: 129
[4/5] Seeding supporting literature from PubMed...
breast cancer: 5 publications linked
prostate cancer: 5 publications linked
non-small cell lung cancer: 5 publications linked
colorectal cancer: 5 publications linked
ovarian cancer: 5 publications linked
Total publications seeded: 25
[5/5] Seeding biomarkers (curated from COSMIC/NCIT)...
57 biomarkers seeded and linked to conditions
[+] Deriving eligibility relationships...
Eligibility relationships derived.
[6/6] Generating 100,000 clinically-informed synthetic patients...
(SEER incidence weights · TCGA biomarker prevalence · US Census demographics)
breast cancer: 17,222 patients already done, skipping
non-small cell lung cancer: 14,444 patients (40 trials) [resuming from 4,000]
wrote 10,444 patients | total so far: 31,666/100,000 | edges: 351,181
prostate cancer: 10,556 patients (44 trials)
wrote 10,556 patients | total so far: 42,222/100,000 | edges: 766,259
colorectal cancer: 9,444 patients (34 trials)
wrote 9,444 patients | total so far: 51,666/100,000 | edges: 1,047,777
melanoma: 6,111 patients (56 trials)
wrote 6,111 patients | total so far: 57,777/100,000 | edges: 1,329,267
bladder cancer: 5,000 patients (41 trials)
wrote 5,000 patients | total so far: 62,777/100,000 | edges: 1,509,909
renal cell carcinoma: 4,667 patients (42 trials)
wrote 4,667 patients | total so far: 67,444/100,000 | edges: 1,687,802
lymphoma: 4,667 patients (46 trials)
wrote 4,667 patients | total so far: 72,111/100,000 | edges: 1,847,153
endometrial cancer: 4,222 patients (40 trials)
wrote 4,222 patients | total so far: 76,333/100,000 | edges: 1,992,865
leukemia: 3,889 patients (27 trials)
wrote 3,889 patients | total so far: 80,222/100,000 | edges: 2,071,433
pancreatic cancer: 3,667 patients (35 trials)
wrote 3,667 patients | total so far: 83,889/100,000 | edges: 2,172,901
thyroid cancer: 3,333 patients (41 trials)
wrote 3,333 patients | total so far: 87,222/100,000 | edges: 2,302,009
multiple myeloma: 2,778 patients (50 trials)
wrote 2,778 patients | total so far: 90,000/100,000 | edges: 2,415,994
gastric cancer: 2,000 patients (38 trials)
wrote 2,000 patients | total so far: 92,000/100,000 | edges: 2,474,564
ovarian cancer: 2,000 patients (29 trials)
wrote 2,000 patients | total so far: 94,000/100,000 | edges: 2,516,658
hepatocellular carcinoma: 1,667 patients (47 trials)
wrote 1,667 patients | total so far: 95,667/100,000 | edges: 2,578,834
glioblastoma: 1,333 patients (45 trials)
wrote 1,333 patients | total so far: 97,000/100,000 | edges: 2,623,714
head and neck cancer: 1,333 patients (49 trials)
wrote 1,333 patients | total so far: 98,333/100,000 | edges: 2,677,369
cervical cancer: 889 patients (10 trials)
wrote 889 patients | total so far: 99,222/100,000 | edges: 2,685,508
sarcoma: 778 patients (50 trials)
wrote 778 patients | total so far: 100,000/100,000 | edges: 2,717,650
Total patients: 100,000
Total ELIGIBLE_FOR edges: 2,717,650
============================================================
Seeding complete in 21.1 min
Trials: 1000
Medications: 16
Diagnoses: 129
Publications: 25
Biomarkers: 57
Patients: 100,000
============================================================