# TrialPath Data & Evaluation Pipeline: TDD Implementation Guide

> Based on in-depth research of DeepWiki, official TREC documentation, and the ir-measures / ir_datasets libraries

---

## 1. Pipeline Architecture Overview

### 1.1 Data Flow
```
┌─────────────────────────────────────────────────────────────────┐
│                   Data & Evaluation Pipeline                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────┐     │
│  │   Synthea    │──▶│ FHIR Bundle  │──▶│  PatientProfile  │     │
│  │  (Java CLI)  │   │   (JSON)     │   │  (JSON Schema)   │     │
│  └──────────────┘   └──────────────┘   └────────┬─────────┘     │
│                                                 │               │
│  ┌──────────────┐   ┌──────────────┐            ▼               │
│  │  LLM Letter  │──▶│  ReportLab   │──▶ Noisy Clinical PDFs     │
│  │  Generator   │   │  + Augraphy  │    (Letters/Labs/Path)     │
│  └──────────────┘   └──────────────┘                            │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────┐     │
│  │   MedGemma   │──▶│  Extracted   │──▶│   F1 Evaluator   │     │
│  │  Extractor   │   │   Profile    │   │  (scikit-learn)  │     │
│  └──────────────┘   └──────────────┘   └──────────────────┘     │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────┐     │
│  │ TREC Topics  │──▶│  TrialPath   │──▶│  TREC Evaluator  │     │
│  │ (ir_datasets)│   │   Matching   │   │  (ir-measures)   │     │
│  └──────────────┘   └──────────────┘   └──────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### 1.2 Module Relationships

| Module | Input | Output | Dependencies |
|--------|-------|--------|--------------|
| `data/generate_synthetic_patients.py` | Synthea FHIR Bundles | `PatientProfile` JSON + Ground Truth | Synthea CLI, FHIR R4 |
| `data/generate_noisy_pdfs.py` | `PatientProfile` JSON | Clinical PDFs (with injected noise) | ReportLab, Augraphy |
| `evaluation/run_trec_benchmark.py` | TREC Topics + TrialPath Run | Recall@50, NDCG@10, P@10 | ir_datasets, ir-measures |
| `evaluation/extraction_eval.py` | Extracted vs Ground Truth Profiles | Field-level F1 | scikit-learn |
| `evaluation/criterion_eval.py` | EligibilityLedger vs Gold Standard | Criterion Accuracy | scikit-learn |
| `evaluation/latency_cost_tracker.py` | API call logs | Latency/Cost reports | time, logging |
### 1.3 Directory Layout
```
data/
├── generate_synthetic_patients.py   # Synthea FHIR → PatientProfile
├── generate_noisy_pdfs.py           # PatientProfile → Clinical PDFs
├── synthea_config/
│   ├── synthea.properties           # Synthea configuration
│   └── modules/
│       └── lung_cancer_biomarkers.json  # NSCLC extension module (with biomarkers)
├── templates/
│   ├── clinical_letter.py           # Clinical letter template
│   ├── pathology_report.py          # Pathology report template
│   ├── lab_report.py                # Laboratory report template
│   └── imaging_report.py            # Imaging report template
├── noise/
│   └── noise_injector.py            # Noise injection engine
└── output/
    ├── fhir/                        # Raw Synthea FHIR output
    ├── profiles/                    # Converted PatientProfile JSON
    ├── pdfs/                        # Generated clinical PDFs
    └── ground_truth/                # Annotation data
evaluation/
├── run_trec_benchmark.py            # TREC retrieval evaluation
├── extraction_eval.py               # MedGemma extraction F1
├── criterion_eval.py                # Criterion decision accuracy
├── latency_cost_tracker.py          # Latency & cost tracking
├── trec_data/
│   ├── topics2021.xml               # TREC 2021 topics
│   ├── qrels2021.txt                # TREC 2021 relevance judgments
│   └── topics2022.xml               # TREC 2022 topics
└── reports/                         # Evaluation report output
tests/
├── test_synthea_data.py             # Synthea data validation
├── test_pdf_generation.py           # PDF generation correctness
├── test_noise_injection.py          # Noise injection effects
├── test_trec_evaluation.py          # TREC metric computation
├── test_extraction_f1.py            # F1 computation tests
├── test_latency_cost.py             # Latency/cost tests
└── test_e2e_pipeline.py             # End-to-end pipeline test
```
---

## 2. Synthea Synthetic Patient Generation Guide

### 2.1 Synthea Overview

Synthea is an open-source synthetic patient simulator developed by MITRE, implemented in Java. It simulates disease trajectories through JSON state-machine modules and exports standard FHIR R4 Bundles.

**Key characteristics (source: DeepWiki, synthetichealth/synthea):**

- Module-based disease simulation: each disease is defined as a JSON state machine
- Supports FHIR R4/STU3/DSTU2 export
- Ships with a built-in `lung_cancer.json` module (85% NSCLC / 15% SCLC split)
- Supports Stage I-IV staging and chemotherapy/radiation treatment paths
- **Does not include NSCLC-specific biomarkers (EGFR, ALK, PD-L1, KRAS, ROS1) — a custom extension is required**

### 2.2 Installation and Configuration

**System requirements:**

- Java JDK 11 or newer (LTS 11 or 17 recommended)

**Installation option A: use the release JAR directly (recommended for data generation)**
```bash
# Download the latest release JAR
# from https://github.com/synthetichealth/synthea/releases
wget https://github.com/synthetichealth/synthea/releases/download/master-branch-latest/synthea-with-dependencies.jar

# Verify the installation
java -jar synthea-with-dependencies.jar --help
```
**Installation option B: build from source (needed when working with custom modules)**
```bash
git clone https://github.com/synthetichealth/synthea.git
cd synthea
./gradlew build check test
```
### 2.3 NSCLC Module Configuration

#### 2.3.1 Analysis of the Existing lung_cancer Module

Source: DeepWiki analysis of `lung_cancer.json` in `synthetichealth/synthea`:

- **Entry condition**: ages 45-65, probability-based onset
- **Diagnostic workflow**: symptoms (cough, hemoptysis, shortness of breath) → chest X-ray → chest CT → biopsy/cytology
- **Subtypes**: 85% NSCLC, 15% SCLC
- **Staging**: Stage I-IV, driven by the `lung_cancer_nondiagnosis_counter` attribute
- **Treatment**: NSCLC receives Cisplatin + Paclitaxel → radiation

#### 2.3.2 Custom NSCLC Biomarker Extension Module

Because the stock module lacks EGFR/ALK/PD-L1 and other biomarkers, we create an extension submodule.

**File: `data/synthea_config/modules/lung_cancer_biomarkers.json`**

Based on the Synthea state types researched via DeepWiki, the available state types include:

- `Initial` — module entry point
- `Terminal` — module exit point
- `Observation` — records a clinical observation value (used for biomarkers)
- `SetAttribute` — sets a patient attribute
- `Guard` — conditional gate
- `Simple` — plain pass-through state
- `Encounter` — clinical encounter state

Example structure of the biomarker observation states:
```json
{
  "name": "NSCLC Biomarker Panel",
  "states": {
    "Initial": {
      "type": "Initial",
      "conditional_transition": [
        {
          "condition": {
            "condition_type": "Attribute",
            "attribute": "Lung Cancer Type",
            "operator": "==",
            "value": "NSCLC"
          },
          "transition": "EGFR_Test_Encounter"
        },
        { "transition": "Terminal" }
      ]
    },
    "EGFR_Test_Encounter": {
      "type": "Encounter",
      "encounter_class": "ambulatory",
      "codes": [
        { "system": "SNOMED-CT", "code": "185349003", "display": "Encounter for check up" }
      ],
      "direct_transition": "EGFR_Mutation_Status"
    },
    "EGFR_Mutation_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        { "system": "LOINC", "code": "41103-3", "display": "EGFR gene mutations found" }
      ],
      "distributed_transition": [
        { "distribution": 0.15, "transition": "EGFR_Positive" },
        { "distribution": 0.85, "transition": "EGFR_Negative" }
      ]
    },
    "EGFR_Positive": {
      "type": "SetAttribute",
      "attribute": "egfr_status",
      "value": "positive",
      "direct_transition": "ALK_Rearrangement_Status"
    },
    "EGFR_Negative": {
      "type": "SetAttribute",
      "attribute": "egfr_status",
      "value": "negative",
      "direct_transition": "ALK_Rearrangement_Status"
    },
    "ALK_Rearrangement_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        { "system": "LOINC", "code": "46264-8", "display": "ALK gene rearrangement" }
      ],
      "distributed_transition": [
        { "distribution": 0.05, "transition": "ALK_Positive" },
        { "distribution": 0.95, "transition": "ALK_Negative" }
      ]
    },
    "ALK_Positive": {
      "type": "SetAttribute",
      "attribute": "alk_status",
      "value": "positive",
      "direct_transition": "PDL1_Expression"
    },
    "ALK_Negative": {
      "type": "SetAttribute",
      "attribute": "alk_status",
      "value": "negative",
      "direct_transition": "PDL1_Expression"
    },
    "PDL1_Expression": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        { "system": "LOINC", "code": "85147-0", "display": "PD-L1 by immune stain" }
      ],
      "distributed_transition": [
        { "distribution": 0.30, "transition": "PDL1_High" },
        { "distribution": 0.35, "transition": "PDL1_Low" },
        { "distribution": 0.35, "transition": "PDL1_Negative" }
      ]
    },
    "PDL1_High": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": ">=50%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "PDL1_Low": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": "1-49%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "PDL1_Negative": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": "<1%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "KRAS_Mutation_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        { "system": "LOINC", "code": "21717-3", "display": "KRAS gene mutations found" }
      ],
      "distributed_transition": [
        { "distribution": 0.25, "transition": "KRAS_Positive" },
        { "distribution": 0.75, "transition": "KRAS_Negative" }
      ]
    },
    "KRAS_Positive": {
      "type": "SetAttribute",
      "attribute": "kras_status",
      "value": "positive",
      "direct_transition": "Terminal"
    },
    "KRAS_Negative": {
      "type": "SetAttribute",
      "attribute": "kras_status",
      "value": "negative",
      "direct_transition": "Terminal"
    },
    "Terminal": {
      "type": "Terminal"
    }
  }
}
```
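A lightweight TDD-style check for modules like the one above: the probabilities in each `distributed_transition` must sum to 1.0, or the branch weights are meaningless. A minimal standalone sketch (inline state fragment rather than the full module file, and the helper name is ours, not Synthea's):

```python
import math

# One distributed_transition fragment from a biomarker state (inlined for illustration)
egfr_state = {
    "distributed_transition": [
        {"distribution": 0.15, "transition": "EGFR_Positive"},
        {"distribution": 0.85, "transition": "EGFR_Negative"},
    ]
}

def transition_probabilities_sum_to_one(state: dict) -> bool:
    """Return True when a state's transition probabilities sum to 1.0."""
    transitions = state.get("distributed_transition", [])
    total = sum(t["distribution"] for t in transitions)
    return math.isclose(total, 1.0)

assert transition_probabilities_sum_to_one(egfr_state)
```

A check like this belongs in `tests/test_synthea_data.py`, run against every state of every custom module before generation.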
**Biomarker prevalence distributions (based on NSCLC literature):**

| Biomarker | Positive rate | LOINC Code | Notes |
|-----------|---------------|------------|-------|
| EGFR mutation | ~15% | 41103-3 | Higher in never-smokers and women |
| ALK rearrangement | ~5% | 46264-8 | More common in younger never-smokers |
| PD-L1 TPS >= 50% | ~30% | 85147-0 | Immunotherapy selection criterion |
| KRAS G12C | ~13% | 21717-3 | Sotorasib target |
| ROS1 fusion | ~1-2% | 46265-5 | Crizotinib target |

### 2.4 Batch Generation Command
```bash
# Generate 500 NSCLC patients; a fixed seed keeps runs reproducible
java -jar synthea-with-dependencies.jar \
  -p 500 \
  -s 42 \
  -m lung_cancer \
  --exporter.fhir.export=true \
  --exporter.fhir_stu3.export=false \
  --exporter.fhir_dstu2.export=false \
  --exporter.ccda.export=false \
  --exporter.csv.export=false \
  --exporter.hospital.fhir.export=false \
  --exporter.practitioner.fhir.export=false \
  --exporter.pretty_print=true \
  Massachusetts

# Parameter notes:
#   -p 500         : generate 500 patients
#   -s 42          : random seed (reproducibility)
#   -m lung_cancer : run only the lung_cancer module
#   --exporter.fhir.export=true : enable FHIR R4 export
#   Massachusetts  : generation region (state)
```
**Output location:** `./output/fhir/`, one JSON file per patient.

### 2.5 FHIR Bundle Output Format

Source: DeepWiki analysis of the FHIR export system in `synthetichealth/synthea`.

**Top-level structure:**
```json
{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {
      "fullUrl": "urn:uuid:patient-uuid-here",
      "resource": { "resourceType": "Patient", ... },
      "request": { "method": "POST", "url": "Patient" }
    },
    {
      "fullUrl": "urn:uuid:condition-uuid-here",
      "resource": { "resourceType": "Condition", ... },
      "request": { "method": "POST", "url": "Condition" }
    }
  ]
}
```
**FHIR resource types generated by Synthea (confirmed via DeepWiki):**

- `Patient` — patient demographics
- `Condition` — diagnoses (e.g. NSCLC)
- `Observation` — laboratory results and vital signs
- `MedicationRequest` — medication orders
- `Procedure` — surgeries and procedures
- `DiagnosticReport` — diagnostic reports
- `DocumentReference` — clinical documents (available when the US Core IG profile is enabled)
- `Encounter` — encounter records
- `AllergyIntolerance` — allergy history
- `Immunization` — immunizations
- `CarePlan` — care plans
- `ImagingStudy` — imaging studies
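As a quick sanity check of which of these resource types a generated bundle actually contains, the `entry` list can be tallied by `resourceType`. A minimal sketch with a toy inline bundle (real bundles come from `./output/fhir/`):

```python
from collections import Counter

# Toy bundle standing in for one Synthea output file
bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"resource": {"resourceType": "Patient"}},
        {"resource": {"resourceType": "Condition"}},
        {"resource": {"resourceType": "Observation"}},
        {"resource": {"resourceType": "Observation"}},
    ],
}

counts = Counter(e["resource"]["resourceType"] for e in bundle["entry"])
print(counts.most_common())  # [('Observation', 2), ('Patient', 1), ('Condition', 1)]
```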
### 2.6 Mapping FHIR Resources to PatientProfile
```python
# Mapping logic in data/generate_synthetic_patients.py
FHIR_TO_PATIENT_PROFILE_MAP = {
    # Patient resource -> demographics
    "Patient.name": "demographics.name",
    "Patient.gender": "demographics.sex",
    "Patient.birthDate": "demographics.date_of_birth",
    "Patient.address.state": "demographics.state",
    # Condition resource -> diagnosis
    "Condition[code=SNOMED:254637007]": "diagnosis.primary",  # NSCLC
    "Condition.stage.summary": "diagnosis.stage",
    "Condition.bodySite": "diagnosis.histology",
    # Observation resources -> biomarkers
    "Observation[code=LOINC:41103-3]": "biomarkers.egfr",
    "Observation[code=LOINC:46264-8]": "biomarkers.alk",
    "Observation[code=LOINC:85147-0]": "biomarkers.pdl1_tps",
    "Observation[code=LOINC:21717-3]": "biomarkers.kras",
    # Observation resources -> labs
    "Observation[category=laboratory]": "labs[]",
    # MedicationRequest -> prior treatments
    "MedicationRequest.medicationCodeableConcept": "treatments[].medication",
    # Procedure -> prior treatments
    "Procedure.code": "treatments[].procedure",
}
```
**Conversion function pattern:**
```python
import json
from pathlib import Path
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Demographics:
    name: str = ""
    sex: str = ""
    date_of_birth: str = ""
    age: int = 0
    state: str = ""


@dataclass
class Diagnosis:
    primary: str = ""
    stage: str = ""
    histology: str = ""
    diagnosis_date: str = ""


@dataclass
class Biomarkers:
    egfr: Optional[str] = None
    alk: Optional[str] = None
    pdl1_tps: Optional[str] = None
    kras: Optional[str] = None
    ros1: Optional[str] = None


@dataclass
class LabResult:
    name: str = ""
    value: float = 0.0
    unit: str = ""
    date: str = ""
    loinc_code: str = ""


@dataclass
class Treatment:
    name: str = ""
    type: str = ""  # "medication" | "procedure" | "radiation"
    start_date: str = ""
    end_date: Optional[str] = None


@dataclass
class PatientProfile:
    patient_id: str = ""
    demographics: Demographics = field(default_factory=Demographics)
    diagnosis: Diagnosis = field(default_factory=Diagnosis)
    biomarkers: Biomarkers = field(default_factory=Biomarkers)
    labs: list[LabResult] = field(default_factory=list)
    treatments: list[Treatment] = field(default_factory=list)
    unknowns: list[str] = field(default_factory=list)
    evidence_spans: list[dict] = field(default_factory=list)


# LOINC code -> Biomarkers attribute name
BIOMARKER_LOINC_MAP = {
    "41103-3": "egfr",
    "46264-8": "alk",
    "85147-0": "pdl1_tps",
    "21717-3": "kras",
    "46265-5": "ros1",
}


def parse_fhir_bundle(fhir_path: Path) -> PatientProfile:
    """Parse a Synthea FHIR Bundle JSON into PatientProfile."""
    with open(fhir_path) as f:
        bundle = json.load(f)
    profile = PatientProfile()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")
        if resource_type == "Patient":
            _parse_patient(resource, profile)
        elif resource_type == "Condition":
            _parse_condition(resource, profile)
        elif resource_type == "Observation":
            _parse_observation(resource, profile)
        elif resource_type == "MedicationRequest":
            _parse_medication(resource, profile)
        elif resource_type == "Procedure":
            _parse_procedure(resource, profile)
    return profile


def _parse_patient(resource: dict, profile: PatientProfile):
    """Extract demographics from a Patient resource."""
    profile.patient_id = resource.get("id", "")
    names = resource.get("name", [{}])
    if names:
        given = " ".join(names[0].get("given", []))
        family = names[0].get("family", "")
        profile.demographics.name = f"{given} {family}".strip()
    profile.demographics.sex = resource.get("gender", "")
    profile.demographics.date_of_birth = resource.get("birthDate", "")
    addresses = resource.get("address", [{}])
    if addresses:
        profile.demographics.state = addresses[0].get("state", "")


def _parse_condition(resource: dict, profile: PatientProfile):
    """Extract the lung-cancer diagnosis from a Condition resource."""
    for coding in resource.get("code", {}).get("coding", []):
        # SNOMED codes for non-small cell / small cell lung cancer
        if coding.get("code") in ["254637007", "254632001"]:
            profile.diagnosis.primary = coding.get("display", "")
            profile.diagnosis.diagnosis_date = resource.get("onsetDateTime", "")
            # Extract stage if available
            stage_info = resource.get("stage", [])
            if stage_info:
                summary = stage_info[0].get("summary", {})
                stage_codings = summary.get("coding", [])
                if stage_codings:
                    profile.diagnosis.stage = stage_codings[0].get("display", "")


def _parse_observation(resource: dict, profile: PatientProfile):
    """Extract labs and biomarkers from an Observation resource."""
    is_lab = any(
        cat_coding.get("code") == "laboratory"
        for cat in resource.get("category", [])
        for cat_coding in cat.get("coding", [])
    )
    for coding in resource.get("code", {}).get("coding", []):
        loinc = coding.get("code", "")
        display = coding.get("display", "")
        if loinc in BIOMARKER_LOINC_MAP:
            value_cc = resource.get("valueCodeableConcept", {})
            value_codings = value_cc.get("coding", [])
            value_str = value_codings[0].get("display", "") if value_codings else ""
            setattr(profile.biomarkers, BIOMARKER_LOINC_MAP[loinc], value_str)
        elif is_lab:
            value_qty = resource.get("valueQuantity", {})
            profile.labs.append(LabResult(
                name=display,
                value=value_qty.get("value", 0.0),
                unit=value_qty.get("unit", ""),
                date=resource.get("effectiveDateTime", ""),
                loinc_code=loinc,
            ))
```
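Ground-truth files under `data/output/profiles/` can be written with `dataclasses.asdict`, which recurses into nested dataclasses and produces a JSON-ready dict. A trimmed, self-contained sketch (field set abbreviated from the full `PatientProfile` above):

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Demographics:
    name: str = ""
    sex: str = ""

@dataclass
class PatientProfile:
    patient_id: str = ""
    demographics: Demographics = field(default_factory=Demographics)
    unknowns: list = field(default_factory=list)

profile = PatientProfile(
    patient_id="p-001",
    demographics=Demographics(name="Jane Doe", sex="female"),
)

# asdict recurses into nested dataclasses, so the result serializes cleanly
text = json.dumps(asdict(profile), indent=2)
data = json.loads(text)
assert data["demographics"]["name"] == "Jane Doe"
```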
---

## 3. Synthetic PDF Generation Pipeline

### 3.1 Overview

Goal: convert each `PatientProfile` into realistic clinical-document PDFs, injecting controlled noise to simulate real-world OCR conditions.

**Technology stack:**

- **ReportLab** (`pip install reportlab`) — PDF generation engine; provides Platypus flowables such as `SimpleDocTemplate`, `Table`, and `Paragraph`
- **Augraphy** (`pip install augraphy`) — document-image degradation pipeline simulating print, fax, and scan noise
- **Pillow** (`pip install Pillow`) — image processing
- **pdf2image** (`pip install pdf2image`) — PDF-to-image conversion (rasterize for noise injection, then rebuild the PDF)

### 3.2 Clinical Letter Template
```python
# data/templates/clinical_letter.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.platypus import (
    SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
)
from reportlab.lib import colors


def generate_clinical_letter(profile: dict, output_path: str):
    """Generate a clinical letter PDF from PatientProfile."""
    doc = SimpleDocTemplate(output_path, pagesize=letter,
                            topMargin=1*inch, bottomMargin=1*inch)
    styles = getSampleStyleSheet()
    story = []

    # Header
    header_style = ParagraphStyle(
        'Header', parent=styles['Heading1'], fontSize=14,
        spaceAfter=6
    )
    story.append(Paragraph("Clinical Summary Letter", header_style))
    story.append(Spacer(1, 12))

    # Patient Info
    info_data = [
        ["Patient Name:", profile["demographics"]["name"]],
        ["Date of Birth:", profile["demographics"]["date_of_birth"]],
        ["Sex:", profile["demographics"]["sex"]],
        ["MRN:", profile["patient_id"]],
    ]
    info_table = Table(info_data, colWidths=[2*inch, 4*inch])
    info_table.setStyle(TableStyle([
        ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
        ('FONTNAME', (1, 0), (1, -1), 'Helvetica'),
        ('FONTSIZE', (0, 0), (-1, -1), 10),
        ('VALIGN', (0, 0), (-1, -1), 'TOP'),
    ]))
    story.append(info_table)
    story.append(Spacer(1, 18))

    # Diagnosis Section
    story.append(Paragraph("Diagnosis", styles['Heading2']))
    dx = profile.get("diagnosis", {})
    dx_text = (
        f"Primary: {dx.get('primary', 'Unknown')}. "
        f"Stage: {dx.get('stage', 'Unknown')}. "
        f"Histology: {dx.get('histology', 'Unknown')}. "
        f"Diagnosed: {dx.get('diagnosis_date', 'Unknown')}."
    )
    story.append(Paragraph(dx_text, styles['Normal']))
    story.append(Spacer(1, 12))

    # Biomarkers Section
    story.append(Paragraph("Molecular Testing", styles['Heading2']))
    bm = profile.get("biomarkers", {})
    bm_data = [["Biomarker", "Result"]]
    for marker, value in bm.items():
        if value is not None:
            bm_data.append([marker.upper(), str(value)])
    if len(bm_data) > 1:
        bm_table = Table(bm_data, colWidths=[2.5*inch, 3.5*inch])
        bm_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), colors.lightgrey),
            ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
            ('FONTSIZE', (0, 0), (-1, -1), 10),
        ]))
        story.append(bm_table)
    story.append(Spacer(1, 12))

    # Treatment History
    story.append(Paragraph("Treatment History", styles['Heading2']))
    for tx in profile.get("treatments", []):
        tx_text = f"- {tx['name']} ({tx['type']}): {tx.get('start_date', '')}"
        story.append(Paragraph(tx_text, styles['Normal']))

    doc.build(story)
```
### 3.3 Pathology Report Template
```python
# data/templates/pathology_report.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table


def generate_pathology_report(profile: dict, output_path: str):
    """Generate a pathology report PDF."""
    doc = SimpleDocTemplate(output_path, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []

    story.append(Paragraph("SURGICAL PATHOLOGY REPORT", styles['Title']))
    story.append(Spacer(1, 12))

    # Specimen Info
    spec_data = [
        ["Specimen:", "Right lung, upper lobe, wedge resection"],
        ["Procedure:", "CT-guided needle biopsy"],
        ["Date:", profile["diagnosis"]["diagnosis_date"]],
    ]
    spec_table = Table(spec_data, colWidths=[2*inch, 4*inch])
    story.append(spec_table)
    story.append(Spacer(1, 12))

    # Final Diagnosis
    story.append(Paragraph("FINAL DIAGNOSIS", styles['Heading2']))
    story.append(Paragraph(
        f"Non-small cell lung carcinoma, {profile['diagnosis'].get('histology', 'adenocarcinoma')}, "
        f"{profile['diagnosis'].get('stage', 'Stage IIIA')}",
        styles['Normal']
    ))

    # Biomarker Results
    story.append(Spacer(1, 12))
    story.append(Paragraph("MOLECULAR/IMMUNOHISTOCHEMISTRY", styles['Heading2']))
    bm = profile.get("biomarkers", {})
    results = []
    if bm.get("egfr"):
        results.append(f"EGFR mutation analysis: {bm['egfr']}")
    if bm.get("alk"):
        results.append(f"ALK rearrangement (FISH): {bm['alk']}")
    if bm.get("pdl1_tps"):
        results.append(f"PD-L1 (22C3, TPS): {bm['pdl1_tps']}")
    if bm.get("kras"):
        results.append(f"KRAS mutation analysis: {bm['kras']}")
    for r in results:
        story.append(Paragraph(r, styles['Normal']))

    doc.build(story)
```
### 3.4 Laboratory Report Template
```python
# data/templates/lab_report.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib import colors
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle


def generate_lab_report(profile: dict, output_path: str):
    """Generate a laboratory report PDF with CBC, CMP, etc."""
    doc = SimpleDocTemplate(output_path, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []

    story.append(Paragraph("LABORATORY REPORT", styles['Title']))
    story.append(Spacer(1, 12))

    # Lab Results Table
    lab_data = [["Test", "Result", "Unit", "Reference Range", "Date"]]
    for lab in profile.get("labs", []):
        lab_data.append([
            lab["name"], str(lab["value"]), lab["unit"],
            "",  # Reference range (can be added later)
            lab["date"][:10] if lab["date"] else ""
        ])
    if len(lab_data) > 1:
        lab_table = Table(lab_data, colWidths=[2*inch, 1*inch, 0.8*inch, 1.2*inch, 1*inch])
        lab_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#003366')),
            ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
            ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
            ('FONTSIZE', (0, 0), (-1, -1), 9),
            ('ROWBACKGROUNDS', (0, 1), (-1, -1), [colors.white, colors.HexColor('#f0f0f0')]),
        ]))
        story.append(lab_table)

    doc.build(story)
```
### 3.5 Noise Injection Strategy
```python
# data/noise/noise_injector.py
import random
import re
from pathlib import Path
from PIL import Image

# Augraphy pipeline imports (optional dependency)
try:
    from augraphy import (
        AugraphyPipeline, InkBleed, Letterpress, LowInkPeriodicLines,
        DirtyDrum, SubtleNoise, Jpeg, Brightness, BleedThrough
    )
    AUGRAPHY_AVAILABLE = True
except ImportError:
    AUGRAPHY_AVAILABLE = False


class NoiseInjector:
    """Parametric noise injection engine simulating real-world document quality."""

    # Common single-character OCR confusions
    OCR_ERROR_MAP = {
        "0": ["O", "o", "Q"],
        "1": ["l", "I", "|"],
        "5": ["S", "s"],
        "8": ["B"],
        "O": ["0", "Q"],
        "l": ["1", "I", "|"],
    }

    # Multi-character OCR confusions, applied as substring replacements
    OCR_DIGRAPH_MAP = {
        "rn": ["m"],
        "cl": ["d"],
        "vv": ["w"],
    }

    # Medical abbreviation substitutions
    ABBREVIATION_MAP = {
        "non-small cell lung cancer": ["NSCLC", "non-small cell ca", "NSCC"],
        "adenocarcinoma": ["adeno", "adenoca", "adeno ca"],
        "squamous cell carcinoma": ["SCC", "squamous ca", "sq cell ca"],
        "Eastern Cooperative Oncology Group": ["ECOG"],
        "performance status": ["PS", "perf status"],
        "milligrams per deciliter": ["mg/dL", "mg/dl"],
        "computed tomography": ["CT", "cat scan"],
    }

    # Noise level configuration
    NOISE_LEVELS = {
        "clean": {"ocr_rate": 0.0, "abbrev_rate": 0.0, "missing_rate": 0.0},
        "mild": {"ocr_rate": 0.02, "abbrev_rate": 0.1, "missing_rate": 0.05},
        "moderate": {"ocr_rate": 0.05, "abbrev_rate": 0.2, "missing_rate": 0.1},
        "severe": {"ocr_rate": 0.10, "abbrev_rate": 0.3, "missing_rate": 0.2},
    }

    def __init__(self, noise_level: str = "mild", seed: int = 42):
        self.config = self.NOISE_LEVELS[noise_level]
        self.rng = random.Random(seed)

    def inject_text_noise(self, text: str) -> tuple[str, list[dict]]:
        """Inject OCR errors and abbreviations into text.

        Returns (noisy_text, list_of_injected_noise_records).
        """
        noise_records = []
        chars = list(text)

        # Single-character OCR substitutions
        for i, original in enumerate(chars):
            if original in self.OCR_ERROR_MAP and self.rng.random() < self.config["ocr_rate"]:
                replacement = self.rng.choice(self.OCR_ERROR_MAP[original])
                chars[i] = replacement
                noise_records.append({
                    "type": "ocr_error",
                    "position": i,
                    "original": original,
                    "replacement": replacement,
                })
        noisy_text = "".join(chars)

        # Multi-character OCR substitutions (e.g. "rn" -> "m")
        for digraph, replacements in self.OCR_DIGRAPH_MAP.items():
            if digraph in noisy_text and self.rng.random() < self.config["ocr_rate"]:
                replacement = self.rng.choice(replacements)
                noisy_text = noisy_text.replace(digraph, replacement, 1)
                noise_records.append({
                    "type": "ocr_error",
                    "original": digraph,
                    "replacement": replacement,
                })

        # Abbreviation substitutions
        for full_form, abbreviations in self.ABBREVIATION_MAP.items():
            if full_form in noisy_text.lower() and self.rng.random() < self.config["abbrev_rate"]:
                abbrev = self.rng.choice(abbreviations)
                noisy_text = re.sub(
                    re.escape(full_form), abbrev, noisy_text, count=1, flags=re.IGNORECASE
                )
                noise_records.append({
                    "type": "abbreviation",
                    "original": full_form,
                    "replacement": abbrev,
                })
        return noisy_text, noise_records

    def inject_missing_values(self, profile: dict) -> tuple[dict, list[str]]:
        """Randomly remove fields from profile to simulate missing data.

        Returns (modified_profile, list_of_removed_fields).
        """
        removed = []
        removable_fields = [
            ("biomarkers", "egfr"),
            ("biomarkers", "alk"),
            ("biomarkers", "pdl1_tps"),
            ("biomarkers", "kras"),
            ("biomarkers", "ros1"),
            ("diagnosis", "stage"),
            ("diagnosis", "histology"),
        ]
        for section, field_name in removable_fields:
            if self.rng.random() < self.config["missing_rate"]:
                if section in profile and field_name in profile[section]:
                    profile[section][field_name] = None
                    removed.append(f"{section}.{field_name}")
        return profile, removed

    def degrade_image(self, image: Image.Image) -> Image.Image:
        """Apply an Augraphy degradation pipeline to a document image."""
        if not AUGRAPHY_AVAILABLE:
            return image
        import numpy as np
        img_array = np.array(image)
        pipeline = AugraphyPipeline(
            ink_phase=[
                InkBleed(p=0.5),
                Letterpress(p=0.3),
                LowInkPeriodicLines(p=0.3),
            ],
            paper_phase=[
                SubtleNoise(p=0.5),
            ],
            post_phase=[
                DirtyDrum(p=0.3),
                Brightness(p=0.5),
                Jpeg(p=0.5),
            ],
        )
        degraded = pipeline(img_array)
        return Image.fromarray(degraded)
```
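The character-substitution step at the core of `inject_text_noise` can be exercised in isolation; with the same seed the output is reproducible, which is what makes the recorded noise ground truth trustworthy. A standalone sketch (toy map, not the full `OCR_ERROR_MAP`):

```python
import random

TOY_OCR_MAP = {"0": ["O"], "1": ["l"], "5": ["S"]}

def inject_ocr_noise(text: str, rate: float, seed: int = 42) -> str:
    """Replace mapped characters with OCR look-alikes at the given rate."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in TOY_OCR_MAP and rng.random() < rate:
            out.append(rng.choice(TOY_OCR_MAP[ch]))
        else:
            out.append(ch)
    return "".join(out)

# rate=1.0 substitutes every mapped character; rate=0.0 is a no-op
assert inject_ocr_noise("WBC 10.5", rate=0.0) == "WBC 10.5"
assert inject_ocr_noise("WBC 10.5", rate=1.0) == "WBC lO.S"
```

The same determinism property is what `tests/test_noise_injection.py` should assert against the real `NoiseInjector`.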
---

## 4. TREC Benchmark Evaluation Guide

### 4.1 Dataset Overview

**TREC Clinical Trials Track 2021:**

- Source: NIST Text REtrieval Conference (TREC)
- Topics (queries): 75 synthetic patient case descriptions (5-10 sentence admission notes)
- Document collection: 376,000+ clinical trials (April 2021 snapshot of ClinicalTrials.gov)
- Qrels: 35,832 relevance judgments
- Relevance labels: 0 = not relevant, 1 = excluded, 2 = eligible

**TREC Clinical Trials Track 2022:**

- Topics: 50 synthetic patient case descriptions
- Uses the same document snapshot

### 4.2 Data Formats

#### Topics XML Format
```xml
<topics task="2021 TREC Clinical Trials">
  <topic number="1">
    A 62-year-old male presents with a 3-month history of
    progressive dyspnea and a 20-pound weight loss. He has
    a 40 pack-year smoking history. CT chest reveals a 4.5cm
    right upper lobe mass with mediastinal lymphadenopathy.
    Biopsy confirms non-small cell lung cancer, adenocarcinoma.
    EGFR mutation testing is positive for exon 19 deletion.
    PD-L1 TPS is 60%. ECOG performance status is 1.
  </topic>
  <topic number="2">
    ...
  </topic>
</topics>
```
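Outside of ir_datasets (section 4.3), this XML layout can be parsed directly with the standard library. A minimal sketch with a two-topic inline sample (the sample text is ours, not real topics):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<topics task="2021 TREC Clinical Trials">
  <topic number="1">A 62-year-old male with NSCLC, EGFR exon 19 deletion.</topic>
  <topic number="2">A 48-year-old female with ALK-rearranged adenocarcinoma.</topic>
</topics>"""

root = ET.fromstring(SAMPLE)
# Topic number attribute -> free-text case description
topics = {t.attrib["number"]: t.text.strip() for t in root.iter("topic")}
assert topics["1"].startswith("A 62-year-old")
```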
#### Qrels Format (whitespace-separated)
```
topic_id 0 doc_id relevance
1 0 NCT00760162 2
1 0 NCT01234567 1
1 0 NCT09876543 0
```
- Column 1: topic number
- Column 2: constant 0 (iteration)
- Column 3: NCT document ID
- Column 4: relevance (0 = not relevant, 1 = excluded, 2 = eligible)
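A small parser for that column layout (stdlib only; `evaluation/trec_data/qrels2021.txt` is assumed to follow it, with no header row):

```python
from collections import defaultdict

def parse_qrels(lines) -> dict:
    """Parse TREC qrels lines into {topic_id: {doc_id: relevance}}."""
    qrels: dict = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic_id, _iteration, doc_id, relevance = parts
        qrels[topic_id][doc_id] = int(relevance)
    return dict(qrels)

sample = ["1 0 NCT00760162 2", "1 0 NCT01234567 1", "2 0 NCT09876543 0"]
qrels = parse_qrels(sample)
assert qrels["1"]["NCT00760162"] == 2
```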
#### Run Submission Format
```
TOPIC_NO Q0 NCT_ID RANK SCORE RUN_NAME
1 Q0 NCT00760162 1 0.9999 trialpath-v1
1 Q0 NCT01234567 2 0.9998 trialpath-v1
```
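Actual scoring should go through ir-measures (section 1.2), but the headline metrics are simple enough to sketch in plain Python, which is useful for unit-testing the evaluator in `tests/test_trec_evaluation.py`. Binarizing at relevance >= 1 for precision is an illustrative choice here, not the track's official setting:

```python
import math

def precision_at_k(ranked: list[str], qrel: dict[str, int], k: int = 10) -> float:
    """Fraction of the top-k ranked docs with relevance >= 1."""
    return sum(1 for d in ranked[:k] if qrel.get(d, 0) >= 1) / k

def ndcg_at_k(ranked: list[str], qrel: dict[str, int], k: int = 10) -> float:
    """Graded NDCG with DCG = sum of rel_i / log2(i + 1) over ranks i."""
    dcg = sum(qrel.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1))
    ideal = sorted(qrel.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

qrel = {"NCT00760162": 2, "NCT01234567": 1, "NCT09876543": 0}
ranked = ["NCT00760162", "NCT09876543", "NCT01234567"]
assert precision_at_k(ranked, qrel, k=3) == 2 / 3
assert ndcg_at_k(["NCT00760162", "NCT01234567"], qrel, k=2) == 1.0
```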
### 4.3 Loading Data with ir_datasets
```python
# evaluation/run_trec_benchmark.py
import ir_datasets


def load_trec_2021():
    """Load TREC CT 2021 topics and qrels via ir_datasets."""
    dataset = ir_datasets.load("clinicaltrials/2021/trec-ct-2021")
    # Load topics (GenericQuery: query_id, text)
    topics = {}
    for query in dataset.queries_iter():
        topics[query.query_id] = query.text
    # Load qrels (TrecQrel: query_id, doc_id, relevance, iteration)
    qrels = {}
    for qrel in dataset.qrels_iter():
        if qrel.query_id not in qrels:
            qrels[qrel.query_id] = {}
        qrels[qrel.query_id][qrel.doc_id] = qrel.relevance
    return topics, qrels


def load_trec_2022():
    """Load TREC CT 2022 topics and qrels (same 2021 document snapshot)."""
    dataset = ir_datasets.load("clinicaltrials/2021/trec-ct-2022")
    topics = {q.query_id: q.text for q in dataset.queries_iter()}
    qrels = {}
    for qrel in dataset.qrels_iter():
        if qrel.query_id not in qrels:
            qrels[qrel.query_id] = {}
        qrels[qrel.query_id][qrel.doc_id] = qrel.relevance
    return topics, qrels


def load_trial_documents():
    """Load the clinical trial documents from ir_datasets."""
    dataset = ir_datasets.load("clinicaltrials/2021")
    # ClinicalTrialsDoc: doc_id, title, condition, summary,
    # detailed_description, eligibility
    docs = {}
    for doc in dataset.docs_iter():
        docs[doc.doc_id] = {
            "title": doc.title,
            "condition": doc.condition,
            "summary": doc.summary,
            "detailed_description": doc.detailed_description,
            "eligibility": doc.eligibility,
        }
    return docs
```
| ### 4.4 TrialPath ่พๅบๅฐ TREC ๆ ผๅผ็ๆ ๅฐ | |
| ```python | |
| def convert_trialpath_to_trec_run( | |
| results: dict[str, list[dict]], | |
| run_name: str = "trialpath-v1" | |
| ) -> str: | |
| """Convert TrialPath matching results to TREC run format. | |
| Args: | |
| results: {topic_id: [{"nct_id": str, "score": float}, ...]} | |
| run_name: Run identifier | |
| Returns: | |
| TREC-format run string | |
| """ | |
| lines = [] | |
| for topic_id, candidates in results.items(): | |
| sorted_candidates = sorted(candidates, key=lambda x: x["score"], reverse=True) | |
| for rank, candidate in enumerate(sorted_candidates[:1000], 1): | |
| lines.append( | |
| f"{topic_id} Q0 {candidate['nct_id']} {rank} " | |
| f"{candidate['score']:.6f} {run_name}" | |
| ) | |
| return "\n".join(lines) | |
| def save_trec_run(run_str: str, output_path: str): | |
| """Save TREC run to file.""" | |
| with open(output_path, 'w') as f: | |
| f.write(run_str) | |
| ``` | |
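็ๆ run ๅญ็ฌฆไธฒๅ๏ผๅฏ้กบๆๅฏนๅ ญๅๆ ผๅผใ`Q0` ๅธธ้ๅไธ topic ๅ rank ็่ฟ็ปญๆงๅไธๆฌก่ฝป้ๆ ก้ช๏ผ็คบๆๆง่ๅพ๏ผ๏ผ

```python
def check_run_lines(run_str: str) -> None:
    """ๆ ก้ช TREC run ๅญ็ฌฆไธฒ๏ผๆ ผๅผ้ๆณๆถๆๅบ AssertionErrorใ"""
    last_rank: dict[str, int] = {}
    for line in run_str.strip().splitlines():
        topic, q0, doc_id, rank, score, run_name = line.split()
        assert q0 == "Q0", f"bad Q0 field: {line}"
        float(score)  # score ๅฟ ้กปๅฏ่งฃๆไธบๆตฎ็นๆฐ
        assert int(rank) == last_rank.get(topic, 0) + 1, f"rank gap: {line}"
        last_rank[topic] = int(rank)

check_run_lines(
    "1 Q0 NCT00760162 1 0.999900 trialpath-v1\n"
    "1 Q0 NCT01234567 2 0.999800 trialpath-v1"
)
```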
| ### 4.5 ไฝฟ็จ ir-measures ่ฎก็ฎ่ฏไผฐๆๆ | |
| ```python | |
| # evaluation/run_trec_benchmark.py (็ปญ) | |
| import ir_measures | |
from ir_measures import nDCG, P, Recall, AP, RR
| def evaluate_trec_run( | |
| qrels_path: str, | |
| run_path: str, | |
| ) -> dict: | |
| """Evaluate a TREC run using ir-measures. | |
| Target metrics: | |
| - Recall@50 >= 0.75 | |
| - NDCG@10 >= 0.60 | |
| - P@10 (informational) | |
| """ | |
| qrels = list(ir_measures.read_trec_qrels(qrels_path)) | |
| run = list(ir_measures.read_trec_run(run_path)) | |
| # ๅฎไน็ฎๆ ๆๆ | |
| measures = [ | |
| nDCG@10, # Target >= 0.60 | |
| Recall@50, # Target >= 0.75 | |
| P@10, # Precision at 10 | |
| AP, # Mean Average Precision | |
| RR, # Reciprocal Rank | |
| nDCG@20, # Additional depth | |
| Recall@100, # Extended recall | |
| ] | |
| # ่ฎก็ฎ่ๅๆๆ | |
| aggregate = ir_measures.calc_aggregate(measures, qrels, run) | |
| # ่ฎก็ฎ้ๆฅ่ฏขๆๆ | |
| per_query = {} | |
| for metric in ir_measures.iter_calc(measures, qrels, run): | |
| qid = metric.query_id | |
| if qid not in per_query: | |
| per_query[qid] = {} | |
| per_query[qid][str(metric.measure)] = metric.value | |
| return { | |
| "aggregate": {str(k): v for k, v in aggregate.items()}, | |
| "per_query": per_query, | |
| "pass_fail": { | |
| "ndcg@10": aggregate.get(nDCG@10, 0) >= 0.60, | |
| "recall@50": aggregate.get(Recall@50, 0) >= 0.75, | |
| } | |
| } | |
| def evaluate_with_eligibility_levels( | |
| qrels_path: str, | |
| run_path: str, | |
| ) -> dict: | |
| """Evaluate with TREC CT graded relevance (0=NR, 1=Excluded, 2=Eligible). | |
| Uses rel=2 for strict eligible-only evaluation. | |
| """ | |
| qrels = list(ir_measures.read_trec_qrels(qrels_path)) | |
| run = list(ir_measures.read_trec_run(run_path)) | |
| # Standard evaluation (relevance >= 1) | |
| standard_measures = [nDCG@10, Recall@50, P@10] | |
| standard = ir_measures.calc_aggregate(standard_measures, qrels, run) | |
| # Strict evaluation (only eligible = relevance 2) | |
| strict_measures = [ | |
| AP(rel=2), | |
| P(rel=2)@10, | |
| Recall(rel=2)@50, | |
| ] | |
| strict = ir_measures.calc_aggregate(strict_measures, qrels, run) | |
| return { | |
| "standard": {str(k): v for k, v in standard.items()}, | |
| "strict_eligible_only": {str(k): v for k, v in strict.items()}, | |
| } | |
| ``` | |
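ไฝไธบๅฏนๆๆ ๅฎไน็ไบคๅ้ช่ฏ๏ผnDCG@k ไนๅฏ็จ็บฏ Python ็ดๆฅ่ฎก็ฎใไธ้ข็่ๅพๅ่ฎพ็บฟๆงๅข็ๅ log2 ๆๆฃ๏ผไธ pytrec_eval/ir-measures ็้ป่ฎค nDCG ๅฃๅพไธ่ด๏ผ๏ผ

```python
import math

def ndcg_at_k(gains: list[int], k: int) -> float:
    """gains ๆ็ณป็ปๆๅบ็ปๅบ๏ผ็บฟๆงๅข็ + log2 ๆๆฃใ"""
    def dcg(gs: list[int]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(gs[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# ็ๅฎๆๅบ [2, 2, 1, 0, 0] ๅทฒๆฏ็ๆณๆๅบ๏ผnDCG@10 = 1.0
assert ndcg_at_k([2, 2, 1, 0, 0], 10) == 1.0
```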
### 4.6 ไฝฟ็จ ir-measures ็ Python ๅฏน่ฑกๆ ผๅผ๏ผๆ ้ qrels/run ๆไปถ๏ผ
| ```python | |
| def evaluate_from_dicts( | |
| qrels_dict: dict[str, dict[str, int]], | |
| run_dict: dict[str, list[tuple[str, float]]], | |
| ) -> dict: | |
| """Evaluate using Python dict format (no files needed). | |
| Args: | |
| qrels_dict: {query_id: {doc_id: relevance}} | |
| run_dict: {query_id: [(doc_id, score), ...]} | |
| """ | |
| # Convert to ir-measures format | |
| qrels = [ | |
| ir_measures.Qrel(qid, did, rel) | |
| for qid, docs in qrels_dict.items() | |
| for did, rel in docs.items() | |
| ] | |
| run = [ | |
| ir_measures.ScoredDoc(qid, did, score) | |
| for qid, docs in run_dict.items() | |
| for did, score in docs | |
| ] | |
| measures = [nDCG@10, Recall@50, P@10, AP] | |
| aggregate = ir_measures.calc_aggregate(measures, qrels, run) | |
| return {str(k): v for k, v in aggregate.items()} | |
| ``` | |
| --- | |
| ## 5. MedGemma ๆๅ่ฏไผฐ | |
| ### 5.1 ๆ ๆณจๆฐๆฎ้่ฎพ่ฎก | |
| ```python | |
| # evaluation/extraction_eval.py | |
| from dataclasses import dataclass | |
| from typing import Optional | |
| @dataclass | |
| class AnnotatedField: | |
| """A single annotated field with ground truth and extraction result.""" | |
| field_name: str # e.g., "biomarkers.egfr" | |
| ground_truth: Optional[str] # From Synthea profile (gold standard) | |
| extracted: Optional[str] # From MedGemma extraction | |
| evidence_span: Optional[str] # Text span in source document | |
| source_page: Optional[int] # Page number in PDF | |
| @dataclass | |
| class ExtractionAnnotation: | |
| """Complete annotation for one patient's extraction.""" | |
| patient_id: str | |
| fields: list[AnnotatedField] | |
| noise_level: str # "clean", "mild", "moderate", "severe" | |
| document_type: str # "clinical_letter", "pathology_report", etc. | |
| ``` | |
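ไธ่ฟฐ dataclass ๅฏ้่ฟ `dataclasses.asdict` ็ดๆฅๅบๅๅไธบ JSON ็ปๆใๆๅฐ่ๅพ๏ผไธบไฟๆ่ชๅ ซ๏ผ่ฟ้ๅคๅปไบ็ฎๅ็ๅฎไน๏ผๅนถไธบๅฏ้็่ฏๆฎๅญๆฎต่กฅไบ้ป่ฎคๅผ๏ผๅฑ็คบไพๅ่ฎพ๏ผ๏ผ

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class AnnotatedField:
    field_name: str
    ground_truth: Optional[str]
    extracted: Optional[str]
    evidence_span: Optional[str] = None  # ็คบไพไธญ่กฅไบ้ป่ฎคๅผ
    source_page: Optional[int] = None

field_ann = AnnotatedField(
    field_name="biomarkers.egfr",
    ground_truth="Exon 19 deletion",
    extracted="EGFR positive",
)
payload = json.dumps(asdict(field_ann), ensure_ascii=False, indent=2)
```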
| **ๆ ๆณจๆฐๆฎ้็ปๆ๏ผ** | |
| ```json | |
| { | |
| "patient_id": "synth-001", | |
| "noise_level": "mild", | |
| "document_type": "clinical_letter", | |
| "fields": [ | |
| { | |
| "field_name": "demographics.name", | |
| "ground_truth": "John Smith", | |
| "extracted": "John Smith", | |
| "correct": true | |
| }, | |
| { | |
| "field_name": "diagnosis.stage", | |
| "ground_truth": "Stage IIIA", | |
| "extracted": "Stage 3A", | |
| "correct": true, | |
| "note": "Equivalent representation" | |
| }, | |
| { | |
| "field_name": "biomarkers.egfr", | |
| "ground_truth": "Exon 19 deletion", | |
| "extracted": "EGFR positive", | |
| "correct": false, | |
| "note": "Partial extraction - missing specific mutation" | |
| } | |
| ] | |
| } | |
| ``` | |
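ๅ ่ฝฝๆ ๆณจ JSON ๅๅปบ่ฎฎๅ ๅๆๅฟ ่ฆๅญๆฎตๆ ก้ชใไปฅไธๆฏไธไธชๆๅฐๅๆ ก้ช่ๅพ๏ผ้ฎๅใๅชๅฃฐ็บงๅซ้ๅไธไธๆ็คบไพไธ่ด๏ผ๏ผ

```python
REQUIRED_KEYS = {"patient_id", "noise_level", "document_type", "fields"}
VALID_NOISE_LEVELS = {"clean", "mild", "moderate", "severe"}

def validate_annotation(ann: dict) -> list[str]:
    """่ฟๅๆ้ฎ้ขๅ่กจ๏ผ็ฉบๅ่กจ่กจ็คบ้่ฟๆ ก้ชใ"""
    errors = []
    missing = REQUIRED_KEYS - ann.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    if ann.get("noise_level") not in VALID_NOISE_LEVELS:
        errors.append(f"invalid noise_level: {ann.get('noise_level')!r}")
    for i, f in enumerate(ann.get("fields", [])):
        if "field_name" not in f or "ground_truth" not in f:
            errors.append(f"fields[{i}] incomplete")
    return errors

ok = validate_annotation({
    "patient_id": "synth-001",
    "noise_level": "mild",
    "document_type": "clinical_letter",
    "fields": [{"field_name": "demographics.name", "ground_truth": "John Smith"}],
})
assert ok == []
```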
| ### 5.2 ๅญๆฎต็บง F1 ่ฎก็ฎ | |
| ```python | |
| # evaluation/extraction_eval.py | |
| from sklearn.metrics import ( | |
| f1_score, precision_score, recall_score, | |
| classification_report, confusion_matrix | |
| ) | |
| import numpy as np | |
| # ๅฎไนๆๆๅฏๆๅๅญๆฎต | |
| EXTRACTION_FIELDS = [ | |
| "demographics.name", | |
| "demographics.sex", | |
| "demographics.date_of_birth", | |
| "demographics.age", | |
| "diagnosis.primary", | |
| "diagnosis.stage", | |
| "diagnosis.histology", | |
| "biomarkers.egfr", | |
| "biomarkers.alk", | |
| "biomarkers.pdl1_tps", | |
| "biomarkers.kras", | |
| "biomarkers.ros1", | |
| "labs.wbc", | |
| "labs.hemoglobin", | |
| "labs.platelets", | |
| "labs.creatinine", | |
| "labs.alt", | |
| "labs.ast", | |
| "treatments.current_regimen", | |
| "performance_status.ecog", | |
| ] | |
| def compute_field_level_f1( | |
| annotations: list[dict], | |
| ) -> dict: | |
| """Compute field-level F1, precision, recall. | |
    For each field (binary encoding):
    - TP: ground_truth exists AND extraction is correct
    - FP: extraction exists BUT ground_truth is None (spurious value)
    - FN: ground_truth exists BUT extraction is missing or wrong
| Args: | |
| annotations: List of patient annotation dicts | |
| Returns: | |
| Per-field and aggregate metrics | |
| """ | |
| field_metrics = {} | |
| for field_name in EXTRACTION_FIELDS: | |
| y_true = [] # 1 if field has ground truth value | |
| y_pred = [] # 1 if field was correctly extracted | |
| for ann in annotations: | |
| fields = {f["field_name"]: f for f in ann["fields"]} | |
| if field_name in fields: | |
| f = fields[field_name] | |
                has_gt = f["ground_truth"] is not None
                has_ext = f.get("extracted") is not None
                is_correct = f.get("correct", False)
                y_true.append(1 if has_gt else 0)
                # TP: ๆญฃ็กฎๆๅ๏ผFP: ๆ ground truth ็่ๅ ๆๅ๏ผ
                # ๆ ground truth ไฝๆๅ้่ฏฏ่ฎกไธบ FN
                y_pred.append(1 if has_ext and (is_correct or not has_gt) else 0)
| if len(y_true) > 0: | |
| precision = precision_score(y_true, y_pred, zero_division=0) | |
| recall = recall_score(y_true, y_pred, zero_division=0) | |
| f1 = f1_score(y_true, y_pred, zero_division=0) | |
| field_metrics[field_name] = { | |
| "precision": round(precision, 4), | |
| "recall": round(recall, 4), | |
| "f1": round(f1, 4), | |
| "support": sum(y_true), | |
| } | |
| # Aggregate metrics | |
| all_y_true = [] | |
| all_y_pred = [] | |
| for ann in annotations: | |
| for f in ann["fields"]: | |
            has_gt = f["ground_truth"] is not None
            has_ext = f.get("extracted") is not None
            is_correct = f.get("correct", False)
            all_y_true.append(1 if has_gt else 0)
            all_y_pred.append(1 if has_ext and (is_correct or not has_gt) else 0)
| micro_f1 = f1_score(all_y_true, all_y_pred, zero_division=0) | |
    macro_f1 = float(np.mean([m["f1"] for m in field_metrics.values()])) if field_metrics else 0.0
| return { | |
| "per_field": field_metrics, | |
| "micro_f1": round(micro_f1, 4), | |
| "macro_f1": round(macro_f1, 4), | |
| "total_fields": len(all_y_true), | |
| "pass": micro_f1 >= 0.85, # Target: F1 >= 0.85 | |
| } | |
| def compute_extraction_report(annotations: list[dict]) -> str: | |
| """Generate a scikit-learn classification_report style output.""" | |
| all_y_true = [] | |
| all_y_pred = [] | |
| for field_name in EXTRACTION_FIELDS: | |
| for ann in annotations: | |
| fields = {f["field_name"]: f for f in ann["fields"]} | |
| if field_name in fields: | |
| f = fields[field_name] | |
                has_gt = f["ground_truth"] is not None
                has_ext = f.get("extracted") is not None
                is_correct = f.get("correct", False)
                all_y_true.append(1 if has_gt else 0)
                all_y_pred.append(1 if has_ext and (is_correct or not has_gt) else 0)
| return classification_report( | |
| all_y_true, all_y_pred, | |
| target_names=["absent", "present/correct"], | |
| digits=4, | |
| ) | |
| def compare_with_baseline( | |
| medgemma_annotations: list[dict], | |
| gemini_only_annotations: list[dict], | |
| ) -> dict: | |
| """Compare MedGemma extraction vs Gemini-only baseline.""" | |
| medgemma_metrics = compute_field_level_f1(medgemma_annotations) | |
| gemini_metrics = compute_field_level_f1(gemini_only_annotations) | |
| comparison = {} | |
| for field_name in EXTRACTION_FIELDS: | |
| mg = medgemma_metrics["per_field"].get(field_name, {}) | |
| gm = gemini_metrics["per_field"].get(field_name, {}) | |
| comparison[field_name] = { | |
| "medgemma_f1": mg.get("f1", 0), | |
| "gemini_f1": gm.get("f1", 0), | |
| "delta": round(mg.get("f1", 0) - gm.get("f1", 0), 4), | |
| } | |
| return { | |
| "per_field_comparison": comparison, | |
| "medgemma_overall_f1": medgemma_metrics["micro_f1"], | |
| "gemini_overall_f1": gemini_metrics["micro_f1"], | |
| "improvement": round( | |
| medgemma_metrics["micro_f1"] - gemini_metrics["micro_f1"], 4 | |
| ), | |
| } | |
| ``` | |
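micro-F1 ๅฏนๆ่ฎกๆฐๅ ๆฑ ๅ่ฎก็ฎ๏ผmacro-F1 ๅ ๅฏนๅๅญๆฎต F1 ๅ็ฎๆฏๅนณๅ๏ผๅญๆฎต้ดๆ ทๆฌ้ไธๅ่กกๆถไธค่ ไผๆๅผใ็บฏ Python ็ๆๅฐ็ฎไพ๏ผ่ฎกๆฐไธบ่ๆๆฐๅญ๏ผ๏ผ

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """็ฑ่ฎกๆฐ่ฎก็ฎ precision / recall / F1ใ"""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# ๅ่ฎพๅญๆฎต A๏ผtp=8, fp=1, fn=1๏ผๅญๆฎต B๏ผtp=3, fp=0, fn=3
micro_f1 = prf1(8 + 3, 1 + 0, 1 + 3)[2]               # ๆฑ ๅ่ฎกๆฐ๏ผ22/27
macro_f1 = (prf1(8, 1, 1)[2] + prf1(3, 0, 3)[2]) / 2  # ๅญๆฎตๅนณๅ๏ผ7/9
```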
| ### 5.3 ๅชๅฃฐ็บงๅซๅฏนๆๅๆง่ฝ็ๅฝฑๅๅๆ | |
| ```python | |
| def analyze_noise_impact(annotations: list[dict]) -> dict: | |
| """Analyze how noise level affects extraction F1.""" | |
| by_noise = {} | |
| for ann in annotations: | |
| level = ann["noise_level"] | |
| if level not in by_noise: | |
| by_noise[level] = [] | |
| by_noise[level].append(ann) | |
| results = {} | |
| for level, level_anns in by_noise.items(): | |
| metrics = compute_field_level_f1(level_anns) | |
| results[level] = { | |
| "micro_f1": metrics["micro_f1"], | |
| "macro_f1": metrics["macro_f1"], | |
| "n_patients": len(level_anns), | |
| } | |
| return results | |
| ``` | |
| --- | |
| ## 6. ็ซฏๅฐ็ซฏ่ฏไผฐ็ฎก็บฟ | |
| ### 6.1 Criterion Decision Accuracy | |
| ```python | |
| # evaluation/criterion_eval.py | |
| def compute_criterion_accuracy( | |
| predictions: list[dict], | |
| ground_truth: list[dict], | |
| ) -> dict: | |
| """Compute criterion-level decision accuracy. | |
| Each prediction/ground_truth entry: | |
| { | |
| "patient_id": str, | |
| "trial_id": str, | |
| "criteria": [ | |
| {"criterion_id": str, "decision": "met"|"not_met"|"unknown", | |
| "evidence": str} | |
| ] | |
| } | |
| Target: >= 0.85 | |
| """ | |
| total = 0 | |
| correct = 0 | |
| by_decision_type = {"met": {"tp": 0, "total": 0}, | |
| "not_met": {"tp": 0, "total": 0}, | |
| "unknown": {"tp": 0, "total": 0}} | |
| for pred, gt in zip(predictions, ground_truth): | |
| assert pred["patient_id"] == gt["patient_id"] | |
| assert pred["trial_id"] == gt["trial_id"] | |
| gt_map = {c["criterion_id"]: c["decision"] for c in gt["criteria"]} | |
| for criterion in pred["criteria"]: | |
| cid = criterion["criterion_id"] | |
| if cid in gt_map: | |
| total += 1 | |
| gt_decision = gt_map[cid] | |
| pred_decision = criterion["decision"] | |
| by_decision_type[gt_decision]["total"] += 1 | |
| if pred_decision == gt_decision: | |
| correct += 1 | |
| by_decision_type[gt_decision]["tp"] += 1 | |
| accuracy = correct / total if total > 0 else 0.0 | |
| return { | |
| "overall_accuracy": round(accuracy, 4), | |
| "total_criteria": total, | |
| "correct": correct, | |
| "pass": accuracy >= 0.85, | |
| "by_decision_type": { | |
| k: { | |
| "accuracy": round(v["tp"] / v["total"], 4) if v["total"] > 0 else 0, | |
| "support": v["total"], | |
| } | |
| for k, v in by_decision_type.items() | |
| }, | |
| } | |
| ``` | |
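ๆกไปถๅณ็ญๅ็กฎ็ๅณๆฏ้ๆกไปถ็ๅณ็ญไธ่ด็ใ็จ่ๆ criterion id ็ๆ็ฎ็ฎไพ๏ผ

```python
ground_truth = {"c1": "met", "c2": "not_met", "c3": "unknown"}
predictions  = {"c1": "met", "c2": "met",     "c3": "unknown"}

# ้ๆกไปถๆฏ่พๅณ็ญๆฏๅฆไธ่ด
correct = sum(predictions[cid] == decision for cid, decision in ground_truth.items())
accuracy = correct / len(ground_truth)  # 2/3๏ผไฝไบ 0.85 ็็ฎๆ ้ๅผ
```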
| ### 6.2 ๅปถ่ฟๅบๅๆต่ฏ | |
| ```python | |
| # evaluation/latency_cost_tracker.py | |
| import time | |
| import json | |
| from dataclasses import dataclass, field, asdict | |
| from typing import Optional | |
| from contextlib import contextmanager | |
| @dataclass | |
| class APICallRecord: | |
| """Record of a single API call.""" | |
| service: str # "medgemma", "gemini", "clinicaltrials_mcp" | |
| operation: str # "extract", "search", "evaluate_criterion" | |
| latency_ms: float | |
| input_tokens: int = 0 | |
| output_tokens: int = 0 | |
| cost_usd: float = 0.0 | |
| timestamp: str = "" | |
| @dataclass | |
| class SessionMetrics: | |
| """Aggregate metrics for a patient matching session.""" | |
| patient_id: str | |
| total_latency_ms: float = 0.0 | |
| total_cost_usd: float = 0.0 | |
| api_calls: list[APICallRecord] = field(default_factory=list) | |
| @property | |
| def total_latency_s(self) -> float: | |
| return self.total_latency_ms / 1000.0 | |
| @property | |
| def pass_latency(self) -> bool: | |
| """Target: < 15s per session.""" | |
| return self.total_latency_s < 15.0 | |
| @property | |
| def pass_cost(self) -> bool: | |
| """Target: < $0.50 per session.""" | |
| return self.total_cost_usd < 0.50 | |
| class LatencyCostTracker: | |
| """Track latency and cost across API calls.""" | |
| # Pricing per 1M tokens (approximate) | |
| PRICING = { | |
| "medgemma": {"input": 0.0, "output": 0.0}, # Self-hosted | |
| "gemini": {"input": 1.25, "output": 5.00}, # Gemini Pro | |
| "clinicaltrials_mcp": {"input": 0.0, "output": 0.0}, # Free API | |
| } | |
| def __init__(self): | |
| self.sessions: list[SessionMetrics] = [] | |
| self._current_session: Optional[SessionMetrics] = None | |
| def start_session(self, patient_id: str): | |
| self._current_session = SessionMetrics(patient_id=patient_id) | |
| def end_session(self) -> SessionMetrics: | |
| session = self._current_session | |
| if session: | |
| session.total_latency_ms = sum(c.latency_ms for c in session.api_calls) | |
| session.total_cost_usd = sum(c.cost_usd for c in session.api_calls) | |
| self.sessions.append(session) | |
| self._current_session = None | |
| return session | |
| @contextmanager | |
| def track_call(self, service: str, operation: str): | |
| """Context manager to track an API call.""" | |
| start = time.monotonic() | |
| record = APICallRecord(service=service, operation=operation, latency_ms=0) | |
| try: | |
| yield record | |
| finally: | |
| record.latency_ms = (time.monotonic() - start) * 1000 | |
| # Compute cost | |
| pricing = self.PRICING.get(service, {"input": 0, "output": 0}) | |
| record.cost_usd = ( | |
| record.input_tokens * pricing["input"] / 1_000_000 | |
| + record.output_tokens * pricing["output"] / 1_000_000 | |
| ) | |
| if self._current_session: | |
| self._current_session.api_calls.append(record) | |
| def summary(self) -> dict: | |
| """Generate aggregate summary across all sessions.""" | |
| if not self.sessions: | |
| return {} | |
| latencies = [s.total_latency_s for s in self.sessions] | |
| costs = [s.total_cost_usd for s in self.sessions] | |
| return { | |
| "n_sessions": len(self.sessions), | |
| "latency": { | |
| "mean_s": round(sum(latencies) / len(latencies), 2), | |
| "p50_s": round(sorted(latencies)[len(latencies) // 2], 2), | |
| "p95_s": round(sorted(latencies)[int(len(latencies) * 0.95)], 2), | |
| "max_s": round(max(latencies), 2), | |
| "pass_rate": round( | |
| sum(1 for s in self.sessions if s.pass_latency) / len(self.sessions), 4 | |
| ), | |
| }, | |
| "cost": { | |
| "mean_usd": round(sum(costs) / len(costs), 4), | |
| "total_usd": round(sum(costs), 4), | |
| "max_usd": round(max(costs), 4), | |
| "pass_rate": round( | |
| sum(1 for s in self.sessions if s.pass_cost) / len(self.sessions), 4 | |
| ), | |
| }, | |
| "targets": { | |
| "latency_pass": all(s.pass_latency for s in self.sessions), | |
| "cost_pass": all(s.pass_cost for s in self.sessions), | |
| }, | |
| } | |
| ``` | |
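`track_call` ็่ฎกๆถๆจกๅผๅฏๅ็ฆปๆๅฆไธ่ๅพ๏ผ`time.monotonic` ไธๅ็ณป็ปๆถ้ๅ่ฐๅฝฑๅ๏ผ`finally` ไฟ่ฏๅณไพฟ่ฐ็จๆๅบๅผๅธธไนไผ่ฎฐๅฝๅปถ่ฟ๏ผ

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(record: dict):
    """ๅจ record["latency_ms"] ไธญ่ฎฐๅฝ with ๅไฝๅ ็่ๆถ๏ผๆฏซ็ง๏ผใ"""
    start = time.monotonic()
    try:
        yield record
    finally:
        record["latency_ms"] = (time.monotonic() - start) * 1000

rec: dict = {}
with timed(rec):
    time.sleep(0.01)  # ๆจกๆไธๆฌก API ่ฐ็จ
assert rec["latency_ms"] > 0
```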
| --- | |
| ## 7. TDD ๆต่ฏ็จไพ | |
| ### 7.1 Synthea ๆฐๆฎ้ช่ฏๆต่ฏ | |
| ```python | |
| # tests/test_synthea_data.py | |
| import pytest | |
| import json | |
from pathlib import Path
from data.generate_synthetic_patients import parse_fhir_bundle
| # ้ขๆ็ FHIR Resource ็ฑปๅ | |
| REQUIRED_RESOURCE_TYPES = {"Patient", "Condition", "Observation", "Encounter"} | |
| class TestSyntheaDataValidation: | |
| """Validate Synthea FHIR output for TrialPath requirements.""" | |
| def test_fhir_bundle_is_valid_json(self, fhir_file): | |
| """Bundle must be valid JSON.""" | |
| with open(fhir_file) as f: | |
| data = json.load(f) | |
| assert data["resourceType"] == "Bundle" | |
| assert "entry" in data | |
| def test_bundle_contains_required_resources(self, fhir_file): | |
| """Bundle must contain Patient, Condition, Observation, Encounter.""" | |
| with open(fhir_file) as f: | |
| bundle = json.load(f) | |
| resource_types = { | |
| e["resource"]["resourceType"] for e in bundle["entry"] | |
| } | |
| for rt in REQUIRED_RESOURCE_TYPES: | |
| assert rt in resource_types, f"Missing {rt} resource" | |
| def test_patient_has_demographics(self, fhir_file): | |
| """Patient resource must have name, gender, birthDate.""" | |
| with open(fhir_file) as f: | |
| bundle = json.load(f) | |
| patients = [ | |
| e["resource"] for e in bundle["entry"] | |
| if e["resource"]["resourceType"] == "Patient" | |
| ] | |
| assert len(patients) == 1 | |
| patient = patients[0] | |
| assert "name" in patient | |
| assert "gender" in patient | |
| assert "birthDate" in patient | |
| def test_lung_cancer_condition_present(self, fhir_file): | |
| """At least one Condition must be NSCLC or lung cancer.""" | |
| with open(fhir_file) as f: | |
| bundle = json.load(f) | |
| conditions = [ | |
| e["resource"] for e in bundle["entry"] | |
| if e["resource"]["resourceType"] == "Condition" | |
| ] | |
| lung_cancer_codes = {"254637007", "254632001", "162573006"} | |
| has_lung_cancer = False | |
| for cond in conditions: | |
| codings = cond.get("code", {}).get("coding", []) | |
| for c in codings: | |
| if c.get("code") in lung_cancer_codes: | |
| has_lung_cancer = True | |
| assert has_lung_cancer, "No lung cancer Condition found" | |
| def test_patient_profile_conversion(self, fhir_file): | |
| """FHIR Bundle must convert to valid PatientProfile.""" | |
| profile = parse_fhir_bundle(Path(fhir_file)) | |
| assert profile.patient_id != "" | |
| assert profile.demographics.name != "" | |
| assert profile.demographics.sex in ("male", "female") | |
| assert profile.diagnosis.primary != "" | |
| def test_batch_generation_produces_500_patients(self, output_dir): | |
| """Batch generation must produce at least 500 FHIR files.""" | |
| fhir_files = list(Path(output_dir).glob("*.json")) | |
| assert len(fhir_files) >= 500 | |
| def test_nsclc_ratio(self, all_profiles): | |
| """~85% of lung cancer patients should be NSCLC.""" | |
| nsclc_count = sum( | |
| 1 for p in all_profiles | |
| if "non-small cell" in p.diagnosis.primary.lower() | |
| or "nsclc" in p.diagnosis.primary.lower() | |
| ) | |
| ratio = nsclc_count / len(all_profiles) | |
| assert 0.70 <= ratio <= 0.95, f"NSCLC ratio {ratio} outside expected range" | |
| ``` | |
| ### 7.2 PDF ็ๆๆญฃ็กฎๆงๆต่ฏ | |
| ```python | |
| # tests/test_pdf_generation.py | |
| import pytest | |
| from pathlib import Path | |
| from data.templates.clinical_letter import generate_clinical_letter | |
| from data.templates.pathology_report import generate_pathology_report | |
| from data.templates.lab_report import generate_lab_report | |
| class TestPDFGeneration: | |
| """Test that PDF generation produces valid documents.""" | |
| SAMPLE_PROFILE = { | |
| "patient_id": "test-001", | |
| "demographics": { | |
| "name": "Jane Doe", | |
| "sex": "female", | |
| "date_of_birth": "1960-05-15", | |
| }, | |
| "diagnosis": { | |
| "primary": "Non-small cell lung cancer, adenocarcinoma", | |
| "stage": "Stage IIIA", | |
| "histology": "adenocarcinoma", | |
| "diagnosis_date": "2024-01-15", | |
| }, | |
| "biomarkers": { | |
| "egfr": "Exon 19 deletion", | |
| "alk": "Negative", | |
| "pdl1_tps": "60%", | |
| "kras": None, | |
| }, | |
| "labs": [ | |
| {"name": "WBC", "value": 7.2, "unit": "10*3/uL", "date": "2024-01-10", "loinc_code": "6690-2"}, | |
| {"name": "Hemoglobin", "value": 12.5, "unit": "g/dL", "date": "2024-01-10", "loinc_code": "718-7"}, | |
| ], | |
| "treatments": [ | |
| {"name": "Cisplatin", "type": "medication", "start_date": "2024-02-01"}, | |
| ], | |
| } | |
| def test_clinical_letter_generates_pdf(self, tmp_path): | |
| """Clinical letter must generate a non-empty PDF file.""" | |
| output = tmp_path / "letter.pdf" | |
| generate_clinical_letter(self.SAMPLE_PROFILE, str(output)) | |
| assert output.exists() | |
| assert output.stat().st_size > 0 | |
| def test_pathology_report_generates_pdf(self, tmp_path): | |
| """Pathology report must generate a non-empty PDF file.""" | |
| output = tmp_path / "pathology.pdf" | |
| generate_pathology_report(self.SAMPLE_PROFILE, str(output)) | |
| assert output.exists() | |
| assert output.stat().st_size > 0 | |
| def test_lab_report_generates_pdf(self, tmp_path): | |
| """Lab report must generate a non-empty PDF file.""" | |
| output = tmp_path / "lab.pdf" | |
| generate_lab_report(self.SAMPLE_PROFILE, str(output)) | |
| assert output.exists() | |
| assert output.stat().st_size > 0 | |
| def test_pdf_contains_patient_name(self, tmp_path): | |
| """Generated PDF must contain patient name (OCR-verifiable).""" | |
| output = tmp_path / "letter.pdf" | |
| generate_clinical_letter(self.SAMPLE_PROFILE, str(output)) | |
| # Read PDF text (using pdfplumber or PyPDF2) | |
| import pdfplumber | |
| with pdfplumber.open(str(output)) as pdf: | |
| text = "" | |
| for page in pdf.pages: | |
| text += page.extract_text() or "" | |
| assert "Jane Doe" in text | |
| def test_pdf_contains_biomarkers(self, tmp_path): | |
| """Generated PDF must contain biomarker results.""" | |
| output = tmp_path / "pathology.pdf" | |
| generate_pathology_report(self.SAMPLE_PROFILE, str(output)) | |
| import pdfplumber | |
| with pdfplumber.open(str(output)) as pdf: | |
| text = "" | |
| for page in pdf.pages: | |
| text += page.extract_text() or "" | |
| assert "EGFR" in text | |
| assert "Exon 19" in text or "positive" in text.lower() | |
| def test_missing_biomarker_handled_gracefully(self, tmp_path): | |
| """PDF generation should not crash when biomarkers are None.""" | |
| profile = self.SAMPLE_PROFILE.copy() | |
| profile["biomarkers"] = { | |
| "egfr": None, "alk": None, "pdl1_tps": None, "kras": None | |
| } | |
| output = tmp_path / "letter.pdf" | |
| generate_clinical_letter(profile, str(output)) | |
| assert output.exists() | |
| ``` | |
| ### 7.3 ๅชๅฃฐๆณจๅ ฅๆๆ้ช่ฏๆต่ฏ | |
| ```python | |
| # tests/test_noise_injection.py | |
| import pytest | |
| from data.noise.noise_injector import NoiseInjector | |
| class TestNoiseInjection: | |
| """Test noise injection produces expected results.""" | |
| def test_clean_noise_no_changes(self): | |
| """Clean level should produce no changes.""" | |
| injector = NoiseInjector(noise_level="clean", seed=42) | |
| text = "Patient has EGFR mutation positive" | |
| noisy, records = injector.inject_text_noise(text) | |
| assert noisy == text | |
| assert len(records) == 0 | |
| def test_mild_noise_produces_some_changes(self): | |
| """Mild noise should produce some but limited changes.""" | |
| injector = NoiseInjector(noise_level="mild", seed=42) | |
| # Use longer text to increase chance of noise | |
| text = "The patient is a 65 year old male with stage IIIA " * 10 | |
| noisy, records = injector.inject_text_noise(text) | |
        # ๅ seed ไธ๏ผmild ็ๆนๅจๆฐไธๅบ่ถ ่ฟ severe ็ๆนๅจๆฐ
        _, severe_records = NoiseInjector(noise_level="severe", seed=42).inject_text_noise(text)
        assert len(records) <= len(severe_records)
| def test_severe_noise_produces_many_changes(self): | |
| """Severe noise should produce noticeable changes.""" | |
| injector = NoiseInjector(noise_level="severe", seed=42) | |
| text = "The 50 year old patient has stage 1 NSCLC " * 20 | |
| noisy, records = injector.inject_text_noise(text) | |
| assert noisy != text # Should differ from original | |
| assert len(records) > 0 | |
| def test_ocr_error_types_are_valid(self): | |
| """OCR errors should only substitute known character pairs.""" | |
| injector = NoiseInjector(noise_level="severe", seed=42) | |
| text = "0123456789 OIBS" * 10 | |
| _, records = injector.inject_text_noise(text) | |
| for r in records: | |
| if r["type"] == "ocr_error": | |
| assert r["original"] in NoiseInjector.OCR_ERROR_MAP | |
| assert r["replacement"] in NoiseInjector.OCR_ERROR_MAP[r["original"]] | |
| def test_missing_value_injection(self): | |
| """Missing value injection should remove some fields.""" | |
| injector = NoiseInjector(noise_level="moderate", seed=42) | |
| profile = { | |
| "biomarkers": {"egfr": "positive", "alk": "negative", | |
| "pdl1_tps": "60%", "kras": "negative", "ros1": "negative"}, | |
| "diagnosis": {"stage": "IIIA", "histology": "adenocarcinoma"}, | |
| } | |
| modified, removed = injector.inject_missing_values(profile) | |
| # At 10% rate with 7 fields, expect 0-3 removals | |
| assert len(removed) <= 7 | |
| for field_path in removed: | |
| section, field_name = field_path.split(".") | |
| assert modified[section][field_name] is None | |
| def test_noise_is_deterministic_with_seed(self): | |
| """Same seed should produce identical results.""" | |
| text = "Patient has stage IIIA non-small cell lung cancer" | |
| inj1 = NoiseInjector(noise_level="moderate", seed=123) | |
| inj2 = NoiseInjector(noise_level="moderate", seed=123) | |
| noisy1, _ = inj1.inject_text_noise(text) | |
| noisy2, _ = inj2.inject_text_noise(text) | |
| assert noisy1 == noisy2 | |
| def test_different_seeds_produce_different_results(self): | |
| """Different seeds should generally produce different noise.""" | |
| text = "The 50 year old patient has 10 biomarker tests 0 1 5 8" * 20 | |
| inj1 = NoiseInjector(noise_level="severe", seed=1) | |
| inj2 = NoiseInjector(noise_level="severe", seed=999) | |
| noisy1, _ = inj1.inject_text_noise(text) | |
| noisy2, _ = inj2.inject_text_noise(text) | |
| # With severe noise on long text, different seeds should differ | |
| assert noisy1 != noisy2 | |
| ``` | |
| ### 7.4 TREC ่ฏไผฐ่ฎก็ฎๆต่ฏ | |
| ```python | |
| # tests/test_trec_evaluation.py | |
| import pytest | |
| import ir_measures | |
from ir_measures import nDCG, Recall, P, AP
from evaluation.run_trec_benchmark import convert_trialpath_to_trec_run
| class TestTRECEvaluation: | |
| """Test TREC evaluation metric computation.""" | |
| @pytest.fixture | |
| def sample_qrels(self): | |
| """Sample qrels with known ground truth.""" | |
| return [ | |
| ir_measures.Qrel("q1", "d1", 2), # eligible | |
| ir_measures.Qrel("q1", "d2", 1), # excluded | |
| ir_measures.Qrel("q1", "d3", 0), # not relevant | |
| ir_measures.Qrel("q1", "d4", 2), # eligible | |
| ir_measures.Qrel("q1", "d5", 0), # not relevant | |
| ] | |
| @pytest.fixture | |
| def perfect_run(self): | |
| """Run that ranks all relevant docs at top.""" | |
| return [ | |
| ir_measures.ScoredDoc("q1", "d1", 1.0), | |
| ir_measures.ScoredDoc("q1", "d4", 0.9), | |
| ir_measures.ScoredDoc("q1", "d2", 0.8), | |
| ir_measures.ScoredDoc("q1", "d3", 0.1), | |
| ir_measures.ScoredDoc("q1", "d5", 0.05), | |
| ] | |
| @pytest.fixture | |
| def worst_run(self): | |
| """Run that ranks relevant docs at bottom.""" | |
| return [ | |
| ir_measures.ScoredDoc("q1", "d3", 1.0), | |
| ir_measures.ScoredDoc("q1", "d5", 0.9), | |
| ir_measures.ScoredDoc("q1", "d2", 0.5), | |
| ir_measures.ScoredDoc("q1", "d4", 0.2), | |
| ir_measures.ScoredDoc("q1", "d1", 0.1), | |
| ] | |
| def test_perfect_ndcg_at_10(self, sample_qrels, perfect_run): | |
| """Perfect ranking should yield NDCG@10 = 1.0.""" | |
| result = ir_measures.calc_aggregate([nDCG@10], sample_qrels, perfect_run) | |
| assert result[nDCG@10] == pytest.approx(1.0, abs=0.01) | |
| def test_worst_ndcg_lower(self, sample_qrels, perfect_run, worst_run): | |
| """Worst ranking should yield lower NDCG than perfect.""" | |
| perfect = ir_measures.calc_aggregate([nDCG@10], sample_qrels, perfect_run) | |
| worst = ir_measures.calc_aggregate([nDCG@10], sample_qrels, worst_run) | |
| assert worst[nDCG@10] < perfect[nDCG@10] | |
| def test_recall_at_50_perfect(self, sample_qrels, perfect_run): | |
| """Perfect run should retrieve all relevant docs.""" | |
| result = ir_measures.calc_aggregate([Recall@50], sample_qrels, perfect_run) | |
| assert result[Recall@50] == pytest.approx(1.0, abs=0.01) | |
| def test_empty_run_yields_zero(self, sample_qrels): | |
| """Empty run should yield 0 for all metrics.""" | |
| empty_run = [] | |
| result = ir_measures.calc_aggregate( | |
| [nDCG@10, Recall@50, P@10], sample_qrels, empty_run | |
| ) | |
| assert result[nDCG@10] == 0.0 | |
| assert result[Recall@50] == 0.0 | |
| assert result[P@10] == 0.0 | |
| def test_per_query_results(self, sample_qrels, perfect_run): | |
| """Per-query results should return one entry per query.""" | |
| results = list(ir_measures.iter_calc( | |
| [nDCG@10], sample_qrels, perfect_run | |
| )) | |
| assert len(results) == 1 # Only q1 | |
| assert results[0].query_id == "q1" | |
| def test_trec_run_format_conversion(self): | |
| """Test TrialPath results to TREC format conversion.""" | |
| results = { | |
| "1": [ | |
| {"nct_id": "NCT001", "score": 0.95}, | |
| {"nct_id": "NCT002", "score": 0.80}, | |
| ] | |
| } | |
| run_str = convert_trialpath_to_trec_run(results, "test-run") | |
| lines = run_str.strip().split("\n") | |
| assert len(lines) == 2 | |
| assert "NCT001" in lines[0] | |
| assert "1" == lines[0].split()[3] # rank 1 | |
| assert "2" == lines[1].split()[3] # rank 2 | |
| def test_graded_relevance_evaluation(self, sample_qrels, perfect_run): | |
| """Test strict eligible-only evaluation (rel=2).""" | |
| strict = ir_measures.calc_aggregate( | |
| [AP(rel=2)], sample_qrels, perfect_run | |
| ) | |
| assert strict[AP(rel=2)] > 0.0 | |
| def test_qrels_dict_format(self): | |
| """Test evaluation from dict format.""" | |
| qrels = {"q1": {"d1": 2, "d2": 1, "d3": 0}} | |
| run = [ | |
| ir_measures.ScoredDoc("q1", "d1", 1.0), | |
| ir_measures.ScoredDoc("q1", "d2", 0.5), | |
| ir_measures.ScoredDoc("q1", "d3", 0.1), | |
| ] | |
| result = ir_measures.calc_aggregate([nDCG@10], qrels, run) | |
| assert nDCG@10 in result | |
| ``` | |
| ### 7.5 F1 ่ฎก็ฎๆต่ฏ | |
| ```python | |
| # tests/test_extraction_f1.py | |
| import pytest | |
| from evaluation.extraction_eval import compute_field_level_f1 | |
| class TestExtractionF1: | |
| """Test F1 computation for field-level extraction.""" | |
| def test_perfect_extraction(self): | |
| """All fields correctly extracted should yield F1=1.0.""" | |
| annotations = [{ | |
| "patient_id": "p1", | |
| "noise_level": "clean", | |
| "document_type": "clinical_letter", | |
| "fields": [ | |
| {"field_name": "demographics.name", "ground_truth": "John", "extracted": "John", "correct": True}, | |
| {"field_name": "demographics.sex", "ground_truth": "male", "extracted": "male", "correct": True}, | |
| {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": "NSCLC", "correct": True}, | |
| {"field_name": "biomarkers.egfr", "ground_truth": "positive", "extracted": "positive", "correct": True}, | |
| ] | |
| }] | |
| result = compute_field_level_f1(annotations) | |
| assert result["micro_f1"] == 1.0 | |
| assert result["pass"] is True | |
| def test_zero_extraction(self): | |
| """No correct extractions should yield F1=0.""" | |
| annotations = [{ | |
| "patient_id": "p1", | |
| "noise_level": "clean", | |
| "document_type": "clinical_letter", | |
| "fields": [ | |
| {"field_name": "demographics.name", "ground_truth": "John", "extracted": "Jane", "correct": False}, | |
| {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": None, "correct": False}, | |
| ] | |
| }] | |
| result = compute_field_level_f1(annotations) | |
| assert result["micro_f1"] == 0.0 | |
| assert result["pass"] is False | |
| def test_partial_extraction(self): | |
| """Partial extraction should yield 0 < F1 < 1.""" | |
| annotations = [{ | |
| "patient_id": "p1", | |
| "noise_level": "mild", | |
| "document_type": "clinical_letter", | |
| "fields": [ | |
| {"field_name": "demographics.name", "ground_truth": "John", "extracted": "John", "correct": True}, | |
| {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": "lung ca", "correct": False}, | |
| {"field_name": "biomarkers.egfr", "ground_truth": "positive", "extracted": "positive", "correct": True}, | |
| {"field_name": "biomarkers.alk", "ground_truth": "negative", "extracted": None, "correct": False}, | |
| ] | |
| }] | |
| result = compute_field_level_f1(annotations) | |
| assert 0.0 < result["micro_f1"] < 1.0 | |
| def test_f1_threshold_boundary(self): | |
| """F1 exactly at 0.85 should pass.""" | |
| # Create annotations that produce exactly 0.85 F1 | |
| fields = [] | |
| for i in range(85): | |
| fields.append({"field_name": f"field_{i}", "ground_truth": "val", "extracted": "val", "correct": True}) | |
| for i in range(15): | |
| fields.append({"field_name": f"field_miss_{i}", "ground_truth": "val", "extracted": None, "correct": False}) | |
| annotations = [{"patient_id": "p1", "noise_level": "clean", | |
| "document_type": "test", "fields": fields}] | |
| result = compute_field_level_f1(annotations) | |
| # With 85/100 correct, F1 should be ~0.85 | |
| assert result["pass"] is True | |
| def test_empty_annotations(self): | |
| """Empty annotations should not crash.""" | |
| result = compute_field_level_f1([]) | |
| assert result["micro_f1"] == 0.0 | |
| def test_none_ground_truth_not_counted(self): | |
| """Fields with None ground truth should be handled.""" | |
| annotations = [{ | |
| "patient_id": "p1", | |
| "noise_level": "clean", | |
| "document_type": "test", | |
| "fields": [ | |
| {"field_name": "biomarkers.ros1", "ground_truth": None, | |
| "extracted": None, "correct": False}, | |
| ] | |
| }] | |
| result = compute_field_level_f1(annotations) | |
| # Should not crash, though metrics may be 0 | |
| assert "micro_f1" in result | |
| ``` | |
| ### 7.6 ็ซฏๅฐ็ซฏ็ฎก็บฟๆต่ฏ | |
| ```python | |
| # tests/test_e2e_pipeline.py | |
| import pytest | |
| from pathlib import Path | |
| class TestE2EPipeline: | |
| """End-to-end tests for the complete data & evaluation pipeline.""" | |
| def test_fhir_to_profile_to_pdf_roundtrip(self, sample_fhir_file, tmp_path): | |
| """FHIR โ PatientProfile โ PDF should complete without error.""" | |
| from data.generate_synthetic_patients import parse_fhir_bundle | |
| from data.templates.clinical_letter import generate_clinical_letter | |
| from dataclasses import asdict | |
| # Step 1: Parse FHIR | |
| profile = parse_fhir_bundle(Path(sample_fhir_file)) | |
| assert profile.patient_id != "" | |
| # Step 2: Generate PDF | |
| pdf_path = tmp_path / "test_roundtrip.pdf" | |
| generate_clinical_letter(asdict(profile), str(pdf_path)) | |
| assert pdf_path.exists() | |
| assert pdf_path.stat().st_size > 1000 # Reasonable PDF size | |
| def test_noisy_pdf_pipeline(self, sample_profile, tmp_path): | |
| """Profile โ Noisy PDF should inject noise and produce valid PDF.""" | |
| from data.templates.clinical_letter import generate_clinical_letter | |
| from data.noise.noise_injector import NoiseInjector | |
| injector = NoiseInjector(noise_level="moderate", seed=42) | |
| # Inject text noise into profile fields for PDF rendering | |
| profile = sample_profile.copy() | |
| dx_text = profile["diagnosis"]["primary"] | |
| noisy_dx, records = injector.inject_text_noise(dx_text) | |
| profile["diagnosis"]["primary"] = noisy_dx | |
| pdf_path = tmp_path / "noisy.pdf" | |
| generate_clinical_letter(profile, str(pdf_path)) | |
| assert pdf_path.exists() | |
| def test_trec_evaluation_pipeline(self, tmp_path): | |
| """Complete TREC evaluation from dicts should produce metrics.""" | |
| import ir_measures | |
| from ir_measures import nDCG, Recall, P | |
| qrels = [ | |
| ir_measures.Qrel("1", "NCT001", 2), | |
| ir_measures.Qrel("1", "NCT002", 1), | |
| ir_measures.Qrel("1", "NCT003", 0), | |
| ] | |
| run = [ | |
| ir_measures.ScoredDoc("1", "NCT001", 0.9), | |
| ir_measures.ScoredDoc("1", "NCT002", 0.5), | |
| ir_measures.ScoredDoc("1", "NCT003", 0.1), | |
| ] | |
| result = ir_measures.calc_aggregate( | |
| [nDCG@10, Recall@50, P@10], qrels, run | |
| ) | |
| assert nDCG@10 in result | |
| assert Recall@50 in result | |
| assert result[nDCG@10] > 0 | |
| def test_latency_tracker_integration(self): | |
| """Latency tracker should record and summarize calls.""" | |
| import time | |
| from evaluation.latency_cost_tracker import LatencyCostTracker | |
| tracker = LatencyCostTracker() | |
| tracker.start_session("test-patient") | |
| with tracker.track_call("gemini", "search_anchors") as record: | |
| time.sleep(0.01) # Simulate API call | |
| record.input_tokens = 500 | |
| record.output_tokens = 200 | |
| session = tracker.end_session() | |
| assert session.total_latency_ms > 0 | |
| assert len(session.api_calls) == 1 | |
| summary = tracker.summary() | |
| assert summary["n_sessions"] == 1 | |
| assert summary["latency"]["mean_s"] > 0 | |
| ``` | |
| --- | |
| ## 8. ้ๅฝ | |
| ### 8.1 ๆฐๆฎๆ ผๅผ่ง่ | |
| #### PatientProfile v1 JSON Schema | |
| ```json | |
| { | |
| "$schema": "http://json-schema.org/draft-07/schema#", | |
| "type": "object", | |
| "required": ["patient_id", "demographics", "diagnosis"], | |
| "properties": { | |
| "patient_id": {"type": "string"}, | |
| "demographics": { | |
| "type": "object", | |
| "properties": { | |
| "name": {"type": "string"}, | |
| "sex": {"type": "string", "enum": ["male", "female"]}, | |
| "date_of_birth": {"type": "string", "format": "date"}, | |
| "age": {"type": "integer"}, | |
| "state": {"type": "string"} | |
| } | |
| }, | |
| "diagnosis": { | |
| "type": "object", | |
| "properties": { | |
| "primary": {"type": "string"}, | |
| "stage": {"type": ["string", "null"]}, | |
| "histology": {"type": ["string", "null"]}, | |
| "diagnosis_date": {"type": "string", "format": "date"} | |
| } | |
| }, | |
| "biomarkers": { | |
| "type": "object", | |
| "properties": { | |
| "egfr": {"type": ["string", "null"]}, | |
| "alk": {"type": ["string", "null"]}, | |
| "pdl1_tps": {"type": ["string", "null"]}, | |
| "kras": {"type": ["string", "null"]}, | |
| "ros1": {"type": ["string", "null"]} | |
| } | |
| }, | |
| "labs": { | |
| "type": "array", | |
| "items": { | |
| "type": "object", | |
| "properties": { | |
| "name": {"type": "string"}, | |
| "value": {"type": "number"}, | |
| "unit": {"type": "string"}, | |
| "date": {"type": "string"}, | |
| "loinc_code": {"type": "string"} | |
| } | |
| } | |
| }, | |
| "treatments": { | |
| "type": "array", | |
| "items": { | |
| "type": "object", | |
| "properties": { | |
| "name": {"type": "string"}, | |
| "type": {"type": "string", "enum": ["medication", "procedure", "radiation"]}, | |
| "start_date": {"type": "string"}, | |
| "end_date": {"type": ["string", "null"]} | |
| } | |
| } | |
| }, | |
| "unknowns": {"type": "array", "items": {"type": "string"}}, | |
| "evidence_spans": {"type": "array"} | |
| } | |
| } | |
| ``` | |
| ### 8.2 ๅทฅๅ ท API ๅ่ | |
| #### ir_datasets | |
| API | Description | Return type |
|-----|-------------|-------------|
| `ir_datasets.load("clinicaltrials/2021/trec-ct-2021")` | Load the TREC CT 2021 dataset | Dataset |
| `dataset.queries_iter()` | Iterate over topics | GenericQuery(query_id, text) |
| `dataset.qrels_iter()` | Iterate over qrels | TrecQrel(query_id, doc_id, relevance, iteration) |
| `dataset.docs_iter()` | Iterate over documents | ClinicalTrialsDoc(doc_id, title, condition, summary, detailed_description, eligibility) |
**Dataset IDs:**
- `clinicaltrials/2021/trec-ct-2021` → 75 queries, 35,832 qrels
- `clinicaltrials/2021/trec-ct-2022` → 50 queries (the 2022 track reuses the 2021 document collection)
- `clinicaltrials/2021` → 376K documents (base collection)
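ir-measures accepts qrels either as an iterable of records or as a nested dict (as used in `test_qrels_dict_format` above). A sketch of the conversion from `qrels_iter()` output to dict form, using a namedtuple stand-in for the real `TrecQrel` records so it runs without downloading the dataset:
```python
# Sketch: collapse qrels records (as yielded by dataset.qrels_iter()) into
# the {query_id: {doc_id: relevance}} dict form that ir-measures accepts.
# TrecQrel here is a stand-in; real records come from ir_datasets.
from collections import namedtuple

TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])

def qrels_to_dict(qrels_iter):
    qrels = {}
    for q in qrels_iter:
        qrels.setdefault(q.query_id, {})[q.doc_id] = q.relevance
    return qrels

sample = [
    TrecQrel("1", "NCT001", 2, "0"),
    TrecQrel("1", "NCT002", 1, "0"),
    TrecQrel("2", "NCT003", 0, "0"),
]
print(qrels_to_dict(sample))
# {'1': {'NCT001': 2, 'NCT002': 1}, '2': {'NCT003': 0}}
```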
| #### ir-measures | |
| API | Description |
|-----|-------------|
| `ir_measures.calc_aggregate(measures, qrels, run)` | Compute aggregate measure values |
| `ir_measures.iter_calc(measures, qrels, run)` | Iterate per-query measure values |
| `ir_measures.read_trec_qrels(path)` | Read a TREC qrels file |
| `ir_measures.read_trec_run(path)` | Read a TREC run file |
| `ir_measures.Qrel(qid, did, rel)` | Create a qrel record |
| `ir_measures.ScoredDoc(qid, did, score)` | Create a scored-document record |
**Measure objects:**
- `nDCG@10` → Normalized DCG at cutoff 10
- `Recall@50` → Recall at cutoff 50
- `P@10` → Precision at cutoff 10
- `AP` → Average Precision
- `AP(rel=2)` → AP with minimum relevance 2
- `RR` → Reciprocal Rank
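Feeding TrialPath results to these measures requires the TREC run format (`qid Q0 docid rank score tag`). The helper name `convert_trialpath_to_trec_run` comes from the tests in §7.4; the sketch below is one hypothetical implementation consistent with what those tests assert, not the project's actual code.
```python
# Hypothetical sketch of the run-format conversion exercised in the tests:
# one "qid Q0 docid rank score tag" line per retrieved trial, ranked by score.
def convert_trialpath_to_trec_run(results: dict, run_tag: str) -> str:
    lines = []
    for qid, docs in results.items():
        ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {doc['nct_id']} {rank} {doc['score']:.4f} {run_tag}")
    return "\n".join(lines)

run_str = convert_trialpath_to_trec_run(
    {"1": [{"nct_id": "NCT001", "score": 0.95}, {"nct_id": "NCT002", "score": 0.80}]},
    "test-run",
)
print(run_str)
# 1 Q0 NCT001 1 0.9500 test-run
# 1 Q0 NCT002 2 0.8000 test-run
```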
#### scikit-learn Evaluation
| API | Description |
|-----|-------------|
| `f1_score(y_true, y_pred, average=None)` | Per-class F1 |
| `f1_score(y_true, y_pred, average='micro')` | Global micro F1 |
| `f1_score(y_true, y_pred, average='macro')` | Per-class average (macro) F1 |
| `precision_score(y_true, y_pred)` | Precision |
| `recall_score(y_true, y_pred)` | Recall |
| `classification_report(y_true, y_pred)` | Full classification report |
| `confusion_matrix(y_true, y_pred)` | Confusion matrix |
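For field-level extraction, micro F1 reduces to simple counting over all fields. The stdlib sketch below makes the arithmetic explicit; the counting convention (a missed field is a false negative only, a wrong value is both a false positive and a false negative) is an assumption about the evaluator, not confirmed by the source.
```python
# Sketch (assumed counting convention): TP = extracted and correct,
# FP = extracted but wrong, FN = ground-truth field not correctly extracted.
def micro_f1(fields):
    tp = sum(1 for f in fields if f["extracted"] is not None and f["correct"])
    fp = sum(1 for f in fields if f["extracted"] is not None and not f["correct"])
    fn = sum(1 for f in fields if f["ground_truth"] is not None and not f["correct"])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

fields = [
    {"ground_truth": "John", "extracted": "John", "correct": True},
    {"ground_truth": "NSCLC", "extracted": "lung ca", "correct": False},
    {"ground_truth": "positive", "extracted": "positive", "correct": True},
    {"ground_truth": "negative", "extracted": None, "correct": False},
]
print(round(micro_f1(fields), 3))  # 0.571  (P=2/3, R=2/4)
```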
#### Synthea CLI
| Parameter | Description | Example |
|-----------|-------------|---------|
| `-p N` | Generate N patients | `-p 500` |
| `-s SEED` | Random seed | `-s 42` |
| `-m MODULE` | Select a disease module | `-m lung_cancer` |
| `STATE` | Target state (positional) | `Massachusetts` |
| `--exporter.fhir.export` | Enable FHIR R4 export | `=true` |
| `--exporter.pretty_print` | Pretty-print JSON output | `=true` |
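The flags above combine into a single invocation. A sketch of assembling the argument list in Python (the `./run_synthea` launcher path and module name are illustrative; nothing is executed here):
```python
# Sketch: assemble a Synthea invocation from the flags above.
# "./run_synthea" is illustrative; pass the result to subprocess.run to execute.
def build_synthea_cmd(n_patients: int, seed: int, module: str, state: str) -> list:
    return [
        "./run_synthea",
        "-p", str(n_patients),
        "-s", str(seed),
        "-m", module,
        "--exporter.fhir.export=true",
        "--exporter.pretty_print=true",
        state,  # positional state argument comes last
    ]

cmd = build_synthea_cmd(500, 42, "lung_cancer", "Massachusetts")
print(" ".join(cmd))
# ./run_synthea -p 500 -s 42 -m lung_cancer --exporter.fhir.export=true --exporter.pretty_print=true Massachusetts
```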
#### ReportLab Core API
| Component | Description |
|-----------|-------------|
| `SimpleDocTemplate(path, pagesize=letter)` | Create a document template |
| `Paragraph(text, style)` | Paragraph flowable |
| `Table(data, colWidths)` | Table flowable |
| `TableStyle(commands)` | Table styling |
| `Spacer(width, height)` | Spacing flowable |
| `getSampleStyleSheet()` | Get the default style sheet |
#### Augraphy Degradation Pipeline
| Component | Description |
|-----------|-------------|
| `AugraphyPipeline(ink_phase, paper_phase, post_phase)` | Full degradation pipeline |
| `InkBleed(p=0.5)` | Ink bleed effect |
| `Letterpress(p=0.3)` | Letterpress printing effect |
| `LowInkPeriodicLines(p=0.3)` | Low-ink periodic lines |
| `DirtyDrum(p=0.3)` | Dirty drum effect |
| `SubtleNoise(p=0.5)` | Subtle noise |
| `Jpeg(p=0.5)` | JPEG compression artifacts |
| `Brightness(p=0.5)` | Brightness variation |
| ### 8.3 Python ไพ่ตๆธ ๅ | |
| ``` | |
| # requirements-data-eval.txt | |
| ir-datasets>=0.5.6 | |
| ir-measures>=0.3.1 | |
| reportlab>=4.0 | |
| augraphy>=8.0 | |
| Pillow>=10.0 | |
| pdfplumber>=0.10 | |
| scikit-learn>=1.3 | |
| numpy>=1.24 | |
| pandas>=2.0 | |
| pdf2image>=1.16 | |
| ``` | |
| ### 8.4 ๆๅๆๆ ้ๆฅ่กจ | |
| Metric | Target | Evaluation Tool | Data Source |
|--------|--------|-----------------|-------------|
| MedGemma Extraction F1 | >= 0.85 | scikit-learn `f1_score` | Synthetic patients + ground truth |
| Trial Retrieval Recall@50 | >= 0.75 | ir-measures `Recall@50` | TREC CT 2021/2022 |
| Trial Ranking NDCG@10 | >= 0.60 | ir-measures `nDCG@10` | TREC CT 2021/2022 |
| Criterion Decision Accuracy | >= 0.85 | Custom accuracy | Annotated EligibilityLedger |
| Latency | < 15s | `LatencyCostTracker` | API call timing |
| Cost | < $0.50/session | `LatencyCostTracker` | Token counting |
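A benchmark report can be gated against this checklist mechanically. A stdlib sketch (threshold values mirror the table; the metric key names are illustrative, not the pipeline's actual report schema):
```python
# Sketch: gate a benchmark report against the checklist above.
# Metric key names are illustrative; thresholds mirror the table.
THRESHOLDS = {
    "extraction_f1": (">=", 0.85),
    "recall_at_50":  (">=", 0.75),
    "ndcg_at_10":    (">=", 0.60),
    "criterion_acc": (">=", 0.85),
    "latency_s":     ("<", 15.0),
    "cost_usd":      ("<", 0.50),
}

def check_targets(metrics: dict) -> dict:
    """Return {metric: passed} for every checklist metric present in the report."""
    result = {}
    for name, (op, target) in THRESHOLDS.items():
        if name not in metrics:
            continue
        value = metrics[name]
        result[name] = value >= target if op == ">=" else value < target
    return result

report = {"extraction_f1": 0.88, "recall_at_50": 0.71, "latency_s": 9.2}
print(check_targets(report))
# {'extraction_f1': True, 'recall_at_50': False, 'latency_s': True}
```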