
TrialPath ๆ•ฐๆฎไธŽ่ฏ„ไผฐ็ฎก็บฟ TDD ๅฎž็ŽฐๆŒ‡ๅ—

ๅŸบไบŽ DeepWikiใ€TREC ๅฎ˜ๆ–นๆ–‡ๆกฃใ€ir-measures/ir_datasets ๅบ“ๆทฑๅบฆ็ ”็ฉถไบงๅ‡บ


1. ็ฎก็บฟๆžถๆž„ๆฆ‚่งˆ

1.1 ๆ•ฐๆฎๆตๅ›พ

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    Data & Evaluation Pipeline                    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                  โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚
โ”‚  โ”‚   Synthea     โ”‚โ”€โ”€โ”€โ–ถโ”‚  FHIR Bundle โ”‚โ”€โ”€โ”€โ–ถโ”‚ PatientProfile   โ”‚   โ”‚
โ”‚  โ”‚  (Java CLI)   โ”‚    โ”‚   (JSON)     โ”‚    โ”‚  (JSON Schema)   โ”‚   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚
โ”‚                                                    โ”‚              โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”              โ–ผ              โ”‚
โ”‚  โ”‚  LLM Letter  โ”‚โ”€โ”€โ”€โ–ถโ”‚  ReportLab   โ”‚โ”€โ”€โ”€โ–ถ Noisy Clinical PDFs   โ”‚
โ”‚  โ”‚  Generator   โ”‚    โ”‚  + Augraphy  โ”‚    (Letters/Labs/Path)     โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                            โ”‚
โ”‚                                                                  โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚
โ”‚  โ”‚  MedGemma    โ”‚โ”€โ”€โ”€โ–ถโ”‚  Extracted   โ”‚โ”€โ”€โ”€โ–ถโ”‚   F1 Evaluator   โ”‚   โ”‚
โ”‚  โ”‚  Extractor   โ”‚    โ”‚  Profile     โ”‚    โ”‚  (scikit-learn)  โ”‚   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚
โ”‚                                                                  โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚
โ”‚  โ”‚  TREC Topics โ”‚โ”€โ”€โ”€โ–ถโ”‚  TrialPath   โ”‚โ”€โ”€โ”€โ–ถโ”‚  TREC Evaluator  โ”‚   โ”‚
โ”‚  โ”‚  (ir_datasets)โ”‚    โ”‚  Matching    โ”‚    โ”‚  (ir-measures)   โ”‚   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚
โ”‚                                                                  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

1.2 ๆจกๅ—ๅ…ณ็ณป

ๆจกๅ— ่พ“ๅ…ฅ ่พ“ๅ‡บ ไพ่ต–
data/generate_synthetic_patients.py Synthea FHIR Bundles PatientProfile JSON + Ground Truth Synthea CLI, FHIR R4
data/generate_noisy_pdfs.py PatientProfile JSON Clinical PDFs (ๅธฆๅ™ชๅฃฐ) ReportLab, Augraphy
evaluation/run_trec_benchmark.py TREC Topics + TrialPath Run Recall@50, NDCG@10, P@10 ir_datasets, ir-measures
evaluation/extraction_eval.py Extracted vs Ground Truth Profiles Field-level F1 scikit-learn
evaluation/criterion_eval.py EligibilityLedger vs Gold Standard Criterion Accuracy scikit-learn
evaluation/latency_cost_tracker.py API call logs Latency/Cost reports time, logging

1.3 ็›ฎๅฝ•็ป“ๆž„

data/
โ”œโ”€โ”€ generate_synthetic_patients.py   # Synthea FHIR โ†’ PatientProfile
โ”œโ”€โ”€ generate_noisy_pdfs.py           # PatientProfile โ†’ Clinical PDFs
โ”œโ”€โ”€ synthea_config/
โ”‚   โ”œโ”€โ”€ synthea.properties           # Synthea ้…็ฝฎ
โ”‚   โ””โ”€โ”€ modules/
โ”‚       โ””โ”€โ”€ lung_cancer_extended.json # ๆ‰ฉๅฑ• NSCLC ๆจกๅ— (ๅซ biomarkers)
โ”œโ”€โ”€ templates/
โ”‚   โ”œโ”€โ”€ clinical_letter.py           # ไธดๅบŠไฟกไปถๆจกๆฟ
โ”‚   โ”œโ”€โ”€ pathology_report.py          # ็—…็†ๆŠฅๅ‘Šๆจกๆฟ
โ”‚   โ”œโ”€โ”€ lab_report.py                # ๅฎž้ชŒๅฎคๆŠฅๅ‘Šๆจกๆฟ
โ”‚   โ””โ”€โ”€ imaging_report.py           # ๅฝฑๅƒๆŠฅๅ‘Šๆจกๆฟ
โ”œโ”€โ”€ noise/
โ”‚   โ””โ”€โ”€ noise_injector.py            # ๅ™ชๅฃฐๆณจๅ…ฅๅผ•ๆ“Ž
โ””โ”€โ”€ output/
    โ”œโ”€โ”€ fhir/                        # Synthea ๅŽŸๅง‹ FHIR ่พ“ๅ‡บ
    โ”œโ”€โ”€ profiles/                    # ่ฝฌๆขๅŽ็š„ PatientProfile JSON
    โ”œโ”€โ”€ pdfs/                        # ็”Ÿๆˆ็š„ไธดๅบŠ PDF
    โ””โ”€โ”€ ground_truth/                # ๆ ‡ๆณจๆ•ฐๆฎ

evaluation/
โ”œโ”€โ”€ run_trec_benchmark.py            # TREC ๆฃ€็ดข่ฏ„ไผฐ
โ”œโ”€โ”€ extraction_eval.py               # MedGemma ๆๅ– F1
โ”œโ”€โ”€ criterion_eval.py                # Criterion Decision Accuracy
โ”œโ”€โ”€ latency_cost_tracker.py          # ๅปถ่ฟŸไธŽๆˆๆœฌ่ฟฝ่ธช
โ”œโ”€โ”€ trec_data/
โ”‚   โ”œโ”€โ”€ topics2021.xml               # TREC 2021 topics
โ”‚   โ”œโ”€โ”€ qrels2021.txt                # TREC 2021 relevance judgments
โ”‚   โ””โ”€โ”€ topics2022.xml               # TREC 2022 topics
โ””โ”€โ”€ reports/                         # ่ฏ„ไผฐๆŠฅๅ‘Š่พ“ๅ‡บ

tests/
โ”œโ”€โ”€ test_synthea_data.py             # Synthea ๆ•ฐๆฎ้ชŒ่ฏ
โ”œโ”€โ”€ test_pdf_generation.py           # PDF ็”Ÿๆˆๆญฃ็กฎๆ€ง
โ”œโ”€โ”€ test_noise_injection.py          # ๅ™ชๅฃฐๆณจๅ…ฅๆ•ˆๆžœ
โ”œโ”€โ”€ test_trec_evaluation.py          # TREC ่ฏ„ไผฐ่ฎก็ฎ—
โ”œโ”€โ”€ test_extraction_f1.py            # F1 ่ฎก็ฎ—ๆต‹่ฏ•
โ”œโ”€โ”€ test_latency_cost.py             # ๅปถ่ฟŸๆˆๆœฌๆต‹่ฏ•
โ””โ”€โ”€ test_e2e_pipeline.py             # ็ซฏๅˆฐ็ซฏ็ฎก็บฟๆต‹่ฏ•

2. Synthea ๅˆๆˆๆ‚ฃ่€…็”ŸๆˆๆŒ‡ๅ—

2.1 Synthea ๆฆ‚่ฟฐ

Synthea ๆ˜ฏ MITRE ๅผ€ๅ‘็š„ๅผ€ๆบๅˆๆˆๆ‚ฃ่€…ๆจกๆ‹Ÿๅ™จ๏ผŒๅŸบไบŽ Java ๅฎž็Žฐใ€‚ๅฎƒ้€š่ฟ‡ JSON ็Šถๆ€ๆœบๆจกๅ—ๆจกๆ‹Ÿ็–พ็—…่ฝจ่ฟน๏ผŒ่พ“ๅ‡บๆ ‡ๅ‡† FHIR R4 Bundleใ€‚

ๅ…ณ้”ฎ็‰นๆ€ง๏ผˆๆฅๆบ๏ผšDeepWiki synthetichealth/synthea๏ผ‰๏ผš

  • ๅŸบไบŽๆจกๅ—็š„็–พ็—…ๆจกๆ‹Ÿ๏ผšๆฏ็ง็–พ็—…ๅฎšไน‰ไธบ JSON ็Šถๆ€ๆœบ
  • ๆ”ฏๆŒ FHIR R4/STU3/DSTU2 ๅฏผๅ‡บ
  • ๅ†…็ฝฎ lung_cancer.json ๆจกๅ—๏ผŒ85% NSCLC / 15% SCLC ๅˆ†ๅธƒ
  • ๆ”ฏๆŒ Stage I-IV ๅˆ†ๆœŸๅ’ŒๅŒ–็–—/ๆ”พ็–—ๆฒป็–—่ทฏๅพ„
  • ไธๅซ NSCLC ็‰นๅผ‚ๆ€ง biomarkers๏ผˆEGFR, ALK, PD-L1, KRAS, ROS1๏ผ‰โ€”โ€” ้œ€่ฆ่‡ชๅฎšไน‰ๆ‰ฉๅฑ•

2.2 ๅฎ‰่ฃ…ๅ’Œ้…็ฝฎ

็ณป็ปŸ่ฆๆฑ‚๏ผš

  • Java JDK 11 ๆˆ–ๆ›ด้ซ˜็‰ˆๆœฌ๏ผˆๆŽจ่ LTS 11 ๆˆ– 17๏ผ‰

ๅฎ‰่ฃ…ๆ–นๅผ A๏ผš็›ดๆŽฅไฝฟ็”จ JAR๏ผˆๆŽจ่็”จไบŽๆ•ฐๆฎ็”Ÿๆˆ๏ผ‰

# ไธ‹่ฝฝๆœ€ๆ–ฐ release JAR
# ไปŽ https://github.com/synthetichealth/synthea/releases ่Žทๅ–
wget https://github.com/synthetichealth/synthea/releases/download/master-branch-latest/synthea-with-dependencies.jar

# ้ชŒ่ฏๅฎ‰่ฃ…
java -jar synthea-with-dependencies.jar --help

ๅฎ‰่ฃ…ๆ–นๅผ B๏ผšไปŽๆบ็ ๆž„ๅปบ๏ผˆ้œ€่ฆ่‡ชๅฎšไน‰ๆจกๅ—ๆ—ถไฝฟ็”จ๏ผ‰

git clone https://github.com/synthetichealth/synthea.git
cd synthea
./gradlew build check test

2.3 NSCLC ๆจกๅ—้…็ฝฎ

2.3.1 ็Žฐๆœ‰ lung_cancer ๆจกๅ—ๅˆ†ๆž

ๆฅๆบ๏ผšDeepWiki ๅฏน synthetichealth/synthea ็š„ lung_cancer.json ๆจกๅ—ๅˆ†ๆž๏ผš

  • ๅ…ฅๅฃๆกไปถ๏ผš45-65 ๅฒไบบ็พค๏ผŒๅŸบไบŽๆฆ‚็އ่ฎก็ฎ—
  • ่ฏŠๆ–ญๆต็จ‹๏ผš็—‡็Šถ๏ผˆๅ’ณๅ—ฝใ€ๅ’ฏ่ก€ใ€ๆฐ”็Ÿญ๏ผ‰ โ†’ ่ƒธ้ƒจ X ๅ…‰ โ†’ ่ƒธ้ƒจ CT โ†’ ๆดปๆฃ€/็ป†่ƒžๅญฆ
  • ๅˆ†ๅž‹๏ผš85% NSCLC๏ผŒ15% SCLC
  • ๅˆ†ๆœŸ๏ผšStage I-IV๏ผŒๅŸบไบŽ lung_cancer_nondiagnosis_counter
  • ๆฒป็–—๏ผšNSCLC ไฝฟ็”จ Cisplatin + Paclitaxel โ†’ ๆ”พ็–—

2.3.2 ่‡ชๅฎšไน‰ NSCLC Biomarker ๆ‰ฉๅฑ•ๆจกๅ—

็”ฑไบŽๅŽŸ็”Ÿๆจกๅ—ไธๅซ EGFR/ALK/PD-L1 ็ญ‰ biomarkers๏ผŒ้œ€่ฆๅˆ›ๅปบๆ‰ฉๅฑ•ๅญๆจกๅ—ใ€‚

ๆ–‡ไปถ๏ผšdata/synthea_config/modules/lung_cancer_biomarkers.json

ๅŸบไบŽ DeepWiki ็ ”็ฉถ็š„ Synthea ๆจกๅ—็Šถๆ€็ฑปๅž‹๏ผŒๅฏ็”จ็š„็Šถๆ€็ฑปๅž‹ๅŒ…ๆ‹ฌ๏ผš

  • Initial โ€” ๆจกๅ—ๅ…ฅๅฃ
  • Terminal โ€” ๆจกๅ—ๅ‡บๅฃ
  • Observation โ€” ่ฎฐๅฝ•ไธดๅบŠ่ง‚ๅฏŸๅ€ผ๏ผˆ็”จไบŽ biomarkers๏ผ‰
  • SetAttribute โ€” ่ฎพ็ฝฎๆ‚ฃ่€…ๅฑžๆ€ง
  • Guard โ€” ๆกไปถ้—จๆŽง
  • Simple โ€” ็ฎ€ๅ•่ฝฌๆข็Šถๆ€
  • Encounter โ€” ๅฐฑ่ฏŠ็Šถๆ€

Biomarker ่ง‚ๅฏŸ็Šถๆ€็คบไพ‹็ป“ๆž„๏ผš

{
  "name": "NSCLC Biomarker Panel",
  "states": {
    "Initial": {
      "type": "Initial",
      "conditional_transition": [
        {
          "condition": {
            "condition_type": "Attribute",
            "attribute": "Lung Cancer Type",
            "operator": "==",
            "value": "NSCLC"
          },
          "transition": "EGFR_Test_Encounter"
        },
        {
          "transition": "Terminal"
        }
      ]
    },
    "EGFR_Test_Encounter": {
      "type": "Encounter",
      "encounter_class": "ambulatory",
      "codes": [
        {
          "system": "SNOMED-CT",
          "code": "185349003",
          "display": "Encounter for check up"
        }
      ],
      "direct_transition": "EGFR_Mutation_Status"
    },
    "EGFR_Mutation_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        {
          "system": "LOINC",
          "code": "41103-3",
          "display": "EGFR gene mutations found"
        }
      ],
      "distributed_transition": [
        {
          "distribution": 0.15,
          "transition": "EGFR_Positive"
        },
        {
          "distribution": 0.85,
          "transition": "EGFR_Negative"
        }
      ]
    },
    "EGFR_Positive": {
      "type": "SetAttribute",
      "attribute": "egfr_status",
      "value": "positive",
      "direct_transition": "ALK_Rearrangement_Status"
    },
    "EGFR_Negative": {
      "type": "SetAttribute",
      "attribute": "egfr_status",
      "value": "negative",
      "direct_transition": "ALK_Rearrangement_Status"
    },
    "ALK_Rearrangement_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        {
          "system": "LOINC",
          "code": "46264-8",
          "display": "ALK gene rearrangement"
        }
      ],
      "distributed_transition": [
        {
          "distribution": 0.05,
          "transition": "ALK_Positive"
        },
        {
          "distribution": 0.95,
          "transition": "ALK_Negative"
        }
      ]
    },
    "ALK_Positive": {
      "type": "SetAttribute",
      "attribute": "alk_status",
      "value": "positive",
      "direct_transition": "PDL1_Expression"
    },
    "ALK_Negative": {
      "type": "SetAttribute",
      "attribute": "alk_status",
      "value": "negative",
      "direct_transition": "PDL1_Expression"
    },
    "PDL1_Expression": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        {
          "system": "LOINC",
          "code": "85147-0",
          "display": "PD-L1 by immune stain"
        }
      ],
      "distributed_transition": [
        {
          "distribution": 0.30,
          "transition": "PDL1_High"
        },
        {
          "distribution": 0.35,
          "transition": "PDL1_Low"
        },
        {
          "distribution": 0.35,
          "transition": "PDL1_Negative"
        }
      ]
    },
    "PDL1_High": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": ">=50%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "PDL1_Low": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": "1-49%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "PDL1_Negative": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": "<1%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "KRAS_Mutation_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        {
          "system": "LOINC",
          "code": "21717-3",
          "display": "KRAS gene mutations found"
        }
      ],
      "distributed_transition": [
        {
          "distribution": 0.25,
          "transition": "KRAS_Positive"
        },
        {
          "distribution": 0.75,
          "transition": "KRAS_Negative"
        }
      ]
    },
    "KRAS_Positive": {
      "type": "SetAttribute",
      "attribute": "kras_status",
      "value": "positive",
      "direct_transition": "Terminal"
    },
    "KRAS_Negative": {
      "type": "SetAttribute",
      "attribute": "kras_status",
      "value": "negative",
      "direct_transition": "Terminal"
    },
    "Terminal": {
      "type": "Terminal"
    }
  }
}
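
在把扩展模块交给 Synthea 之前，可以先做一次轻量的结构校验，确认每个 transition 的目标状态都已定义。以下是一个最小校验草图（validate_module 为本指南为演示引入的函数，并非 Synthea 自带工具，覆盖的 transition 类型也仅限上文用到的三种）：

```python
import json

def validate_module(module: dict) -> list[str]:
    """检查模块内所有 transition 目标是否都指向已定义的状态（示意实现）。"""
    states = module.get("states", {})
    errors = []
    for name, state in states.items():
        targets = []
        if "direct_transition" in state:
            targets.append(state["direct_transition"])
        for t in state.get("conditional_transition", []):
            targets.append(t["transition"])
        for t in state.get("distributed_transition", []):
            targets.append(t["transition"])
        for target in targets:
            if target not in states:
                errors.append(f"{name} -> {target}: 目标状态未定义")
    return errors

# 用法：对上文的扩展模块文件执行
# module = json.load(open("data/synthea_config/modules/lung_cancer_biomarkers.json"))
# assert validate_module(module) == []
```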

Biomarker ๆต่กŒ็އๅˆ†ๅธƒ๏ผˆๅŸบไบŽ NSCLC ๆ–‡็Œฎ๏ผ‰๏ผš

Biomarker ้˜ณๆ€ง็އ LOINC Code ่ฏดๆ˜Ž
EGFR mutation ~15% 41103-3 ้žๅธ็ƒŸไบš่ฃ”ๅฅณๆ€งๆ›ด้ซ˜
ALK rearrangement ~5% 46264-8 ๅนด่ฝป้žๅธ็ƒŸ่€…ๆ›ดๅธธ่ง
PD-L1 TPS>=50% ~30% 85147-0 ๅ…็–ซๆฒป็–—้€‚็”จๆ ‡ๅ‡†
KRAS G12C ~13% 21717-3 Sotorasib ้ถๅ‘
ROS1 fusion ~1-2% 46265-5 Crizotinib ้ถๅ‘
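
上表的阳性率对应模块中 distributed_transition 的概率语义：每次转移按给定分布独立采样。下面用纯 Python 复现这一语义，便于在测试中核对分布（sample_biomarkers 为示意函数，概率取自上文模块的 distributed_transition，并非权威流行病学数值）：

```python
import random

# 与上文模块中 distributed_transition 一致的阳性概率（示意）
BIOMARKER_RATES = {
    "egfr": 0.15,
    "alk": 0.05,
    "kras": 0.25,
}

def sample_biomarkers(seed: int) -> dict:
    """按固定种子采样一组 biomarker 状态，保证可重现。"""
    rng = random.Random(seed)
    panel = {}
    for marker, p_positive in BIOMARKER_RATES.items():
        panel[marker] = "positive" if rng.random() < p_positive else "negative"
    # PD-L1 是三分类：>=50% / 1-49% / <1%，对应模块中的 0.30 / 0.35 / 0.35
    r = rng.random()
    if r < 0.30:
        panel["pdl1_tps"] = ">=50%"
    elif r < 0.65:
        panel["pdl1_tps"] = "1-49%"
    else:
        panel["pdl1_tps"] = "<1%"
    return panel

# 同一种子两次采样结果相同
assert sample_biomarkers(42) == sample_biomarkers(42)
```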

2.4 ๆ‰น้‡็”Ÿๆˆๅ‘ฝไปค

# ็”Ÿๆˆ 500 ไธช NSCLC ๆ‚ฃ่€…๏ผŒไฝฟ็”จ็งๅญ็กฎไฟๅฏ้‡็Žฐ
java -jar synthea-with-dependencies.jar \
  -p 500 \
  -s 42 \
  -m lung_cancer \
  --exporter.fhir.export=true \
  --exporter.fhir_stu3.export=false \
  --exporter.fhir_dstu2.export=false \
  --exporter.ccda.export=false \
  --exporter.csv.export=false \
  --exporter.hospital.fhir.export=false \
  --exporter.practitioner.fhir.export=false \
  --exporter.pretty_print=true \
  Massachusetts

# ๅ‚ๆ•ฐ่ฏดๆ˜Ž:
# -p 500       : ็”Ÿๆˆ 500 ไธชๆ‚ฃ่€…
# -s 42        : ้šๆœบ็งๅญ (ๅฏ้‡็Žฐ)
# -m lung_cancer : ไป…่ฟ่กŒ lung_cancer ๆจกๅ—
# --exporter.fhir.export=true : ๅฏ็”จ FHIR R4 ๅฏผๅ‡บ
# Massachusetts : ็”ŸๆˆๅœฐๅŒบ

่พ“ๅ‡บไฝ็ฝฎ๏ผš ./output/fhir/ ็›ฎๅฝ•ไธ‹๏ผŒๆฏไธชๆ‚ฃ่€…ไธ€ไธช JSON ๆ–‡ไปถใ€‚

2.5 FHIR Bundle ่พ“ๅ‡บๆ ผๅผ

ๆฅๆบ๏ผšDeepWiki synthetichealth/synthea ๅ…ณไบŽ FHIR ๅฏผๅ‡บ็ณป็ปŸ็š„ๅˆ†ๆžใ€‚

้กถๅฑ‚็ป“ๆž„๏ผš

{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {
      "fullUrl": "urn:uuid:patient-uuid-here",
      "resource": { "resourceType": "Patient", ... },
      "request": { "method": "POST", "url": "Patient" }
    },
    {
      "fullUrl": "urn:uuid:condition-uuid-here",
      "resource": { "resourceType": "Condition", ... },
      "request": { "method": "POST", "url": "Condition" }
    }
  ]
}

Synthea ็”Ÿๆˆ็š„ FHIR Resource ็ฑปๅž‹๏ผˆDeepWiki ็กฎ่ฎค๏ผ‰๏ผš

  • Patient โ€” ๆ‚ฃ่€…ๅŸบๆœฌไฟกๆฏ
  • Condition โ€” ่ฏŠๆ–ญ๏ผˆๅฆ‚ NSCLC๏ผ‰
  • Observation โ€” ๅฎž้ชŒๅฎคๆฃ€ๆŸฅๅ’Œ็”Ÿๅ‘ฝไฝ“ๅพ
  • MedicationRequest โ€” ็”จ่ฏๅค„ๆ–น
  • Procedure โ€” ๆ‰‹ๆœฏๅ’Œๆ“ไฝœ
  • DiagnosticReport โ€” ่ฏŠๆ–ญๆŠฅๅ‘Š
  • DocumentReference โ€” ไธดๅบŠๆ–‡ๆกฃ๏ผˆ้œ€ US Core IG ๅฏ็”จ๏ผ‰
  • Encounter โ€” ๅฐฑ่ฏŠ่ฎฐๅฝ•
  • AllergyIntolerance โ€” ่ฟ‡ๆ•ๅฒ
  • Immunization โ€” ๅ…็–ซๆŽฅ็ง
  • CarePlan โ€” ๆŠค็†่ฎกๅˆ’
  • ImagingStudy โ€” ๅฝฑๅƒๆฃ€ๆŸฅ

2.6 FHIR Resource ๅˆฐ PatientProfile ็š„ๆ˜ ๅฐ„

# data/generate_synthetic_patients.py ไธญ็š„ๆ˜ ๅฐ„้€ป่พ‘

FHIR_TO_PATIENT_PROFILE_MAP = {
    # Patient Resource โ†’ demographics
    "Patient.name": "demographics.name",
    "Patient.gender": "demographics.sex",
    "Patient.birthDate": "demographics.date_of_birth",
    "Patient.address.state": "demographics.state",

    # Condition Resource โ†’ diagnosis
    "Condition[code=SNOMED:254637007]": "diagnosis.primary",  # NSCLC
    "Condition.stage.summary": "diagnosis.stage",
    "Condition.bodySite": "diagnosis.histology",

    # Observation Resources โ†’ biomarkers
    "Observation[code=LOINC:41103-3]": "biomarkers.egfr",
    "Observation[code=LOINC:46264-8]": "biomarkers.alk",
    "Observation[code=LOINC:85147-0]": "biomarkers.pdl1_tps",
    "Observation[code=LOINC:21717-3]": "biomarkers.kras",

    # Observation Resources โ†’ labs
    "Observation[category=laboratory]": "labs[]",

    # MedicationRequest โ†’ prior_treatments
    "MedicationRequest.medicationCodeableConcept": "treatments[].medication",

    # Procedure โ†’ prior_treatments
    "Procedure.code": "treatments[].procedure",
}

่ฝฌๆขๅ‡ฝๆ•ฐๆจกๅผ๏ผš

import json
from pathlib import Path
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class Demographics:
    name: str = ""
    sex: str = ""
    date_of_birth: str = ""
    age: int = 0
    state: str = ""

@dataclass
class Diagnosis:
    primary: str = ""
    stage: str = ""
    histology: str = ""
    diagnosis_date: str = ""

@dataclass
class Biomarkers:
    egfr: Optional[str] = None
    alk: Optional[str] = None
    pdl1_tps: Optional[str] = None
    kras: Optional[str] = None
    ros1: Optional[str] = None

@dataclass
class LabResult:
    name: str = ""
    value: float = 0.0
    unit: str = ""
    date: str = ""
    loinc_code: str = ""

@dataclass
class Treatment:
    name: str = ""
    type: str = ""  # "medication" | "procedure" | "radiation"
    start_date: str = ""
    end_date: Optional[str] = None

@dataclass
class PatientProfile:
    patient_id: str = ""
    demographics: Demographics = field(default_factory=Demographics)
    diagnosis: Diagnosis = field(default_factory=Diagnosis)
    biomarkers: Biomarkers = field(default_factory=Biomarkers)
    labs: list[LabResult] = field(default_factory=list)
    treatments: list[Treatment] = field(default_factory=list)
    unknowns: list[str] = field(default_factory=list)
    evidence_spans: list[dict] = field(default_factory=list)


def parse_fhir_bundle(fhir_path: Path) -> PatientProfile:
    """Parse a Synthea FHIR Bundle JSON into PatientProfile."""
    with open(fhir_path) as f:
        bundle = json.load(f)

    profile = PatientProfile()
    entries = bundle.get("entry", [])

    for entry in entries:
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")

        if resource_type == "Patient":
            _parse_patient(resource, profile)
        elif resource_type == "Condition":
            _parse_condition(resource, profile)
        elif resource_type == "Observation":
            _parse_observation(resource, profile)
        elif resource_type == "MedicationRequest":
            _parse_medication(resource, profile)
        elif resource_type == "Procedure":
            _parse_procedure(resource, profile)

    return profile


def _parse_patient(resource: dict, profile: PatientProfile):
    """Extract demographics from Patient resource."""
    names = resource.get("name", [{}])
    if names:
        given = " ".join(names[0].get("given", []))
        family = names[0].get("family", "")
        profile.demographics.name = f"{given} {family}".strip()

    profile.demographics.sex = resource.get("gender", "")
    profile.demographics.date_of_birth = resource.get("birthDate", "")
    profile.patient_id = resource.get("id", "")

    addresses = resource.get("address", [{}])
    if addresses:
        profile.demographics.state = addresses[0].get("state", "")


def _parse_condition(resource: dict, profile: PatientProfile):
    """Extract diagnosis from Condition resource."""
    code = resource.get("code", {})
    codings = code.get("coding", [])
    for coding in codings:
        # SNOMED codes for lung cancer
        if coding.get("code") in ["254637007", "254632001"]:
            profile.diagnosis.primary = coding.get("display", "")
            onset = resource.get("onsetDateTime", "")
            profile.diagnosis.diagnosis_date = onset
            # Extract stage if available
            stage_info = resource.get("stage", [])
            if stage_info:
                summary = stage_info[0].get("summary", {})
                stage_codings = summary.get("coding", [])
                if stage_codings:
                    profile.diagnosis.stage = stage_codings[0].get("display", "")


def _parse_observation(resource: dict, profile: PatientProfile):
    """Extract labs and biomarkers from Observation resource."""
    code = resource.get("code", {})
    codings = code.get("coding", [])
    category_list = resource.get("category", [])
    is_lab = any(
        cat_coding.get("code") == "laboratory"
        for cat in category_list
        for cat_coding in cat.get("coding", [])
    )

    for coding in codings:
        loinc = coding.get("code", "")
        display = coding.get("display", "")

        # Biomarker mappings
        biomarker_map = {
            "41103-3": "egfr",
            "46264-8": "alk",
            "85147-0": "pdl1_tps",
            "21717-3": "kras",
            "46265-5": "ros1",
        }

        if loinc in biomarker_map:
            value_cc = resource.get("valueCodeableConcept", {})
            value_codings = value_cc.get("coding", [])
            value_str = value_codings[0].get("display", "") if value_codings else ""
            setattr(profile.biomarkers, biomarker_map[loinc], value_str)
        elif is_lab:
            value_qty = resource.get("valueQuantity", {})
            lab = LabResult(
                name=display,
                value=value_qty.get("value", 0.0),
                unit=value_qty.get("unit", ""),
                date=resource.get("effectiveDateTime", ""),
                loinc_code=loinc,
            )
            profile.labs.append(lab)

3. ๅˆๆˆ PDF ็”Ÿๆˆ็ฎก็บฟ

3.1 ๆฆ‚่ฟฐ

็›ฎๆ ‡๏ผšๅฐ† PatientProfile ่ฝฌๆขไธบ้€ผ็œŸ็š„ไธดๅบŠๆ–‡ๆกฃ PDF๏ผŒๅนถๆณจๅ…ฅๅ—ๆŽงๅ™ชๅฃฐไปฅๆจกๆ‹Ÿ็œŸๅฎžไธ–็•Œ OCR ๅœบๆ™ฏใ€‚

ๆŠ€ๆœฏๆ ˆ๏ผš

  • ReportLab (pip install reportlab) โ€” PDF ็”Ÿๆˆๅผ•ๆ“Ž๏ผŒๆ”ฏๆŒ SimpleDocTemplateใ€Tableใ€Paragraph ็ญ‰ Platypus ๆตๅผ็ป„ไปถ
  • Augraphy (pip install augraphy) โ€” ๆ–‡ๆกฃๅ›พๅƒ้€€ๅŒ–็ฎก็บฟ๏ผŒๆจกๆ‹Ÿๆ‰“ๅฐใ€ไผ ็œŸใ€ๆ‰ซๆๅ™ชๅฃฐ
  • Pillow (pip install Pillow) โ€” ๅ›พๅƒๅค„็†
  • pdf2image (pip install pdf2image) โ€” PDF ่ฝฌๅ›พๅƒ๏ผˆ็”จไบŽๅ™ชๅฃฐๆณจๅ…ฅๅŽ่ฝฌๅ›ž PDF๏ผ‰

3.2 ไธดๅบŠไฟกไปถๆจกๆฟ

# data/templates/clinical_letter.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.platypus import (
    SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
)
from reportlab.lib import colors


def generate_clinical_letter(profile: dict, output_path: str):
    """Generate a clinical letter PDF from PatientProfile."""
    doc = SimpleDocTemplate(output_path, pagesize=letter,
                            topMargin=1*inch, bottomMargin=1*inch)
    styles = getSampleStyleSheet()
    story = []

    # Header
    header_style = ParagraphStyle(
        'Header', parent=styles['Heading1'], fontSize=14,
        spaceAfter=6
    )
    story.append(Paragraph("Clinical Summary Letter", header_style))
    story.append(Spacer(1, 12))

    # Patient Info
    info_data = [
        ["Patient Name:", profile["demographics"]["name"]],
        ["Date of Birth:", profile["demographics"]["date_of_birth"]],
        ["Sex:", profile["demographics"]["sex"]],
        ["MRN:", profile["patient_id"]],
    ]
    info_table = Table(info_data, colWidths=[2*inch, 4*inch])
    info_table.setStyle(TableStyle([
        ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
        ('FONTNAME', (1, 0), (1, -1), 'Helvetica'),
        ('FONTSIZE', (0, 0), (-1, -1), 10),
        ('VALIGN', (0, 0), (-1, -1), 'TOP'),
    ]))
    story.append(info_table)
    story.append(Spacer(1, 18))

    # Diagnosis Section
    story.append(Paragraph("Diagnosis", styles['Heading2']))
    dx = profile.get("diagnosis", {})
    dx_text = (
        f"Primary: {dx.get('primary', 'Unknown')}. "
        f"Stage: {dx.get('stage', 'Unknown')}. "
        f"Histology: {dx.get('histology', 'Unknown')}. "
        f"Diagnosed: {dx.get('diagnosis_date', 'Unknown')}."
    )
    story.append(Paragraph(dx_text, styles['Normal']))
    story.append(Spacer(1, 12))

    # Biomarkers Section
    story.append(Paragraph("Molecular Testing", styles['Heading2']))
    bm = profile.get("biomarkers", {})
    bm_data = [["Biomarker", "Result"]]
    for marker, value in bm.items():
        if value is not None:
            bm_data.append([marker.upper(), str(value)])
    if len(bm_data) > 1:
        bm_table = Table(bm_data, colWidths=[2.5*inch, 3.5*inch])
        bm_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), colors.lightgrey),
            ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
            ('FONTSIZE', (0, 0), (-1, -1), 10),
        ]))
        story.append(bm_table)
    story.append(Spacer(1, 12))

    # Treatment History
    story.append(Paragraph("Treatment History", styles['Heading2']))
    treatments = profile.get("treatments", [])
    for tx in treatments:
        tx_text = f"- {tx['name']} ({tx['type']}): {tx.get('start_date', '')}"
        story.append(Paragraph(tx_text, styles['Normal']))

    doc.build(story)

3.3 ็—…็†ๆŠฅๅ‘Šๆจกๆฟ

# data/templates/pathology_report.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table


def generate_pathology_report(profile: dict, output_path: str):
    """Generate a pathology report PDF."""
    doc = SimpleDocTemplate(output_path, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []

    story.append(Paragraph("SURGICAL PATHOLOGY REPORT", styles['Title']))
    story.append(Spacer(1, 12))

    # Specimen Info
    spec_data = [
        ["Specimen:", "Right lung, upper lobe, wedge resection"],
        ["Procedure:", "CT-guided needle biopsy"],
        ["Date:", profile["diagnosis"]["diagnosis_date"]],
    ]
    spec_table = Table(spec_data, colWidths=[2*inch, 4*inch])
    story.append(spec_table)
    story.append(Spacer(1, 12))

    # Final Diagnosis
    story.append(Paragraph("FINAL DIAGNOSIS", styles['Heading2']))
    story.append(Paragraph(
        f"Non-small cell lung carcinoma, {profile['diagnosis'].get('histology', 'adenocarcinoma')}, "
        f"{profile['diagnosis'].get('stage', 'Stage IIIA')}",
        styles['Normal']
    ))

    # Biomarker Results
    story.append(Spacer(1, 12))
    story.append(Paragraph("MOLECULAR/IMMUNOHISTOCHEMISTRY", styles['Heading2']))
    bm = profile.get("biomarkers", {})
    results = []
    if bm.get("egfr"):
        results.append(f"EGFR mutation analysis: {bm['egfr']}")
    if bm.get("alk"):
        results.append(f"ALK rearrangement (FISH): {bm['alk']}")
    if bm.get("pdl1_tps"):
        results.append(f"PD-L1 (22C3, TPS): {bm['pdl1_tps']}")
    if bm.get("kras"):
        results.append(f"KRAS mutation analysis: {bm['kras']}")
    for r in results:
        story.append(Paragraph(r, styles['Normal']))

    doc.build(story)

3.4 ๅฎž้ชŒๅฎคๆŠฅๅ‘Šๆจกๆฟ

# data/templates/lab_report.py
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle


def generate_lab_report(profile: dict, output_path: str):
    """Generate a laboratory report PDF with CBC, CMP, etc."""
    doc = SimpleDocTemplate(output_path, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []

    story.append(Paragraph("LABORATORY REPORT", styles['Title']))
    story.append(Spacer(1, 12))

    # Lab Results Table
    lab_data = [["Test", "Result", "Unit", "Reference Range", "Date"]]
    for lab in profile.get("labs", []):
        lab_data.append([
            lab["name"], str(lab["value"]), lab["unit"],
            "",  # Reference range (can be added)
            lab["date"][:10] if lab["date"] else ""
        ])

    if len(lab_data) > 1:
        lab_table = Table(lab_data, colWidths=[2*inch, 1*inch, 0.8*inch, 1.2*inch, 1*inch])
        lab_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#003366')),
            ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
            ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
            ('FONTSIZE', (0, 0), (-1, -1), 9),
            ('ROWBACKGROUNDS', (0, 1), (-1, -1), [colors.white, colors.HexColor('#f0f0f0')]),
        ]))
        story.append(lab_table)

    doc.build(story)

3.5 ๅ™ชๅฃฐๆณจๅ…ฅ็ญ–็•ฅ

# data/noise/noise_injector.py
import random
import re
from pathlib import Path
from PIL import Image

# Augraphy ็ฎก็บฟ้…็ฝฎ
try:
    from augraphy import (
        AugraphyPipeline, InkBleed, Letterpress, LowInkPeriodicLines,
        DirtyDrum, SubtleNoise, Jpeg, Brightness, BleedThrough
    )
    AUGRAPHY_AVAILABLE = True
except ImportError:
    AUGRAPHY_AVAILABLE = False


class NoiseInjector:
    """ๅ—ๆŽงๅ™ชๅฃฐๆณจๅ…ฅๅผ•ๆ“Ž๏ผŒๆจกๆ‹Ÿ็œŸๅฎžไธ–็•Œๆ–‡ๆกฃ้€€ๅŒ–ใ€‚"""

    # OCR ๅธธ่ง้”™่ฏฏๆ˜ ๅฐ„
    OCR_ERROR_MAP = {
        "0": ["O", "o", "Q"],
        "1": ["l", "I", "|"],
        "5": ["S", "s"],
        "8": ["B"],
        "O": ["0", "Q"],
        "l": ["1", "I", "|"],
        "rn": ["m"],
        "cl": ["d"],
        "vv": ["w"],
    }

    # ๅŒปๅญฆ็ผฉๅ†™ๆ›ฟๆข
    ABBREVIATION_MAP = {
        "non-small cell lung cancer": ["NSCLC", "non-small cell ca", "NSCC"],
        "adenocarcinoma": ["adeno", "adenoca", "adeno ca"],
        "squamous cell carcinoma": ["SCC", "squamous ca", "sq cell ca"],
        "Eastern Cooperative Oncology Group": ["ECOG"],
        "performance status": ["PS", "perf status"],
        "milligrams per deciliter": ["mg/dL", "mg/dl"],
        "computed tomography": ["CT", "cat scan"],
    }

    # ๅ™ชๅฃฐ็บงๅˆซ้…็ฝฎ
    NOISE_LEVELS = {
        "clean": {"ocr_rate": 0.0, "abbrev_rate": 0.0, "missing_rate": 0.0},
        "mild": {"ocr_rate": 0.02, "abbrev_rate": 0.1, "missing_rate": 0.05},
        "moderate": {"ocr_rate": 0.05, "abbrev_rate": 0.2, "missing_rate": 0.1},
        "severe": {"ocr_rate": 0.10, "abbrev_rate": 0.3, "missing_rate": 0.2},
    }

    def __init__(self, noise_level: str = "mild", seed: int = 42):
        self.config = self.NOISE_LEVELS[noise_level]
        self.rng = random.Random(seed)

    def inject_text_noise(self, text: str) -> tuple[str, list[dict]]:
        """Inject OCR errors and abbreviations into text.

        Returns (noisy_text, list_of_injected_noise_records).
        """
        noise_records = []
        chars = list(text)

        # OCR character substitutions
        i = 0
        while i < len(chars):
            if self.rng.random() < self.config["ocr_rate"]:
                # 先检查双字符混淆（如 "rn"、"cl"、"vv"），否则这些键永远匹配不到单字符
                digraph = "".join(chars[i:i + 2])
                if digraph in self.OCR_ERROR_MAP:
                    replacement = self.rng.choice(self.OCR_ERROR_MAP[digraph])
                    chars[i:i + 2] = [replacement]
                    noise_records.append({
                        "type": "ocr_error",
                        "position": i,
                        "original": digraph,
                        "replacement": replacement,
                    })
                elif chars[i] in self.OCR_ERROR_MAP:
                    original = chars[i]
                    replacement = self.rng.choice(self.OCR_ERROR_MAP[original])
                    chars[i] = replacement
                    noise_records.append({
                        "type": "ocr_error",
                        "position": i,
                        "original": original,
                        "replacement": replacement,
                    })
            i += 1

        noisy_text = "".join(chars)

        # Abbreviation substitutions
        for full_form, abbreviations in self.ABBREVIATION_MAP.items():
            if full_form.lower() in noisy_text.lower() and self.rng.random() < self.config["abbrev_rate"]:
                abbrev = self.rng.choice(abbreviations)
                noisy_text = re.sub(
                    re.escape(full_form), abbrev, noisy_text, count=1, flags=re.IGNORECASE
                )
                noise_records.append({
                    "type": "abbreviation",
                    "original": full_form,
                    "replacement": abbrev,
                })

        return noisy_text, noise_records

    def inject_missing_values(self, profile: dict) -> tuple[dict, list[str]]:
        """Randomly remove fields from profile to simulate missing data.

        Returns (modified_profile, list_of_removed_fields).
        """
        removed = []
        removable_fields = [
            ("biomarkers", "egfr"),
            ("biomarkers", "alk"),
            ("biomarkers", "pdl1_tps"),
            ("biomarkers", "kras"),
            ("biomarkers", "ros1"),
            ("diagnosis", "stage"),
            ("diagnosis", "histology"),
        ]

        for section, field_name in removable_fields:
            if self.rng.random() < self.config["missing_rate"]:
                if section in profile and field_name in profile[section]:
                    profile[section][field_name] = None
                    removed.append(f"{section}.{field_name}")

        return profile, removed

    def degrade_image(self, image: Image.Image) -> Image.Image:
        """Apply Augraphy degradation pipeline to document image."""
        if not AUGRAPHY_AVAILABLE:
            return image

        import numpy as np
        img_array = np.array(image)

        pipeline = AugraphyPipeline(
            ink_phase=[
                InkBleed(p=0.5),
                Letterpress(p=0.3),
                LowInkPeriodicLines(p=0.3),
            ],
            paper_phase=[
                SubtleNoise(p=0.5),
            ],
            post_phase=[
                DirtyDrum(p=0.3),
                Brightness(p=0.5),
                Jpeg(p=0.5),
            ],
        )

        degraded = pipeline(img_array)
        return Image.fromarray(degraded)

4. TREC ๅŸบๅ‡†่ฏ„ไผฐๆŒ‡ๅ—

4.1 ๆ•ฐๆฎ้›†ๆฆ‚่ฟฐ

TREC Clinical Trials Track 2021๏ผš

  • ๆฅๆบ๏ผšNIST ๆ–‡ๆœฌๆฃ€็ดขไผš่ฎฎ
  • Topics๏ผˆๆŸฅ่ฏข๏ผ‰๏ผš75 ไธชๅˆๆˆๆ‚ฃ่€…ๆ่ฟฐ๏ผˆ5-10 ๅฅๅ…ฅ้™ข่ฎฐๅฝ•๏ผ‰
  • ๆ–‡ๆกฃ้›†๏ผš376,000+ ไธดๅบŠ่ฏ•้ชŒ๏ผˆClinicalTrials.gov 2021 ๅนด 4 ๆœˆๅฟซ็…ง๏ผ‰
  • Qrels๏ผš35,832 ๆก็›ธๅ…ณๆ€งๅˆคๆ–ญ
  • ็›ธๅ…ณๆ€งๆ ‡็ญพ๏ผš0=ไธ็›ธๅ…ณ๏ผŒ1=ๆŽ’้™ค๏ผŒ2=ๅˆๆ ผ

TREC Clinical Trials Track 2022๏ผš

  • Topics๏ผš50 ไธชๅˆๆˆๆ‚ฃ่€…ๆ่ฟฐ
  • ไฝฟ็”จ็›ธๅŒ็š„ๆ–‡ๆกฃ้›†ๅฟซ็…ง

4.2 ๆ•ฐๆฎๆ ผๅผ

Topics XML ๆ ผๅผ

<topics task="2021 TREC Clinical Trials">
  <topic number="1">
    A 62-year-old male presents with a 3-month history of
    progressive dyspnea and a 20-pound weight loss. He has
    a 40 pack-year smoking history. CT chest reveals a 4.5cm
    right upper lobe mass with mediastinal lymphadenopathy.
    Biopsy confirms non-small cell lung cancer, adenocarcinoma.
    EGFR mutation testing is positive for exon 19 deletion.
    PD-L1 TPS is 60%. ECOG performance status is 1.
  </topic>
  <topic number="2">
    ...
  </topic>
</topics>

Qrels ๆ ผๅผ๏ผˆๅˆถ่กจ็ฌฆๅˆ†้š”๏ผ‰

topic_id  iteration  doc_id       relevance
1         0          NCT00760162  2
1         0          NCT01234567  1
1         0          NCT09876543  0

  • Column 1: topic number
  • Column 2: fixed value 0 (iteration, unused)
  • Column 3: NCT document ID
  • Column 4: relevance (0 = not relevant, 1 = excluded, 2 = eligible)

Run ๆไบคๆ ผๅผ

TOPIC_NO Q0 NCT_ID RANK SCORE RUN_NAME
1 Q0 NCT00760162 1 0.9999 trialpath-v1
1 Q0 NCT01234567 2 0.9998 trialpath-v1
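Run files are easy to get subtly wrong (missing columns, non-numeric ranks), so a sanity check before scoring pays off. A minimal illustrative validator, not part of the pipeline:

```python
def validate_run_line(line: str) -> bool:
    """Check one TREC run line: TOPIC_NO Q0 NCT_ID RANK SCORE RUN_NAME."""
    parts = line.split()
    if len(parts) != 6:
        return False
    _topic, q0, _doc_id, rank, score, _run_name = parts
    try:
        int(rank)     # rank must be an integer
        float(score)  # score must be numeric
    except ValueError:
        return False
    return q0 == "Q0"  # second column must be the literal Q0
```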

4.3 Loading Data with ir_datasets

# evaluation/run_trec_benchmark.py
import ir_datasets

def load_trec_2021():
    """Load TREC CT 2021 topics and qrels via ir_datasets."""
    dataset = ir_datasets.load("clinicaltrials/2021/trec-ct-2021")

    # Load topics (GenericQuery: query_id, text)
    topics = {}
    for query in dataset.queries_iter():
        topics[query.query_id] = query.text

    # Load qrels (TrecQrel: query_id, doc_id, relevance, iteration)
    qrels = {}
    for qrel in dataset.qrels_iter():
        if qrel.query_id not in qrels:
            qrels[qrel.query_id] = {}
        qrels[qrel.query_id][qrel.doc_id] = qrel.relevance

    return topics, qrels


def load_trec_2022():
    """Load TREC CT 2022 topics and qrels."""
    # Note: the 2022 track reuses the 2021 corpus snapshot, so the
    # dataset ID is registered under the "clinicaltrials/2021" prefix.
    dataset = ir_datasets.load("clinicaltrials/2021/trec-ct-2022")

    topics = {q.query_id: q.text for q in dataset.queries_iter()}
    qrels = {}
    for qrel in dataset.qrels_iter():
        if qrel.query_id not in qrels:
            qrels[qrel.query_id] = {}
        qrels[qrel.query_id][qrel.doc_id] = qrel.relevance

    return topics, qrels


def load_trial_documents():
    """Load the clinical trial documents from ir_datasets."""
    dataset = ir_datasets.load("clinicaltrials/2021")
    # ClinicalTrialsDoc: doc_id, title, condition, summary,
    #                     detailed_description, eligibility
    docs = {}
    for doc in dataset.docs_iter():
        docs[doc.doc_id] = {
            "title": doc.title,
            "condition": doc.condition,
            "summary": doc.summary,
            "detailed_description": doc.detailed_description,
            "eligibility": doc.eligibility,
        }
    return docs

4.4 Mapping TrialPath Output to the TREC Run Format

def convert_trialpath_to_trec_run(
    results: dict[str, list[dict]],
    run_name: str = "trialpath-v1"
) -> str:
    """Convert TrialPath matching results to TREC run format.

    Args:
        results: {topic_id: [{"nct_id": str, "score": float}, ...]}
        run_name: Run identifier

    Returns:
        TREC-format run string
    """
    lines = []
    for topic_id, candidates in results.items():
        sorted_candidates = sorted(candidates, key=lambda x: x["score"], reverse=True)
        for rank, candidate in enumerate(sorted_candidates[:1000], 1):
            lines.append(
                f"{topic_id} Q0 {candidate['nct_id']} {rank} "
                f"{candidate['score']:.6f} {run_name}"
            )
    return "\n".join(lines)


def save_trec_run(run_str: str, output_path: str):
    """Save TREC run to file."""
    with open(output_path, 'w') as f:
        f.write(run_str)

4.5 Computing Evaluation Metrics with ir-measures

# evaluation/run_trec_benchmark.py (continued)
import ir_measures
from ir_measures import nDCG, P, Recall, AP, RR


def evaluate_trec_run(
    qrels_path: str,
    run_path: str,
) -> dict:
    """Evaluate a TREC run using ir-measures.

    Target metrics:
    - Recall@50 >= 0.75
    - NDCG@10 >= 0.60
    - P@10 (informational)
    """
    qrels = list(ir_measures.read_trec_qrels(qrels_path))
    run = list(ir_measures.read_trec_run(run_path))

    # Define the target measures
    measures = [
        nDCG@10,        # Target >= 0.60
        Recall@50,      # Target >= 0.75
        P@10,           # Precision at 10
        AP,             # Mean Average Precision
        RR,             # Reciprocal Rank
        nDCG@20,        # Additional depth
        Recall@100,     # Extended recall
    ]

    # ่ฎก็ฎ—่šๅˆๆŒ‡ๆ ‡
    aggregate = ir_measures.calc_aggregate(measures, qrels, run)

    # ่ฎก็ฎ—้€ๆŸฅ่ฏขๆŒ‡ๆ ‡
    per_query = {}
    for metric in ir_measures.iter_calc(measures, qrels, run):
        qid = metric.query_id
        if qid not in per_query:
            per_query[qid] = {}
        per_query[qid][str(metric.measure)] = metric.value

    return {
        "aggregate": {str(k): v for k, v in aggregate.items()},
        "per_query": per_query,
        "pass_fail": {
            "ndcg@10": aggregate.get(nDCG@10, 0) >= 0.60,
            "recall@50": aggregate.get(Recall@50, 0) >= 0.75,
        }
    }


def evaluate_with_eligibility_levels(
    qrels_path: str,
    run_path: str,
) -> dict:
    """Evaluate with TREC CT graded relevance (0=NR, 1=Excluded, 2=Eligible).

    Uses rel=2 for strict eligible-only evaluation.
    """
    qrels = list(ir_measures.read_trec_qrels(qrels_path))
    run = list(ir_measures.read_trec_run(run_path))

    # Standard evaluation (relevance >= 1)
    standard_measures = [nDCG@10, Recall@50, P@10]
    standard = ir_measures.calc_aggregate(standard_measures, qrels, run)

    # Strict evaluation (only eligible = relevance 2)
    strict_measures = [
        AP(rel=2),
        P(rel=2)@10,
        Recall(rel=2)@50,
    ]
    strict = ir_measures.calc_aggregate(strict_measures, qrels, run)

    return {
        "standard": {str(k): v for k, v in standard.items()},
        "strict_eligible_only": {str(k): v for k, v in strict.items()},
    }
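To build intuition for what `nDCG@10` measures, it can be reproduced by hand for a toy topic. The sketch below uses one common convention (linear gain, log2 discount); trec_eval/ir-measures variants may differ in the gain function, so treat the numbers as illustrative:

```python
import math

def dcg(relevances: list[int]) -> float:
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranked_rels: list[int], k: int = 10) -> float:
    """nDCG: the achieved DCG divided by the DCG of the ideal ordering."""
    ideal = sorted(ranked_rels, reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(ranked_rels[:k]) / idcg if idcg > 0 else 0.0

# Toy topic: two eligible trials (rel 2), one excluded (1), two irrelevant (0)
perfect = ndcg_at_k([2, 2, 1, 0, 0])  # relevant docs ranked first -> 1.0
worst = ndcg_at_k([0, 0, 1, 2, 2])    # relevant docs ranked last -> well below 1
```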

4.6 Evaluating In-Memory qrels/run Data with ir-measures

def evaluate_from_dicts(
    qrels_dict: dict[str, dict[str, int]],
    run_dict: dict[str, list[tuple[str, float]]],
) -> dict:
    """Evaluate using Python dict format (no files needed).

    Args:
        qrels_dict: {query_id: {doc_id: relevance}}
        run_dict: {query_id: [(doc_id, score), ...]}
    """
    # Convert to ir-measures format
    qrels = [
        ir_measures.Qrel(qid, did, rel)
        for qid, docs in qrels_dict.items()
        for did, rel in docs.items()
    ]
    run = [
        ir_measures.ScoredDoc(qid, did, score)
        for qid, docs in run_dict.items()
        for did, score in docs
    ]

    measures = [nDCG@10, Recall@50, P@10, AP]
    aggregate = ir_measures.calc_aggregate(measures, qrels, run)
    return {str(k): v for k, v in aggregate.items()}

5. MedGemma ๆๅ–่ฏ„ไผฐ

5.1 ๆ ‡ๆณจๆ•ฐๆฎ้›†่ฎพ่ฎก

# evaluation/extraction_eval.py
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedField:
    """A single annotated field with ground truth and extraction result."""
    field_name: str           # e.g., "biomarkers.egfr"
    ground_truth: Optional[str]   # From Synthea profile (gold standard)
    extracted: Optional[str]      # From MedGemma extraction
    evidence_span: Optional[str]  # Text span in source document
    source_page: Optional[int]    # Page number in PDF


@dataclass
class ExtractionAnnotation:
    """Complete annotation for one patient's extraction."""
    patient_id: str
    fields: list[AnnotatedField]
    noise_level: str  # "clean", "mild", "moderate", "severe"
    document_type: str  # "clinical_letter", "pathology_report", etc.

ๆ ‡ๆณจๆ•ฐๆฎ้›†็ป“ๆž„๏ผš

{
  "patient_id": "synth-001",
  "noise_level": "mild",
  "document_type": "clinical_letter",
  "fields": [
    {
      "field_name": "demographics.name",
      "ground_truth": "John Smith",
      "extracted": "John Smith",
      "correct": true
    },
    {
      "field_name": "diagnosis.stage",
      "ground_truth": "Stage IIIA",
      "extracted": "Stage 3A",
      "correct": true,
      "note": "Equivalent representation"
    },
    {
      "field_name": "biomarkers.egfr",
      "ground_truth": "Exon 19 deletion",
      "extracted": "EGFR positive",
      "correct": false,
      "note": "Partial extraction - missing specific mutation"
    }
  ]
}
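Before scoring, it helps to fail fast on malformed records. `validate_annotation` below is an illustrative structural check against the schema above, not part of the evaluation module:

```python
REQUIRED_TOP_KEYS = {"patient_id", "noise_level", "document_type", "fields"}
REQUIRED_FIELD_KEYS = ("field_name", "ground_truth", "extracted", "correct")

def validate_annotation(ann: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is well-formed."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_TOP_KEYS - ann.keys())]
    for i, f in enumerate(ann.get("fields", [])):
        for key in REQUIRED_FIELD_KEYS:
            if key not in f:
                problems.append(f"fields[{i}] missing {key}")
    return problems
```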

5.2 Field-Level F1 Computation

# evaluation/extraction_eval.py
from sklearn.metrics import (
    f1_score, precision_score, recall_score,
    classification_report, confusion_matrix
)
import numpy as np


# ๅฎšไน‰ๆ‰€ๆœ‰ๅฏๆๅ–ๅญ—ๆฎต
EXTRACTION_FIELDS = [
    "demographics.name",
    "demographics.sex",
    "demographics.date_of_birth",
    "demographics.age",
    "diagnosis.primary",
    "diagnosis.stage",
    "diagnosis.histology",
    "biomarkers.egfr",
    "biomarkers.alk",
    "biomarkers.pdl1_tps",
    "biomarkers.kras",
    "biomarkers.ros1",
    "labs.wbc",
    "labs.hemoglobin",
    "labs.platelets",
    "labs.creatinine",
    "labs.alt",
    "labs.ast",
    "treatments.current_regimen",
    "performance_status.ecog",
]


def compute_field_level_f1(
    annotations: list[dict],
) -> dict:
    """Compute field-level F1, precision, recall.

    For each field:
    - TP: ground_truth exists AND extracted matches
    - FP: extracted exists BUT ground_truth is None
    - FN: ground_truth exists BUT extracted is None or mismatched

    Args:
        annotations: List of patient annotation dicts

    Returns:
        Per-field and aggregate metrics
    """
    field_metrics = {}

    for field_name in EXTRACTION_FIELDS:
        y_true = []  # 1 if field has a ground-truth value
        y_pred = []  # 1 if the extraction counts as a positive prediction

        for ann in annotations:
            fields = {f["field_name"]: f for f in ann["fields"]}
            if field_name in fields:
                f = fields[field_name]
                has_gt = f["ground_truth"] is not None
                is_correct = f.get("correct", False)
                has_extraction = f.get("extracted") is not None

                y_true.append(1 if has_gt else 0)
                # Correct extraction -> TP; hallucinated value with no
                # ground truth -> FP; miss or mismatch -> FN.
                y_pred.append(1 if (is_correct if has_gt else has_extraction) else 0)

        if len(y_true) > 0:
            precision = precision_score(y_true, y_pred, zero_division=0)
            recall = recall_score(y_true, y_pred, zero_division=0)
            f1 = f1_score(y_true, y_pred, zero_division=0)
            field_metrics[field_name] = {
                "precision": round(precision, 4),
                "recall": round(recall, 4),
                "f1": round(f1, 4),
                "support": sum(y_true),
            }

    # Aggregate metrics
    all_y_true = []
    all_y_pred = []
    for ann in annotations:
        for f in ann["fields"]:
            has_gt = f["ground_truth"] is not None
            is_correct = f.get("correct", False)
            has_extraction = f.get("extracted") is not None
            all_y_true.append(1 if has_gt else 0)
            # Hallucinated values (no ground truth, but extracted) count as FP
            all_y_pred.append(1 if (is_correct if has_gt else has_extraction) else 0)

    micro_f1 = f1_score(all_y_true, all_y_pred, zero_division=0)
    macro_f1 = np.mean([m["f1"] for m in field_metrics.values()])

    return {
        "per_field": field_metrics,
        "micro_f1": round(micro_f1, 4),
        "macro_f1": round(macro_f1, 4),
        "total_fields": len(all_y_true),
        "pass": micro_f1 >= 0.85,  # Target: F1 >= 0.85
    }


def compute_extraction_report(annotations: list[dict]) -> str:
    """Generate a scikit-learn classification_report style output."""
    all_y_true = []
    all_y_pred = []

    for field_name in EXTRACTION_FIELDS:
        for ann in annotations:
            fields = {f["field_name"]: f for f in ann["fields"]}
            if field_name in fields:
                f = fields[field_name]
                has_gt = f["ground_truth"] is not None
                is_correct = f.get("correct", False)
                has_extraction = f.get("extracted") is not None
                all_y_true.append(1 if has_gt else 0)
                all_y_pred.append(1 if (is_correct if has_gt else has_extraction) else 0)

    return classification_report(
        all_y_true, all_y_pred,
        target_names=["absent", "present/correct"],
        digits=4,
    )


def compare_with_baseline(
    medgemma_annotations: list[dict],
    gemini_only_annotations: list[dict],
) -> dict:
    """Compare MedGemma extraction vs Gemini-only baseline."""
    medgemma_metrics = compute_field_level_f1(medgemma_annotations)
    gemini_metrics = compute_field_level_f1(gemini_only_annotations)

    comparison = {}
    for field_name in EXTRACTION_FIELDS:
        mg = medgemma_metrics["per_field"].get(field_name, {})
        gm = gemini_metrics["per_field"].get(field_name, {})
        comparison[field_name] = {
            "medgemma_f1": mg.get("f1", 0),
            "gemini_f1": gm.get("f1", 0),
            "delta": round(mg.get("f1", 0) - gm.get("f1", 0), 4),
        }

    return {
        "per_field_comparison": comparison,
        "medgemma_overall_f1": medgemma_metrics["micro_f1"],
        "gemini_overall_f1": gemini_metrics["micro_f1"],
        "improvement": round(
            medgemma_metrics["micro_f1"] - gemini_metrics["micro_f1"], 4
        ),
    }
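The difference between the micro and macro averages reported above matters for rare biomarker fields. A self-contained miniature (hand-picked counts, no sklearn) showing how each behaves:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw true-positive / false-positive / false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Two fields: a frequent field extracted well, a rare field extracted badly.
common = {"tp": 90, "fp": 5, "fn": 5}   # F1 ~ 0.947
rare = {"tp": 1, "fp": 4, "fn": 4}      # F1 = 0.2

macro = (f1(**common) + f1(**rare)) / 2  # the rare field drags the average down
micro = f1(common["tp"] + rare["tp"],
           common["fp"] + rare["fp"],
           common["fn"] + rare["fn"])    # dominated by the frequent field
```

Reporting both, as `compute_field_level_f1` does, exposes extractors that look strong overall but fail on rare fields.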

5.3 Impact of Noise Level on Extraction Performance

def analyze_noise_impact(annotations: list[dict]) -> dict:
    """Analyze how noise level affects extraction F1."""
    by_noise = {}
    for ann in annotations:
        level = ann["noise_level"]
        if level not in by_noise:
            by_noise[level] = []
        by_noise[level].append(ann)

    results = {}
    for level, level_anns in by_noise.items():
        metrics = compute_field_level_f1(level_anns)
        results[level] = {
            "micro_f1": metrics["micro_f1"],
            "macro_f1": metrics["macro_f1"],
            "n_patients": len(level_anns),
        }

    return results

6. End-to-End Evaluation Pipeline

6.1 Criterion Decision Accuracy

# evaluation/criterion_eval.py

def compute_criterion_accuracy(
    predictions: list[dict],
    ground_truth: list[dict],
) -> dict:
    """Compute criterion-level decision accuracy.

    Each prediction/ground_truth entry:
    {
        "patient_id": str,
        "trial_id": str,
        "criteria": [
            {"criterion_id": str, "decision": "met"|"not_met"|"unknown",
             "evidence": str}
        ]
    }

    Target: >= 0.85
    """
    total = 0
    correct = 0
    by_decision_type = {"met": {"tp": 0, "total": 0},
                        "not_met": {"tp": 0, "total": 0},
                        "unknown": {"tp": 0, "total": 0}}

    for pred, gt in zip(predictions, ground_truth):
        assert pred["patient_id"] == gt["patient_id"]
        assert pred["trial_id"] == gt["trial_id"]

        gt_map = {c["criterion_id"]: c["decision"] for c in gt["criteria"]}

        for criterion in pred["criteria"]:
            cid = criterion["criterion_id"]
            if cid in gt_map:
                total += 1
                gt_decision = gt_map[cid]
                pred_decision = criterion["decision"]
                by_decision_type[gt_decision]["total"] += 1
                if pred_decision == gt_decision:
                    correct += 1
                    by_decision_type[gt_decision]["tp"] += 1

    accuracy = correct / total if total > 0 else 0.0

    return {
        "overall_accuracy": round(accuracy, 4),
        "total_criteria": total,
        "correct": correct,
        "pass": accuracy >= 0.85,
        "by_decision_type": {
            k: {
                "accuracy": round(v["tp"] / v["total"], 4) if v["total"] > 0 else 0,
                "support": v["total"],
            }
            for k, v in by_decision_type.items()
        },
    }
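A toy walk-through of the same bookkeeping, self-contained for clarity (the criterion IDs and decisions are made up):

```python
from collections import Counter

gt = {"c1": "met", "c2": "not_met", "c3": "unknown", "c4": "met"}
pred = {"c1": "met", "c2": "met", "c3": "unknown", "c4": "met"}

total: Counter = Counter()
correct: Counter = Counter()
for cid, gt_decision in gt.items():
    total[gt_decision] += 1        # support per ground-truth decision type
    if pred[cid] == gt_decision:
        correct[gt_decision] += 1  # exact decision match

overall = sum(correct.values()) / sum(total.values())
# overall == 0.75: only c2 (not_met predicted as met) is wrong
```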

6.2 Latency and Cost Benchmarking

# evaluation/latency_cost_tracker.py
import time
import json
from dataclasses import dataclass, field, asdict
from typing import Optional
from contextlib import contextmanager


@dataclass
class APICallRecord:
    """Record of a single API call."""
    service: str       # "medgemma", "gemini", "clinicaltrials_mcp"
    operation: str     # "extract", "search", "evaluate_criterion"
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    timestamp: str = ""


@dataclass
class SessionMetrics:
    """Aggregate metrics for a patient matching session."""
    patient_id: str
    total_latency_ms: float = 0.0
    total_cost_usd: float = 0.0
    api_calls: list[APICallRecord] = field(default_factory=list)

    @property
    def total_latency_s(self) -> float:
        return self.total_latency_ms / 1000.0

    @property
    def pass_latency(self) -> bool:
        """Target: < 15s per session."""
        return self.total_latency_s < 15.0

    @property
    def pass_cost(self) -> bool:
        """Target: < $0.50 per session."""
        return self.total_cost_usd < 0.50


class LatencyCostTracker:
    """Track latency and cost across API calls."""

    # Pricing per 1M tokens (approximate)
    PRICING = {
        "medgemma": {"input": 0.0, "output": 0.0},  # Self-hosted
        "gemini": {"input": 1.25, "output": 5.00},   # Gemini Pro
        "clinicaltrials_mcp": {"input": 0.0, "output": 0.0},  # Free API
    }

    def __init__(self):
        self.sessions: list[SessionMetrics] = []
        self._current_session: Optional[SessionMetrics] = None

    def start_session(self, patient_id: str):
        self._current_session = SessionMetrics(patient_id=patient_id)

    def end_session(self) -> Optional[SessionMetrics]:
        session = self._current_session
        if session:
            session.total_latency_ms = sum(c.latency_ms for c in session.api_calls)
            session.total_cost_usd = sum(c.cost_usd for c in session.api_calls)
            self.sessions.append(session)
        self._current_session = None
        return session

    @contextmanager
    def track_call(self, service: str, operation: str):
        """Context manager to track an API call."""
        start = time.monotonic()
        record = APICallRecord(service=service, operation=operation, latency_ms=0)
        try:
            yield record
        finally:
            record.latency_ms = (time.monotonic() - start) * 1000
            # Compute cost
            pricing = self.PRICING.get(service, {"input": 0, "output": 0})
            record.cost_usd = (
                record.input_tokens * pricing["input"] / 1_000_000
                + record.output_tokens * pricing["output"] / 1_000_000
            )
            if self._current_session:
                self._current_session.api_calls.append(record)

    def summary(self) -> dict:
        """Generate aggregate summary across all sessions."""
        if not self.sessions:
            return {}

        latencies = [s.total_latency_s for s in self.sessions]
        costs = [s.total_cost_usd for s in self.sessions]

        return {
            "n_sessions": len(self.sessions),
            "latency": {
                "mean_s": round(sum(latencies) / len(latencies), 2),
                "p50_s": round(sorted(latencies)[len(latencies) // 2], 2),
                "p95_s": round(sorted(latencies)[int(len(latencies) * 0.95)], 2),
                "max_s": round(max(latencies), 2),
                "pass_rate": round(
                    sum(1 for s in self.sessions if s.pass_latency) / len(self.sessions), 4
                ),
            },
            "cost": {
                "mean_usd": round(sum(costs) / len(costs), 4),
                "total_usd": round(sum(costs), 4),
                "max_usd": round(max(costs), 4),
                "pass_rate": round(
                    sum(1 for s in self.sessions if s.pass_cost) / len(self.sessions), 4
                ),
            },
            "targets": {
                "latency_pass": all(s.pass_latency for s in self.sessions),
                "cost_pass": all(s.pass_cost for s in self.sessions),
            },
        }
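The track_call pattern, measuring in a finally block so failed calls are still timed, can be isolated into a few lines; a minimal sketch:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(record: dict):
    """Measure wall-clock time into record['latency_ms'], even on exceptions."""
    start = time.monotonic()
    try:
        yield record
    finally:
        # finally runs whether the body returned or raised, so a failed
        # API call still gets a latency recorded
        record["latency_ms"] = (time.monotonic() - start) * 1000

record: dict = {}
with timed(record):
    sum(range(100_000))  # stand-in for an API call
```

`time.monotonic()` is preferred over `time.time()` here because it cannot jump backwards if the system clock is adjusted mid-call.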

7. TDD Test Cases

7.1 Synthea Data Validation Tests

# tests/test_synthea_data.py
import pytest
import json
from pathlib import Path

from data.fhir_parser import parse_fhir_bundle  # assumed module path for the parser used below

# Expected FHIR resource types
REQUIRED_RESOURCE_TYPES = {"Patient", "Condition", "Observation", "Encounter"}


class TestSyntheaDataValidation:
    """Validate Synthea FHIR output for TrialPath requirements."""

    def test_fhir_bundle_is_valid_json(self, fhir_file):
        """Bundle must be valid JSON."""
        with open(fhir_file) as f:
            data = json.load(f)
        assert data["resourceType"] == "Bundle"
        assert "entry" in data

    def test_bundle_contains_required_resources(self, fhir_file):
        """Bundle must contain Patient, Condition, Observation, Encounter."""
        with open(fhir_file) as f:
            bundle = json.load(f)
        resource_types = {
            e["resource"]["resourceType"] for e in bundle["entry"]
        }
        for rt in REQUIRED_RESOURCE_TYPES:
            assert rt in resource_types, f"Missing {rt} resource"

    def test_patient_has_demographics(self, fhir_file):
        """Patient resource must have name, gender, birthDate."""
        with open(fhir_file) as f:
            bundle = json.load(f)
        patients = [
            e["resource"] for e in bundle["entry"]
            if e["resource"]["resourceType"] == "Patient"
        ]
        assert len(patients) == 1
        patient = patients[0]
        assert "name" in patient
        assert "gender" in patient
        assert "birthDate" in patient

    def test_lung_cancer_condition_present(self, fhir_file):
        """At least one Condition must be NSCLC or lung cancer."""
        with open(fhir_file) as f:
            bundle = json.load(f)
        conditions = [
            e["resource"] for e in bundle["entry"]
            if e["resource"]["resourceType"] == "Condition"
        ]
        lung_cancer_codes = {"254637007", "254632001", "162573006"}
        has_lung_cancer = False
        for cond in conditions:
            codings = cond.get("code", {}).get("coding", [])
            for c in codings:
                if c.get("code") in lung_cancer_codes:
                    has_lung_cancer = True
        assert has_lung_cancer, "No lung cancer Condition found"

    def test_patient_profile_conversion(self, fhir_file):
        """FHIR Bundle must convert to valid PatientProfile."""
        profile = parse_fhir_bundle(Path(fhir_file))
        assert profile.patient_id != ""
        assert profile.demographics.name != ""
        assert profile.demographics.sex in ("male", "female")
        assert profile.diagnosis.primary != ""

    def test_batch_generation_produces_500_patients(self, output_dir):
        """Batch generation must produce at least 500 FHIR files."""
        fhir_files = list(Path(output_dir).glob("*.json"))
        assert len(fhir_files) >= 500

    def test_nsclc_ratio(self, all_profiles):
        """~85% of lung cancer patients should be NSCLC."""
        nsclc_count = sum(
            1 for p in all_profiles
            if "non-small cell" in p.diagnosis.primary.lower()
            or "nsclc" in p.diagnosis.primary.lower()
        )
        ratio = nsclc_count / len(all_profiles)
        assert 0.70 <= ratio <= 0.95, f"NSCLC ratio {ratio} outside expected range"

7.2 PDF ็”Ÿๆˆๆญฃ็กฎๆ€งๆต‹่ฏ•

# tests/test_pdf_generation.py
import pytest
from pathlib import Path
from data.templates.clinical_letter import generate_clinical_letter
from data.templates.pathology_report import generate_pathology_report
from data.templates.lab_report import generate_lab_report


class TestPDFGeneration:
    """Test that PDF generation produces valid documents."""

    SAMPLE_PROFILE = {
        "patient_id": "test-001",
        "demographics": {
            "name": "Jane Doe",
            "sex": "female",
            "date_of_birth": "1960-05-15",
        },
        "diagnosis": {
            "primary": "Non-small cell lung cancer, adenocarcinoma",
            "stage": "Stage IIIA",
            "histology": "adenocarcinoma",
            "diagnosis_date": "2024-01-15",
        },
        "biomarkers": {
            "egfr": "Exon 19 deletion",
            "alk": "Negative",
            "pdl1_tps": "60%",
            "kras": None,
        },
        "labs": [
            {"name": "WBC", "value": 7.2, "unit": "10*3/uL", "date": "2024-01-10", "loinc_code": "6690-2"},
            {"name": "Hemoglobin", "value": 12.5, "unit": "g/dL", "date": "2024-01-10", "loinc_code": "718-7"},
        ],
        "treatments": [
            {"name": "Cisplatin", "type": "medication", "start_date": "2024-02-01"},
        ],
    }

    def test_clinical_letter_generates_pdf(self, tmp_path):
        """Clinical letter must generate a non-empty PDF file."""
        output = tmp_path / "letter.pdf"
        generate_clinical_letter(self.SAMPLE_PROFILE, str(output))
        assert output.exists()
        assert output.stat().st_size > 0

    def test_pathology_report_generates_pdf(self, tmp_path):
        """Pathology report must generate a non-empty PDF file."""
        output = tmp_path / "pathology.pdf"
        generate_pathology_report(self.SAMPLE_PROFILE, str(output))
        assert output.exists()
        assert output.stat().st_size > 0

    def test_lab_report_generates_pdf(self, tmp_path):
        """Lab report must generate a non-empty PDF file."""
        output = tmp_path / "lab.pdf"
        generate_lab_report(self.SAMPLE_PROFILE, str(output))
        assert output.exists()
        assert output.stat().st_size > 0

    def test_pdf_contains_patient_name(self, tmp_path):
        """Generated PDF must contain patient name (OCR-verifiable)."""
        output = tmp_path / "letter.pdf"
        generate_clinical_letter(self.SAMPLE_PROFILE, str(output))
        # Read PDF text (using pdfplumber or PyPDF2)
        import pdfplumber
        with pdfplumber.open(str(output)) as pdf:
            text = ""
            for page in pdf.pages:
                text += page.extract_text() or ""
        assert "Jane Doe" in text

    def test_pdf_contains_biomarkers(self, tmp_path):
        """Generated PDF must contain biomarker results."""
        output = tmp_path / "pathology.pdf"
        generate_pathology_report(self.SAMPLE_PROFILE, str(output))
        import pdfplumber
        with pdfplumber.open(str(output)) as pdf:
            text = ""
            for page in pdf.pages:
                text += page.extract_text() or ""
        assert "EGFR" in text
        assert "Exon 19" in text or "positive" in text.lower()

    def test_missing_biomarker_handled_gracefully(self, tmp_path):
        """PDF generation should not crash when biomarkers are None."""
        profile = self.SAMPLE_PROFILE.copy()
        profile["biomarkers"] = {
            "egfr": None, "alk": None, "pdl1_tps": None, "kras": None
        }
        output = tmp_path / "letter.pdf"
        generate_clinical_letter(profile, str(output))
        assert output.exists()

7.3 Noise Injection Verification Tests

# tests/test_noise_injection.py
import pytest
from data.noise.noise_injector import NoiseInjector


class TestNoiseInjection:
    """Test noise injection produces expected results."""

    def test_clean_noise_no_changes(self):
        """Clean level should produce no changes."""
        injector = NoiseInjector(noise_level="clean", seed=42)
        text = "Patient has EGFR mutation positive"
        noisy, records = injector.inject_text_noise(text)
        assert noisy == text
        assert len(records) == 0

    def test_mild_noise_produces_fewer_changes_than_severe(self):
        """Mild noise should change the text less than severe noise."""
        # Use longer text so both levels have a chance to fire
        text = "The patient is a 65 year old male with stage IIIA " * 10
        mild = NoiseInjector(noise_level="mild", seed=42)
        severe = NoiseInjector(noise_level="severe", seed=42)
        _, mild_records = mild.inject_text_noise(text)
        _, severe_records = severe.inject_text_noise(text)
        assert len(mild_records) <= len(severe_records)

    def test_severe_noise_produces_many_changes(self):
        """Severe noise should produce noticeable changes."""
        injector = NoiseInjector(noise_level="severe", seed=42)
        text = "The 50 year old patient has stage 1 NSCLC " * 20
        noisy, records = injector.inject_text_noise(text)
        assert noisy != text  # Should differ from original
        assert len(records) > 0

    def test_ocr_error_types_are_valid(self):
        """OCR errors should only substitute known character pairs."""
        injector = NoiseInjector(noise_level="severe", seed=42)
        text = "0123456789 OIBS" * 10
        _, records = injector.inject_text_noise(text)
        for r in records:
            if r["type"] == "ocr_error":
                assert r["original"] in NoiseInjector.OCR_ERROR_MAP
                assert r["replacement"] in NoiseInjector.OCR_ERROR_MAP[r["original"]]

    def test_missing_value_injection(self):
        """Missing value injection should remove some fields."""
        injector = NoiseInjector(noise_level="moderate", seed=42)
        profile = {
            "biomarkers": {"egfr": "positive", "alk": "negative",
                          "pdl1_tps": "60%", "kras": "negative", "ros1": "negative"},
            "diagnosis": {"stage": "IIIA", "histology": "adenocarcinoma"},
        }
        modified, removed = injector.inject_missing_values(profile)
        # With a moderate missing_rate, a random subset (possibly empty) of the 7 fields is blanked
        assert len(removed) <= 7
        for field_path in removed:
            section, field_name = field_path.split(".")
            assert modified[section][field_name] is None

    def test_noise_is_deterministic_with_seed(self):
        """Same seed should produce identical results."""
        text = "Patient has stage IIIA non-small cell lung cancer"
        inj1 = NoiseInjector(noise_level="moderate", seed=123)
        inj2 = NoiseInjector(noise_level="moderate", seed=123)
        noisy1, _ = inj1.inject_text_noise(text)
        noisy2, _ = inj2.inject_text_noise(text)
        assert noisy1 == noisy2

    def test_different_seeds_produce_different_results(self):
        """Different seeds should generally produce different noise."""
        text = "The 50 year old patient has 10 biomarker tests 0 1 5 8" * 20
        inj1 = NoiseInjector(noise_level="severe", seed=1)
        inj2 = NoiseInjector(noise_level="severe", seed=999)
        noisy1, _ = inj1.inject_text_noise(text)
        noisy2, _ = inj2.inject_text_noise(text)
        # With severe noise on long text, different seeds should differ
        assert noisy1 != noisy2

7.4 TREC Evaluation Computation Tests

# tests/test_trec_evaluation.py
import pytest
import ir_measures
from ir_measures import nDCG, Recall, P, AP


class TestTRECEvaluation:
    """Test TREC evaluation metric computation."""

    @pytest.fixture
    def sample_qrels(self):
        """Sample qrels with known ground truth."""
        return [
            ir_measures.Qrel("q1", "d1", 2),  # eligible
            ir_measures.Qrel("q1", "d2", 1),  # excluded
            ir_measures.Qrel("q1", "d3", 0),  # not relevant
            ir_measures.Qrel("q1", "d4", 2),  # eligible
            ir_measures.Qrel("q1", "d5", 0),  # not relevant
        ]

    @pytest.fixture
    def perfect_run(self):
        """Run that ranks all relevant docs at top."""
        return [
            ir_measures.ScoredDoc("q1", "d1", 1.0),
            ir_measures.ScoredDoc("q1", "d4", 0.9),
            ir_measures.ScoredDoc("q1", "d2", 0.8),
            ir_measures.ScoredDoc("q1", "d3", 0.1),
            ir_measures.ScoredDoc("q1", "d5", 0.05),
        ]

    @pytest.fixture
    def worst_run(self):
        """Run that ranks relevant docs at bottom."""
        return [
            ir_measures.ScoredDoc("q1", "d3", 1.0),
            ir_measures.ScoredDoc("q1", "d5", 0.9),
            ir_measures.ScoredDoc("q1", "d2", 0.5),
            ir_measures.ScoredDoc("q1", "d4", 0.2),
            ir_measures.ScoredDoc("q1", "d1", 0.1),
        ]

    def test_perfect_ndcg_at_10(self, sample_qrels, perfect_run):
        """Perfect ranking should yield NDCG@10 = 1.0."""
        result = ir_measures.calc_aggregate([nDCG@10], sample_qrels, perfect_run)
        assert result[nDCG@10] == pytest.approx(1.0, abs=0.01)

    def test_worst_ndcg_lower(self, sample_qrels, perfect_run, worst_run):
        """Worst ranking should yield lower NDCG than perfect."""
        perfect = ir_measures.calc_aggregate([nDCG@10], sample_qrels, perfect_run)
        worst = ir_measures.calc_aggregate([nDCG@10], sample_qrels, worst_run)
        assert worst[nDCG@10] < perfect[nDCG@10]

    def test_recall_at_50_perfect(self, sample_qrels, perfect_run):
        """Perfect run should retrieve all relevant docs."""
        result = ir_measures.calc_aggregate([Recall@50], sample_qrels, perfect_run)
        assert result[Recall@50] == pytest.approx(1.0, abs=0.01)

    def test_empty_run_yields_zero(self, sample_qrels):
        """Empty run should yield 0 for all metrics."""
        empty_run = []
        result = ir_measures.calc_aggregate(
            [nDCG@10, Recall@50, P@10], sample_qrels, empty_run
        )
        assert result[nDCG@10] == 0.0
        assert result[Recall@50] == 0.0
        assert result[P@10] == 0.0

    def test_per_query_results(self, sample_qrels, perfect_run):
        """Per-query results should return one entry per query."""
        results = list(ir_measures.iter_calc(
            [nDCG@10], sample_qrels, perfect_run
        ))
        assert len(results) == 1  # Only q1
        assert results[0].query_id == "q1"

    def test_trec_run_format_conversion(self):
        """Test TrialPath results to TREC format conversion."""
        results = {
            "1": [
                {"nct_id": "NCT001", "score": 0.95},
                {"nct_id": "NCT002", "score": 0.80},
            ]
        }
        run_str = convert_trialpath_to_trec_run(results, "test-run")
        lines = run_str.strip().split("\n")
        assert len(lines) == 2
        assert "NCT001" in lines[0]
        assert "1" == lines[0].split()[3]  # rank 1
        assert "2" == lines[1].split()[3]  # rank 2

    def test_graded_relevance_evaluation(self, sample_qrels, perfect_run):
        """Test strict eligible-only evaluation (rel=2)."""
        strict = ir_measures.calc_aggregate(
            [AP(rel=2)], sample_qrels, perfect_run
        )
        assert strict[AP(rel=2)] > 0.0

    def test_qrels_dict_format(self):
        """Test evaluation from dict format."""
        qrels = {"q1": {"d1": 2, "d2": 1, "d3": 0}}
        run = [
            ir_measures.ScoredDoc("q1", "d1", 1.0),
            ir_measures.ScoredDoc("q1", "d2", 0.5),
            ir_measures.ScoredDoc("q1", "d3", 0.1),
        ]
        result = ir_measures.calc_aggregate([nDCG@10], qrels, run)
        assert nDCG@10 in result
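
`test_trec_run_format_conversion` assumes `convert_trialpath_to_trec_run` emits the standard six-column TREC run format (`qid Q0 docid rank score tag`). A minimal sketch consistent with that test (name and signature as assumed above) could be:

```python
def convert_trialpath_to_trec_run(results, run_tag):
    """Convert {topic_id: [{"nct_id", "score"}, ...]} into TREC run lines.

    Documents are ranked by descending score within each topic; ranks
    are 1-based, matching what trec_eval and ir-measures expect.
    """
    lines = []
    for qid, docs in results.items():
        ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            # TREC run format: qid Q0 docid rank score tag
            lines.append(f"{qid} Q0 {doc['nct_id']} {rank} {doc['score']:.4f} {run_tag}")
    return "\n".join(lines)
```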

7.5 F1 Computation Tests

# tests/test_extraction_f1.py
import pytest
from evaluation.extraction_eval import compute_field_level_f1


class TestExtractionF1:
    """Test F1 computation for field-level extraction."""

    def test_perfect_extraction(self):
        """All fields correctly extracted should yield F1=1.0."""
        annotations = [{
            "patient_id": "p1",
            "noise_level": "clean",
            "document_type": "clinical_letter",
            "fields": [
                {"field_name": "demographics.name", "ground_truth": "John", "extracted": "John", "correct": True},
                {"field_name": "demographics.sex", "ground_truth": "male", "extracted": "male", "correct": True},
                {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": "NSCLC", "correct": True},
                {"field_name": "biomarkers.egfr", "ground_truth": "positive", "extracted": "positive", "correct": True},
            ]
        }]
        result = compute_field_level_f1(annotations)
        assert result["micro_f1"] == 1.0
        assert result["pass"] is True

    def test_zero_extraction(self):
        """No correct extractions should yield F1=0."""
        annotations = [{
            "patient_id": "p1",
            "noise_level": "clean",
            "document_type": "clinical_letter",
            "fields": [
                {"field_name": "demographics.name", "ground_truth": "John", "extracted": "Jane", "correct": False},
                {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": None, "correct": False},
            ]
        }]
        result = compute_field_level_f1(annotations)
        assert result["micro_f1"] == 0.0
        assert result["pass"] is False

    def test_partial_extraction(self):
        """Partial extraction should yield 0 < F1 < 1."""
        annotations = [{
            "patient_id": "p1",
            "noise_level": "mild",
            "document_type": "clinical_letter",
            "fields": [
                {"field_name": "demographics.name", "ground_truth": "John", "extracted": "John", "correct": True},
                {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": "lung ca", "correct": False},
                {"field_name": "biomarkers.egfr", "ground_truth": "positive", "extracted": "positive", "correct": True},
                {"field_name": "biomarkers.alk", "ground_truth": "negative", "extracted": None, "correct": False},
            ]
        }]
        result = compute_field_level_f1(annotations)
        assert 0.0 < result["micro_f1"] < 1.0

    def test_f1_threshold_boundary(self):
        """F1 exactly at 0.85 should pass."""
        # Create annotations that produce exactly 0.85 F1
        fields = []
        for i in range(85):
            fields.append({"field_name": f"field_{i}", "ground_truth": "val", "extracted": "val", "correct": True})
        for i in range(15):
            fields.append({"field_name": f"field_miss_{i}", "ground_truth": "val", "extracted": None, "correct": False})

        annotations = [{"patient_id": "p1", "noise_level": "clean",
                        "document_type": "test", "fields": fields}]
        result = compute_field_level_f1(annotations)
        # 85/100 fields correct; micro F1 should meet the 0.85 threshold
        assert result["pass"] is True

    def test_empty_annotations(self):
        """Empty annotations should not crash."""
        result = compute_field_level_f1([])
        assert result["micro_f1"] == 0.0

    def test_none_ground_truth_not_counted(self):
        """Fields with None ground truth should be handled."""
        annotations = [{
            "patient_id": "p1",
            "noise_level": "clean",
            "document_type": "test",
            "fields": [
                {"field_name": "biomarkers.ros1", "ground_truth": None,
                 "extracted": None, "correct": False},
            ]
        }]
        result = compute_field_level_f1(annotations)
        # Should not crash, though metrics may be 0
        assert "micro_f1" in result

7.6 End-to-End Pipeline Tests

# tests/test_e2e_pipeline.py
import pytest
from pathlib import Path


class TestE2EPipeline:
    """End-to-end tests for the complete data & evaluation pipeline."""

    def test_fhir_to_profile_to_pdf_roundtrip(self, sample_fhir_file, tmp_path):
        """FHIR โ†’ PatientProfile โ†’ PDF should complete without error."""
        from data.generate_synthetic_patients import parse_fhir_bundle
        from data.templates.clinical_letter import generate_clinical_letter
        from dataclasses import asdict

        # Step 1: Parse FHIR
        profile = parse_fhir_bundle(Path(sample_fhir_file))
        assert profile.patient_id != ""

        # Step 2: Generate PDF
        pdf_path = tmp_path / "test_roundtrip.pdf"
        generate_clinical_letter(asdict(profile), str(pdf_path))
        assert pdf_path.exists()
        assert pdf_path.stat().st_size > 1000  # Reasonable PDF size

    def test_noisy_pdf_pipeline(self, sample_profile, tmp_path):
        """Profile โ†’ Noisy PDF should inject noise and produce valid PDF."""
        from data.templates.clinical_letter import generate_clinical_letter
        from data.noise.noise_injector import NoiseInjector

        injector = NoiseInjector(noise_level="moderate", seed=42)

        # Inject text noise into profile fields for PDF rendering
        # Deep-copy: dict.copy() is shallow, so mutating the nested
        # diagnosis dict would leak noise into the shared fixture
        import copy
        profile = copy.deepcopy(sample_profile)
        dx_text = profile["diagnosis"]["primary"]
        noisy_dx, records = injector.inject_text_noise(dx_text)
        profile["diagnosis"]["primary"] = noisy_dx

        pdf_path = tmp_path / "noisy.pdf"
        generate_clinical_letter(profile, str(pdf_path))
        assert pdf_path.exists()

    def test_trec_evaluation_pipeline(self, tmp_path):
        """Complete TREC evaluation from dicts should produce metrics."""
        import ir_measures
        from ir_measures import nDCG, Recall, P

        qrels = [
            ir_measures.Qrel("1", "NCT001", 2),
            ir_measures.Qrel("1", "NCT002", 1),
            ir_measures.Qrel("1", "NCT003", 0),
        ]
        run = [
            ir_measures.ScoredDoc("1", "NCT001", 0.9),
            ir_measures.ScoredDoc("1", "NCT002", 0.5),
            ir_measures.ScoredDoc("1", "NCT003", 0.1),
        ]

        result = ir_measures.calc_aggregate(
            [nDCG@10, Recall@50, P@10], qrels, run
        )
        assert nDCG@10 in result
        assert Recall@50 in result
        assert result[nDCG@10] > 0

    def test_latency_tracker_integration(self):
        """Latency tracker should record and summarize calls."""
        import time
        from evaluation.latency_cost_tracker import LatencyCostTracker

        tracker = LatencyCostTracker()
        tracker.start_session("test-patient")

        with tracker.track_call("gemini", "search_anchors") as record:
            time.sleep(0.01)  # Simulate API call
            record.input_tokens = 500
            record.output_tokens = 200

        session = tracker.end_session()
        assert session.total_latency_ms > 0
        assert len(session.api_calls) == 1

        summary = tracker.summary()
        assert summary["n_sessions"] == 1
        assert summary["latency"]["mean_s"] > 0

8. ้™„ๅฝ•

8.1 Data Format Specification

PatientProfile v1 JSON Schema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["patient_id", "demographics", "diagnosis"],
  "properties": {
    "patient_id": {"type": "string"},
    "demographics": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "sex": {"type": "string", "enum": ["male", "female"]},
        "date_of_birth": {"type": "string", "format": "date"},
        "age": {"type": "integer"},
        "state": {"type": "string"}
      }
    },
    "diagnosis": {
      "type": "object",
      "properties": {
        "primary": {"type": "string"},
        "stage": {"type": ["string", "null"]},
        "histology": {"type": ["string", "null"]},
        "diagnosis_date": {"type": "string", "format": "date"}
      }
    },
    "biomarkers": {
      "type": "object",
      "properties": {
        "egfr": {"type": ["string", "null"]},
        "alk": {"type": ["string", "null"]},
        "pdl1_tps": {"type": ["string", "null"]},
        "kras": {"type": ["string", "null"]},
        "ros1": {"type": ["string", "null"]}
      }
    },
    "labs": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "value": {"type": "number"},
          "unit": {"type": "string"},
          "date": {"type": "string"},
          "loinc_code": {"type": "string"}
        }
      }
    },
    "treatments": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "type": {"type": "string", "enum": ["medication", "procedure", "radiation"]},
          "start_date": {"type": "string"},
          "end_date": {"type": ["string", "null"]}
        }
      }
    },
    "unknowns": {"type": "array", "items": {"type": "string"}},
    "evidence_spans": {"type": "array"}
  }
}
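
For reference, a minimal instance that validates against this schema (all values illustrative):

```json
{
  "patient_id": "synthea-0001",
  "demographics": {"name": "Jane Doe", "sex": "female",
                   "date_of_birth": "1961-03-14", "age": 63,
                   "state": "Massachusetts"},
  "diagnosis": {"primary": "non-small cell lung cancer", "stage": "IIIA",
                "histology": "adenocarcinoma", "diagnosis_date": "2024-01-10"},
  "biomarkers": {"egfr": "positive", "alk": "negative", "pdl1_tps": "60%",
                 "kras": null, "ros1": null},
  "labs": [{"name": "hemoglobin", "value": 11.2, "unit": "g/dL",
            "date": "2024-01-08", "loinc_code": "718-7"}],
  "treatments": [{"name": "carboplatin", "type": "medication",
                  "start_date": "2024-02-01", "end_date": null}],
  "unknowns": ["biomarkers.kras", "biomarkers.ros1"],
  "evidence_spans": []
}
```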

8.2 Tool API Reference

ir_datasets

API ่ฏดๆ˜Ž ่ฟ”ๅ›ž็ฑปๅž‹
ir_datasets.load("clinicaltrials/2021/trec-ct-2021") ๅŠ ่ฝฝ TREC CT 2021 ๆ•ฐๆฎ้›† Dataset
dataset.queries_iter() ้ๅކ topics GenericQuery(query_id, text)
dataset.qrels_iter() ้ๅކ qrels TrecQrel(query_id, doc_id, relevance, iteration)
dataset.docs_iter() ้ๅކๆ–‡ๆกฃ ClinicalTrialsDoc(doc_id, title, condition, summary, detailed_description, eligibility)

ๆ•ฐๆฎ้›† ID๏ผš

  • clinicaltrials/2021/trec-ct-2021 โ€” 75 queries, 35,832 qrels
  • clinicaltrials/2021/trec-ct-2022 โ€” 50 queries
  • clinicaltrials/2021 — 376K documents (base corpus)

ir-measures

API ่ฏดๆ˜Ž
ir_measures.calc_aggregate(measures, qrels, run) ่ฎก็ฎ—่šๅˆๆŒ‡ๆ ‡
ir_measures.iter_calc(measures, qrels, run) ้€ๆŸฅ่ฏขๆŒ‡ๆ ‡่ฟญไปฃ
ir_measures.read_trec_qrels(path) ่ฏปๅ– TREC qrels ๆ–‡ไปถ
ir_measures.read_trec_run(path) ่ฏปๅ– TREC run ๆ–‡ไปถ
ir_measures.Qrel(qid, did, rel) ๅˆ›ๅปบ qrel ่ฎฐๅฝ•
ir_measures.ScoredDoc(qid, did, score) ๅˆ›ๅปบ่ฏ„ๅˆ†ๆ–‡ๆกฃ่ฎฐๅฝ•

ๆŒ‡ๆ ‡ๅฏน่ฑก๏ผš

  • nDCG@10 โ€” Normalized DCG at cutoff 10
  • Recall@50 โ€” Recall at cutoff 50
  • P@10 โ€” Precision at cutoff 10
  • AP โ€” Average Precision
  • AP(rel=2) โ€” AP with minimum relevance 2
  • RR โ€” Reciprocal Rank

scikit-learn Evaluation

API ่ฏดๆ˜Ž
f1_score(y_true, y_pred, average=None) ้€็ฑปๅˆซ F1
f1_score(y_true, y_pred, average='micro') ๅ…จๅฑ€ micro F1
f1_score(y_true, y_pred, average='macro') ้€็ฑปๅˆซๅนณๅ‡ F1
precision_score(y_true, y_pred) Precision
recall_score(y_true, y_pred) Recall
classification_report(y_true, y_pred) ๅฎŒๆ•ดๅˆ†็ฑปๆŠฅๅ‘Š
confusion_matrix(y_true, y_pred) ๆททๆท†็Ÿฉ้˜ต
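
A quick illustration of the `average` parameter on a toy field-level outcome vector (1 = field extracted correctly, 0 = missed or wrong); for binary labels, micro F1 over both classes reduces to plain accuracy:

```python
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

micro = f1_score(y_true, y_pred, average="micro")   # accuracy here: 4/6
per_class = f1_score(y_true, y_pred, average=None)  # one F1 per class
macro = f1_score(y_true, y_pred, average="macro")   # mean of per_class
```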

Synthea CLI

ๅ‚ๆ•ฐ ่ฏดๆ˜Ž ็คบไพ‹
-p N ็”Ÿๆˆ N ไธชๆ‚ฃ่€… -p 500
-s SEED ้šๆœบ็งๅญ -s 42
-m MODULE ๆŒ‡ๅฎš็–พ็—…ๆจกๅ— -m lung_cancer
STATE ๆŒ‡ๅฎšๅทž Massachusetts
--exporter.fhir.export ๅฏ็”จ FHIR R4 ๅฏผๅ‡บ =true
--exporter.pretty_print ็พŽๅŒ– JSON ่พ“ๅ‡บ =true

ReportLab Core API

็ป„ไปถ ่ฏดๆ˜Ž
SimpleDocTemplate(path, pagesize=letter) ๅˆ›ๅปบๆ–‡ๆกฃๆจกๆฟ
Paragraph(text, style) ๆฎต่ฝๆตๅผ็ป„ไปถ
Table(data, colWidths) ่กจๆ ผๆตๅผ็ป„ไปถ
TableStyle(commands) ่กจๆ ผๆ ทๅผ
Spacer(width, height) ้—ด่ท็ป„ไปถ
getSampleStyleSheet() ่Žทๅ–้ป˜่ฎคๆ ทๅผ่กจ

Augraphy Degradation Pipeline

็ป„ไปถ ่ฏดๆ˜Ž
AugraphyPipeline(ink_phase, paper_phase, post_phase) ๅฎŒๆ•ด้™่ดจ็ฎก็บฟ
InkBleed(p=0.5) ๅขจๆฐดๆธ—้€ๆ•ˆๆžœ
Letterpress(p=0.3) ๆดป็‰ˆๅฐๅˆทๆ•ˆๆžœ
LowInkPeriodicLines(p=0.3) ไฝŽๅขจๆฐดๅ‘จๆœŸๆ€ง็บฟๆก
DirtyDrum(p=0.3) ่„้ผ“ๆ•ˆๆžœ
SubtleNoise(p=0.5) ๅพฎๅ™ชๅฃฐ
Jpeg(p=0.5) JPEG ๅŽ‹็ผฉไผชๅฝฑ
Brightness(p=0.5) ไบฎๅบฆๅ˜ๅŒ–

8.3 Python Dependencies

# requirements-data-eval.txt
ir-datasets>=0.5.6
ir-measures>=0.3.1
reportlab>=4.0
augraphy>=8.0
Pillow>=10.0
pdfplumber>=0.10
scikit-learn>=1.3
numpy>=1.24
pandas>=2.0
pdf2image>=1.16

8.4 Success Metrics Quick Reference

ๆŒ‡ๆ ‡ ็›ฎๆ ‡ๅ€ผ ่ฏ„ไผฐๅทฅๅ…ท ๆ•ฐๆฎๆบ
MedGemma Extraction F1 >= 0.85 scikit-learn f1_score ๅˆๆˆๆ‚ฃ่€… + Ground Truth
Trial Retrieval Recall@50 >= 0.75 ir-measures Recall@50 TREC CT 2021/2022
Trial Ranking NDCG@10 >= 0.60 ir-measures nDCG@10 TREC CT 2021/2022
Criterion Decision Accuracy >= 0.85 Custom accuracy ๆ ‡ๆณจ EligibilityLedger
Latency < 15s LatencyCostTracker API call timing
Cost < $0.50/session LatencyCostTracker Token counting