TheQuantEd committed on
Commit b40cc1f · 1 parent: 89f66fe

Fix: Neo4j 5.26.0 (APOC available) + correct graphrag schema from seeder

Files changed (2)
  1. Dockerfile +1 -1
  2. backend/graphrag.py +29 -13
Dockerfile CHANGED
@@ -44,7 +44,7 @@ RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
 && rm -rf /var/lib/apt/lists/*
 
 # ── Neo4j Community 2026.04.0 ─────────────────────────────────────────────────
-ENV NEO4J_VERSION=2026.04.0
+ENV NEO4J_VERSION=5.26.0
 ENV NEO4J_HOME=/opt/neo4j
 ENV PATH="${NEO4J_HOME}/bin:${PATH}"
 

backend/graphrag.py CHANGED
@@ -57,25 +57,41 @@ _CYPHER_GENERATION_TEMPLATE = """You are an expert Neo4j Cypher query writer for
 Schema:
 {schema}
 
-Node property conventions (IMPORTANT — use these exact property names and value formats):
-- Patient: id (e.g. "P-001"), name, age (integer), sex ("M"/"F"), ethnicity, city, state, ecog_score (integer)
-- Trial: id (NCT id), title, condition (lowercase, e.g. "breast cancer"), phase, status, sponsor
-- Diagnosis: id, name (e.g. "Breast Cancer"), icd10 (e.g. "C50")
-- Biomarker: id (e.g. "HER2_POS", "EGFR_MUT", "BRCA1_MUT", "PD_L1_POS"), name (e.g. "HER2 Positive", "EGFR Mutation")
-- Medication: id (e.g. "TAMOXIFEN"), name (e.g. "Tamoxifen")
-- StudySite: id, name, city, state, lat, lon, trials (integer), enrolled (integer), capacity (integer)
+Node labels and their exact property names:
+- Patient: id (e.g. "P_C50_000001"), name, age (integer), sex ("MALE"/"FEMALE"), ecog (integer 0-3),
+condition (lowercase, e.g. "breast cancer"), stage ("I"/"II"/"III"/"IV"),
+city, state, ethnicity, insurance, icd10_prefix,
+biomarkers (list of biomarker ids), medications (list of drug names),
+comorbidities (list), prior_chemo (boolean), prior_radiation (boolean),
+prior_surgery (boolean), prior_lines_of_therapy (integer), source
+- Trial: id (NCT id, e.g. "NCT04567890"), title, condition (lowercase), phase, status,
+brief_summary, eligibility_criteria, min_age, max_age, sex, enrollment,
+start_date, completion_date, sponsor, location_count, source
+- Diagnosis: code (ICD-10, e.g. "C50.919"), name (e.g. "Malignant neoplasm of breast"), source
+- Biomarker: id (e.g. "HER2_POS"), name (e.g. "HER2 Positive"), gene (e.g. "ERBB2"), loinc, source
+- Medication: rxcui, name, tty, generic_name, source
+- StudySite: facility, city, state, country, lat, lon, source
+- ConditionNode: name (e.g. "breast cancer")
+- Publication: pmid, title, journal, pub_date, authors, source
 
 Relationships:
-- (Patient)-[:ELIGIBLE_FOR {{score: float}}]->(Trial)
+- (Patient)-[:ELIGIBLE_FOR {{score: float, matched_at: datetime}}]->(Trial)
 - (Patient)-[:HAS_DIAGNOSIS]->(Diagnosis)
 - (Patient)-[:HAS_BIOMARKER]->(Biomarker)
-- (Patient)-[:TAKES_MEDICATION]->(Medication)
-- (Trial)-[:LOCATED_AT]->(StudySite)
+- (Trial)-[:CONDUCTED_AT]->(StudySite)
+- (ConditionNode)-[:HAS_TRIAL]->(Trial)
+- (Diagnosis)-[:MAPS_TO_CONDITION]->(ConditionNode)
+- (Biomarker)-[:RELEVANT_TO]->(ConditionNode)
+- (Biomarker)-[:MAY_QUALIFY_FOR]->(Trial)
+- (Publication)-[:SUPPORTS_RESEARCH_ON]->(ConditionNode)
 
 Rules:
-- For biomarker lookups, use the `id` property with uppercase underscore format, e.g. `{{id: 'HER2_POS'}}`
-- For condition lookups on Trial nodes, use lowercase: `t.condition = 'breast cancer'`
-- Always use relationship pattern (Patient)-[:ELIGIBLE_FOR]->(Trial) to find eligible patients
+- Biomarker lookups use the `id` property: `{{id: 'HER2_POS'}}`
+- Diagnosis lookups use `code` (not `id`): `{{code: 'C50.919'}}`
+- Medication lookups use `rxcui` or `name` (not `id`)
+- Condition lookups on Trial nodes use lowercase: `t.condition = 'breast cancer'`
+- Patient-to-trial eligibility: `(p:Patient)-[:ELIGIBLE_FOR]->(t:Trial)`
+- ecog property on Patient is `ecog` (integer), NOT `ecog_score`
 - Limit results to 25 unless asked for more
 
 Question: {question}
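The Cypher map literals in the template are written with doubled braces (`{{score: float, matched_at: datetime}}`, `{{id: 'HER2_POS'}}`) because `{schema}` and `{question}` are substitution placeholders. Under Python `str.format`-style rendering, `{{` and `}}` come out as single literal braces, so the model sees valid Cypher map syntax. The sketch below assumes that rendering style (LangChain's default f-string `PromptTemplate` treats braces the same way); `TEMPLATE_EXCERPT` and `render_prompt` are hypothetical names, and the excerpt is a trimmed copy of a few new template lines rather than the full `_CYPHER_GENERATION_TEMPLATE`.

```python
# Hypothetical, trimmed excerpt of the new template lines in backend/graphrag.py;
# the full _CYPHER_GENERATION_TEMPLATE is not shown in this diff.
TEMPLATE_EXCERPT = """Schema:
{schema}

Relationships:
- (Patient)-[:ELIGIBLE_FOR {{score: float, matched_at: datetime}}]->(Trial)
- (Patient)-[:HAS_BIOMARKER]->(Biomarker)

Rules:
- Biomarker lookups use the `id` property: `{{id: 'HER2_POS'}}`
- Diagnosis lookups use `code` (not `id`): `{{code: 'C50.919'}}`

Question: {question}"""


def render_prompt(schema: str, question: str) -> str:
    """Fill the two placeholders; str.format turns each {{ or }} into a literal brace."""
    # Substituted values are inserted as-is, so braces inside `schema` are not re-parsed.
    return TEMPLATE_EXCERPT.format(schema=schema, question=question)


if __name__ == "__main__":
    print(
        render_prompt(
            schema="Node properties: Patient {id: STRING, ecog: INTEGER}",
            question="Which HER2_POS patients are eligible for breast cancer trials?",
        )
    )
```

The escaping and the rendering method have to stay in step: if the template were filled some other way (for example with plain `str.replace`), the doubled braces would reach the model verbatim.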