# AMR-Guard Knowledge Storage Strategy

## Overview

This document defines how each document in the `docs/` folder will be stored and queried to support the **AMR-Guard: Infection Lifecycle Orchestrator** workflow.

---

## Document Classification Summary

| Document | Type | Storage | Purpose in Workflow |
|----------|------|---------|---------------------|
| EML exports (ACCESS/RESERVE/WATCH) | XLSX | **SQLite** | Antibiotic classification & stewardship |
| ATLAS Susceptibility Data | XLSX | **SQLite** | Pathogen resistance patterns |
| MIC Breakpoint Tables | XLSX | **SQLite** | Susceptibility interpretation |
| Drug Interactions | CSV | **SQLite** | Drug safety screening |
| IDSA Guidance (ciae403.pdf) | PDF | **ChromaDB** | Clinical treatment guidelines |
| MIC Breakpoint Tables (PDF) | PDF | **ChromaDB** | Reference documentation |

---

## Part 1: Structured Data (SQLite)

### 1.1 EML Antibiotic Classification Tables

**Source Files:**
- `antibiotic_guidelines/EML export ACCESS group.xlsx`
- `antibiotic_guidelines/EML export RESERVE group.xlsx`
- `antibiotic_guidelines/EML export WATCH group.xlsx`

**Database Table: `eml_antibiotics`**

```sql
CREATE TABLE eml_antibiotics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    medicine_name TEXT NOT NULL,
    who_category TEXT NOT NULL,  -- 'ACCESS', 'RESERVE', 'WATCH'
    eml_section TEXT,
    formulations TEXT,
    indication TEXT,
    atc_codes TEXT,
    combined_with TEXT,
    status TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_medicine_name ON eml_antibiotics(medicine_name);
CREATE INDEX idx_who_category ON eml_antibiotics(who_category);
CREATE INDEX idx_atc_codes ON eml_antibiotics(atc_codes);
```

**Usage in Workflow:**
- **Agent 1 (Intake Historian):** Query to identify each antibiotic's WHO stewardship category
- **Agent 4 (Clinical Pharmacologist):** Suggest ACCESS antibiotics first; escalate to WATCH/RESERVE only when necessary

---

### 1.2 ATLAS Pathogen Susceptibility Data

**Source File:** `pathogen_resistance/ATLAS Susceptibility Data Export.xlsx`

**Database Tables:**

```sql
CREATE TABLE atlas_susceptibility_percent (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pathogen TEXT NOT NULL,
    antibiotic TEXT NOT NULL,
    region TEXT,
    year INTEGER,
    susceptibility_percent REAL,
    sample_size INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE atlas_susceptibility_absolute (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pathogen TEXT NOT NULL,
    antibiotic TEXT NOT NULL,
    region TEXT,
    year INTEGER,
    susceptible_count INTEGER,
    intermediate_count INTEGER,
    resistant_count INTEGER,
    total_isolates INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_pathogen ON atlas_susceptibility_percent(pathogen);
CREATE INDEX idx_antibiotic ON atlas_susceptibility_percent(antibiotic);
CREATE INDEX idx_pathogen_abs ON atlas_susceptibility_absolute(pathogen);
```

**Usage in Workflow:**
- **Agent 1 (Empirical Phase):** Retrieve local/regional resistance patterns for empirical therapy
- **Agent 3 (Trend Analyst):** Compare the current MIC with population-level trends

---

### 1.3 MIC Breakpoint Tables

**Source File:** `mic_breakpoints/v_16.0__BreakpointTables.xlsx`

**Database Tables:**

```sql
CREATE TABLE mic_breakpoints (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pathogen_group TEXT NOT NULL,  -- e.g., 'Enterobacterales', 'Staphylococcus'
    antibiotic TEXT NOT NULL,
    route TEXT,                    -- 'IV', 'Oral', 'Topical'
    mic_susceptible REAL,          -- S breakpoint (mg/L)
    mic_resistant REAL,            -- R breakpoint (mg/L)
    disk_susceptible REAL,         -- S zone-diameter breakpoint (mm)
    disk_resistant REAL,           -- R zone-diameter breakpoint (mm)
    notes TEXT,
    eucast_version TEXT DEFAULT '16.0',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE dosage_guidance (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    antibiotic TEXT NOT NULL,
    standard_dose TEXT,
    high_dose TEXT,
    renal_adjustment TEXT,
    notes TEXT
);

CREATE INDEX idx_bp_pathogen ON mic_breakpoints(pathogen_group);
CREATE INDEX idx_bp_antibiotic ON mic_breakpoints(antibiotic);
```

**Usage in Workflow:**
- **Agent 2 (Vision Specialist):** Validate extracted MIC values against breakpoints
- **Agent 3 (Trend Analyst):** Interpret S/I/R classification from MIC values
- **Agent 4 (Clinical Pharmacologist):** Use dosage guidance for prescriptions

---

### 1.4 Drug Interactions Database

**Source File:** `drug_safety/db_drug_interactions.csv`

**Database Table:**

```sql
CREATE TABLE drug_interactions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    drug_1 TEXT NOT NULL,
    drug_2 TEXT NOT NULL,
    interaction_description TEXT,
    severity TEXT,   -- Derived: 'major', 'moderate', 'minor'
    mechanism TEXT,  -- Derived from description
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_drug_1 ON drug_interactions(drug_1);
CREATE INDEX idx_drug_2 ON drug_interactions(drug_2);
CREATE INDEX idx_severity ON drug_interactions(severity);

-- View for bidirectional lookup
CREATE VIEW drug_interaction_lookup AS
SELECT drug_1, drug_2, interaction_description, severity FROM drug_interactions
UNION ALL
SELECT drug_2, drug_1, interaction_description, severity FROM drug_interactions;
```

**Usage in Workflow:**
- **Agent 4 (Clinical Pharmacologist):** Check for interactions with the patient's current medications
- **Safety Alerts:** Flag potential toxicity issues

---

## Part 2: Unstructured Data (ChromaDB)

### 2.1 IDSA Clinical Guidelines

**Source File:** `antibiotic_guidelines/ciae403.pdf`

**ChromaDB Collection: `idsa_treatment_guidelines`**

```python
collection_config = {
    "name": "idsa_treatment_guidelines",
    "metadata": {
        "source": "IDSA 2024 Guidance",
        "doi": "10.1093/cid/ciae403",
        "version": "2024"
    },
    "embedding_function": "sentence-transformers/all-MiniLM-L6-v2"
}

# Document chunking strategy
chunk_config = {
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "separators": ["\n\n", "\n", ". "],
    # Matches the per-chunk metadata schema below
    "metadata_fields": ["section", "pathogen_type", "recommendation_strength",
                        "evidence_quality", "page_number"]
}
```

**Metadata Schema per Chunk:**

```python
{
    "section": "Treatment Recommendations",
    "pathogen_type": "ESBL-E | CRE | CRAB | DTR-PA | S.maltophilia",
    "recommendation_strength": "Strong | Conditional",
    "evidence_quality": "High | Moderate | Low",
    "page_number": int
}
```

**Usage in Workflow:**
- **Agent 1 (Empirical Phase):** Retrieve treatment recommendations for suspected pathogens
- **Agent 4 (Clinical Pharmacologist):** Provide evidence-based justification for antibiotic selection

---

### 2.2 MIC Breakpoint Reference (PDF)

**Source File:** `mic_breakpoints/v_16.0_Breakpoint_Tables.pdf`

**ChromaDB Collection: `mic_reference_docs`**

```python
collection_config = {
    "name": "mic_reference_docs",
    "metadata": {
        "source": "EUCAST Breakpoint Tables",
        "version": "16.0"
    },
    "embedding_function": "sentence-transformers/all-MiniLM-L6-v2"
}
```

**Usage in Workflow:**
- **Supplementary Context:** Provide detailed explanations for breakpoint interpretations
- **Edge Cases:** Handle unusual pathogens or antibiotic combinations not covered by the structured tables

---

## Part 3: Query Tools Definition

### Tool 1: `query_antibiotic_info`

**Purpose:** Retrieve antibiotic classification and formulation details

```python
def query_antibiotic_info(
    antibiotic_name: str,
    include_category: bool = True,
    include_formulations: bool = True
) -> dict:
    """
    Query EML antibiotic database for classification and details.

    Args:
        antibiotic_name: Name of the antibiotic (partial match supported)
        include_category: Include WHO stewardship category
        include_formulations: Include available formulations

    Returns:
        dict with antibiotic details, category, indications

    Used by: Agent 1, Agent 4
    """
```

**SQL Query:**

```sql
SELECT medicine_name, who_category, formulations, indication, combined_with
FROM eml_antibiotics
WHERE LOWER(medicine_name) LIKE LOWER(?)  -- bind as '%name%' for partial matching
-- ACCESS first, then WATCH, then RESERVE (plain ORDER BY would sort alphabetically)
ORDER BY CASE who_category
             WHEN 'ACCESS' THEN 1
             WHEN 'WATCH' THEN 2
             WHEN 'RESERVE' THEN 3
             ELSE 4
         END;
```

---

### Tool 2: `query_resistance_pattern`

**Purpose:** Get susceptibility data for pathogen-antibiotic combinations

```python
def query_resistance_pattern(
    pathogen: str,
    antibiotic: str = None,
    region: str = None,
    year: int = None
) -> dict:
    """
    Query ATLAS susceptibility data for resistance patterns.

    Args:
        pathogen: Pathogen name (e.g., "E. coli", "K. pneumoniae")
        antibiotic: Optional specific antibiotic to check
        region: Optional geographic region filter
        year: Optional year filter (if omitted, all years are returned, most recent first)

    Returns:
        dict with susceptibility percentages and trends

    Used by: Agent 1 (Empirical), Agent 3 (Trend Analysis)
    """
```

**SQL Query:**

```sql
SELECT antibiotic, susceptibility_percent, sample_size, year
FROM atlas_susceptibility_percent
WHERE LOWER(pathogen) LIKE LOWER(?)
  AND (antibiotic = ? OR ? IS NULL)
  AND (region = ? OR ? IS NULL)
  AND (year = ? OR ? IS NULL)
ORDER BY year DESC, susceptibility_percent DESC;
```

---

### Tool 3: `interpret_mic_value`

**Purpose:** Classify an MIC as S/I/R based on EUCAST breakpoints

```python
def interpret_mic_value(
    pathogen: str,
    antibiotic: str,
    mic_value: float,
    route: str = "IV"
) -> dict:
    """
    Interpret MIC value against EUCAST breakpoints.

    Args:
        pathogen: Pathogen name or group
        antibiotic: Antibiotic name
        mic_value: MIC value in mg/L
        route: Administration route (IV, Oral)

    Returns:
        dict with interpretation (S/I/R), breakpoint values, dosing notes

    Used by: Agent 2, Agent 3
    """
```

**SQL Query:**

```sql
SELECT mic_susceptible, mic_resistant, notes
FROM mic_breakpoints
WHERE LOWER(pathogen_group) LIKE LOWER(?)
  AND LOWER(antibiotic) LIKE LOWER(?)
  AND (route = ? OR route IS NULL);
```

**Interpretation Logic:**

```python
# Helper name is illustrative; breakpoints come from the query above
def classify_mic(mic_value: float, mic_susceptible: float, mic_resistant: float) -> str:
    """EUCAST rule: S if MIC <= S breakpoint, R if MIC > R breakpoint, otherwise I."""
    if mic_value <= mic_susceptible:
        return "Susceptible"
    elif mic_value > mic_resistant:
        return "Resistant"
    else:
        return "Intermediate (Susceptible, Increased Exposure)"
```

---

### Tool 4: `check_drug_interactions`

**Purpose:** Screen for drug-drug interactions

```python
def check_drug_interactions(
    target_drug: str,
    patient_medications: list[str],
    severity_filter: str = None
) -> list[dict]:
    """
    Check for interactions between target drug and patient's medications.

    Args:
        target_drug: Antibiotic being considered
        patient_medications: List of patient's current medications
        severity_filter: Optional filter ('major', 'moderate', 'minor')

    Returns:
        list of interaction dicts with severity and description

    Used by: Agent 4 (Safety Check)
    """
```

**SQL Query:**

```sql
SELECT drug_1, drug_2, interaction_description, severity
FROM drug_interaction_lookup
WHERE LOWER(drug_1) LIKE LOWER(?)
  AND LOWER(drug_2) IN (SELECT LOWER(value) FROM json_each(?))  -- ? = JSON array of medication names
  AND (severity = ? OR ? IS NULL)
-- Most severe first (severity DESC on text would sort 'moderate' before 'major')
ORDER BY CASE severity
             WHEN 'major' THEN 1
             WHEN 'moderate' THEN 2
             WHEN 'minor' THEN 3
             ELSE 4
         END;
```

---

### Tool 5: `search_clinical_guidelines`

**Purpose:** RAG search over IDSA guidelines for treatment recommendations

```python
def search_clinical_guidelines(
    query: str,
    pathogen_filter: str = None,
    n_results: int = 5
) -> list[dict]:
    """
    Semantic search over IDSA clinical guidelines.

    Args:
        query: Natural language query about treatment
        pathogen_filter: Optional pathogen type filter
        n_results: Number of results to return

    Returns:
        list of relevant guideline excerpts with metadata

    Used by: Agent 1 (Empirical), Agent 4 (Justification)
    """
```

**ChromaDB Query:**

```python
results = collection.query(
    query_texts=[query],
    n_results=n_results,
    where={"pathogen_type": pathogen_filter} if pathogen_filter else None,
    include=["documents", "metadatas", "distances"]
)
```

---

### Tool 6: `calculate_mic_trend`

**Purpose:** Analyze MIC creep over time

```python
def calculate_mic_trend(
    patient_id: str,
    pathogen: str,
    antibiotic: str,
    historical_mics: list[dict]  # [{date, mic_value}, ...]
) -> dict:
    """
    Calculate resistance velocity and MIC trend.

    Args:
        patient_id: Patient identifier
        pathogen: Identified pathogen
        antibiotic: Target antibiotic
        historical_mics: List of historical MIC readings

    Returns:
        dict with trend analysis, resistance_velocity, risk_level

    Used by: Agent 3 (Trend Analyst)
    """
```

**Logic:**

```python
# Calculate resistance velocity as the fold change from the earliest to the latest MIC
readings = sorted(historical_mics, key=lambda r: r["date"])  # oldest first
risk_level, alert = None, None  # fewer than two readings: no trend can be computed

if len(readings) >= 2:
    baseline_mic = readings[0]["mic_value"]
    current_mic = readings[-1]["mic_value"]
    ratio = current_mic / baseline_mic  # assumes MIC values > 0

    if ratio >= 4:  # rise of two or more doubling dilutions
        risk_level = "HIGH"
        alert = "MIC Creep Detected - Risk of Treatment Failure"
    elif ratio >= 2:  # rise of one doubling dilution
        risk_level = "MODERATE"
        alert = "MIC Trending Upward - Monitor Closely"
    else:
        risk_level = "LOW"
        alert = None
```

---

## Part 4: Workflow Integration

### Stage 1: Empirical Phase (Before Lab Results)

```
Input: Patient history, symptoms, infection site
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ Agent 1: Intake Historian (MedGemma 1.5)                │
│ ├── Tool: search_clinical_guidelines()                  │
│ │   └── ChromaDB: idsa_treatment_guidelines             │
│ ├── Tool: query_resistance_pattern()                    │
│ │   └── SQLite: atlas_susceptibility_percent            │
│ └── Tool: query_antibiotic_info()                       │
│     └── SQLite: eml_antibiotics                         │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ Agent 4: Clinical Pharmacologist (TxGemma)              │
│ ├── Tool: check_drug_interactions()                     │
│ │   └── SQLite: drug_interactions                       │
│ └── Tool: query_antibiotic_info() [dosing]              │
│     └── SQLite: eml_antibiotics + dosage_guidance       │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
Output: Empirical therapy recommendation with safety check
```

### Stage 2: Targeted Phase (After Lab Results)

```
Input: Lab report (antibiogram image/PDF)
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ Agent 2: Vision Specialist (MedGemma 4B)                │
│ ├── Extract: Pathogen name, MIC values                  │
│ └── Tool: interpret_mic_value()                         │
│     └── SQLite: mic_breakpoints                         │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ Agent 3: Trend Analyst (MedGemma 27B)                   │
│ ├── Tool: calculate_mic_trend()                         │
│ │   └── Patient historical data + current MIC           │
│ └── Tool: query_resistance_pattern()                    │
│     └── SQLite: atlas_susceptibility (population data)  │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ Agent 4: Clinical Pharmacologist (TxGemma)              │
│ ├── Tool: search_clinical_guidelines()                  │
│ │   └── ChromaDB: idsa_treatment_guidelines             │
│ ├── Tool: check_drug_interactions()                     │
│ │   └── SQLite: drug_interactions                       │
│ └── Generate: Final prescription with justification     │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
Output: Targeted therapy with MIC trend analysis & safety alerts
```

---

## Part 5: Implementation Checklist

### SQLite Setup
- [ ] Create database schema with all tables
- [ ] Import EML Excel files (ACCESS, RESERVE, WATCH)
- [ ] Import ATLAS susceptibility data (both sheets)
- [ ] Import MIC breakpoint tables (41 sheets)
- [ ] Import drug interactions CSV
- [ ] Add severity classification to interactions
- [ ] Create indexes for efficient queries

### ChromaDB Setup
- [ ] Initialize ChromaDB persistent storage
- [ ] Process ciae403.pdf with chunking strategy
- [ ] Process MIC breakpoint PDF
- [ ] Add metadata to all chunks
- [ ] Test semantic search queries

### Tool Implementation
- [ ] Implement `query_antibiotic_info()`
- [ ] Implement `query_resistance_pattern()`
- [ ] Implement `interpret_mic_value()`
- [ ] Implement `check_drug_interactions()`
- [ ] Implement `search_clinical_guidelines()`
- [ ] Implement `calculate_mic_trend()`
- [ ] Create unified tool interface for LangGraph

---

## File Structure

```
AMR-Guard/
├── docs/                               # Source documents
├── data/
│   ├── medic.db                        # SQLite database
│   └── chroma/                         # ChromaDB persistent storage
├── src/
│   ├── db/
│   │   ├── schema.sql                  # Database schema
│   │   └── import_data.py              # Data import scripts
│   ├── tools/
│   │   ├── antibiotic_tools.py         # query_antibiotic_info, interpret_mic
│   │   ├── resistance_tools.py         # query_resistance_pattern, calculate_mic_trend
│   │   ├── safety_tools.py             # check_drug_interactions
│   │   └── rag_tools.py                # search_clinical_guidelines
│   └── agents/
│       ├── intake_historian.py         # Agent 1
│       ├── vision_specialist.py        # Agent 2
│       ├── trend_analyst.py            # Agent 3
│       └── clinical_pharmacologist.py  # Agent 4
└── KNOWLEDGE_STORAGE_STRATEGY.md       # This document
```
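As a minimal sketch (not a definitive implementation) of how `src/tools/antibiotic_tools.py` might back Tool 1, `query_antibiotic_info()`, with Python's built-in `sqlite3` module: it assumes an extra `conn` argument, returns a list of row dicts rather than a single dict, omits the `include_*` flags, and wraps the name in `%` wildcards so `LIKE` performs the partial match the docstring promises. The `CASE` ordering matters because a plain `ORDER BY who_category` sorts alphabetically (ACCESS, RESERVE, WATCH). The demo rows use hypothetical medicine names.

```python
import sqlite3


def query_antibiotic_info(conn: sqlite3.Connection, antibiotic_name: str) -> list[dict]:
    """Partial, case-insensitive name match, ordered ACCESS -> WATCH -> RESERVE."""
    rows = conn.execute(
        """
        SELECT medicine_name, who_category, formulations, indication, combined_with
        FROM eml_antibiotics
        WHERE LOWER(medicine_name) LIKE LOWER(?)
        ORDER BY CASE who_category
                     WHEN 'ACCESS' THEN 1
                     WHEN 'WATCH' THEN 2
                     WHEN 'RESERVE' THEN 3
                     ELSE 4
                 END,
                 medicine_name
        """,
        (f"%{antibiotic_name}%",),  # wildcards give substring matching
    ).fetchall()
    keys = ("medicine_name", "who_category", "formulations", "indication", "combined_with")
    return [dict(zip(keys, row)) for row in rows]


# Demo: in-memory table with hypothetical medicine names
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eml_antibiotics (medicine_name TEXT, who_category TEXT, "
    "formulations TEXT, indication TEXT, combined_with TEXT)"
)
conn.executemany(
    "INSERT INTO eml_antibiotics (medicine_name, who_category) VALUES (?, ?)",
    [("examplomycin R", "RESERVE"), ("examplomycin W", "WATCH"), ("examplomycin A", "ACCESS")],
)
print([r["who_category"] for r in query_antibiotic_info(conn, "EXAMPLOMYCIN")])
# ['ACCESS', 'WATCH', 'RESERVE']
```

Sorting on a `CASE` rank keeps the stewardship order in SQL, so Agent 4 can take the first row as the ACCESS-first suggestion without re-sorting in Python.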
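Tool 3, `interpret_mic_value()`, could live in `src/tools/antibiotic_tools.py` roughly as sketched below, combining the breakpoint query with the S/I/R interpretation logic. Assumptions: the `conn` parameter is an addition to the documented signature, rows with a missing S or R breakpoint return no interpretation, and `ORDER BY route IS NULL` is one way to prefer a route-specific breakpoint over a route-agnostic one. The single demo row uses placeholder numbers (1 and 4 mg/L), not EUCAST values.

```python
import sqlite3


def interpret_mic_value(
    conn: sqlite3.Connection,
    pathogen: str,
    antibiotic: str,
    mic_value: float,
    route: str = "IV",
) -> dict:
    """Classify an MIC (mg/L) as S/I/R against the mic_breakpoints table."""
    row = conn.execute(
        """
        SELECT mic_susceptible, mic_resistant, notes
        FROM mic_breakpoints
        WHERE LOWER(pathogen_group) LIKE LOWER(?)
          AND LOWER(antibiotic) LIKE LOWER(?)
          AND (route = ? OR route IS NULL)
        ORDER BY route IS NULL  -- route-specific rows (0) sort before generic ones (1)
        """,
        (f"%{pathogen}%", f"%{antibiotic}%", route),
    ).fetchone()
    if row is None or row[0] is None or row[1] is None:
        return {"interpretation": None, "reason": "no usable breakpoint found"}

    mic_s, mic_r, notes = row
    # EUCAST rules: S if MIC <= S breakpoint, R if MIC > R breakpoint, otherwise I
    if mic_value <= mic_s:
        interpretation = "Susceptible"
    elif mic_value > mic_r:
        interpretation = "Resistant"
    else:
        interpretation = "Intermediate (Susceptible, Increased Exposure)"
    return {
        "interpretation": interpretation,
        "mic_susceptible": mic_s,
        "mic_resistant": mic_r,
        "notes": notes,
    }


# Demo: one placeholder row (illustrative numbers, not EUCAST values)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mic_breakpoints (pathogen_group TEXT, antibiotic TEXT, route TEXT, "
    "mic_susceptible REAL, mic_resistant REAL, notes TEXT)"
)
conn.execute(
    "INSERT INTO mic_breakpoints VALUES "
    "('Enterobacterales', 'example_antibiotic', 'IV', 1.0, 4.0, NULL)"
)
print(interpret_mic_value(conn, "Enterobacterales", "example", 2.0)["interpretation"])
# Intermediate (Susceptible, Increased Exposure)
```

Note that an MIC exactly equal to the R breakpoint (4.0 here) is still classified as I, because resistance requires MIC > R.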
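The bidirectional `drug_interaction_lookup` view and severity-ranked ordering used by Tool 4 can be exercised with a small sketch of `check_drug_interactions()` for `src/tools/safety_tools.py`. It is one possible approach, not the project's implementation: it takes an extra `conn` argument, matches drug names exactly (case-insensitive) instead of with `LIKE`, and builds one `?` placeholder per medication instead of using `json_each` (which depends on SQLite's JSON functions being available). The schema is trimmed to the columns the query uses, and drug names and descriptions are placeholders.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE drug_interactions (
    drug_1 TEXT NOT NULL,
    drug_2 TEXT NOT NULL,
    interaction_description TEXT,
    severity TEXT
);
CREATE VIEW drug_interaction_lookup AS
SELECT drug_1, drug_2, interaction_description, severity FROM drug_interactions
UNION ALL
SELECT drug_2, drug_1, interaction_description, severity FROM drug_interactions;
"""


def check_drug_interactions(
    conn: sqlite3.Connection,
    target_drug: str,
    patient_medications: list[str],
    severity_filter: Optional[str] = None,
) -> list[dict]:
    """Interactions between target_drug and current medications, most severe first."""
    if not patient_medications:
        return []
    # One "?" per medication; only placeholders are interpolated, never user text
    placeholders = ", ".join("?" for _ in patient_medications)
    sql = f"""
        SELECT drug_1, drug_2, interaction_description, severity
        FROM drug_interaction_lookup
        WHERE LOWER(drug_1) = LOWER(?)
          AND LOWER(drug_2) IN ({placeholders})
          AND (severity = ? OR ? IS NULL)
        ORDER BY CASE severity
                     WHEN 'major' THEN 1
                     WHEN 'moderate' THEN 2
                     WHEN 'minor' THEN 3
                     ELSE 4
                 END
    """
    params = [target_drug, *(m.lower() for m in patient_medications),
              severity_filter, severity_filter]
    keys = ("drug_1", "drug_2", "interaction_description", "severity")
    return [dict(zip(keys, row)) for row in conn.execute(sql, params)]


# Demo with hypothetical drug names and placeholder descriptions
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO drug_interactions VALUES (?, ?, ?, ?)",
    [("DrugA", "DrugB", "placeholder", "minor"),
     ("DrugC", "DrugA", "placeholder", "major")],  # stored with DrugA second
)
print([r["drug_2"] for r in check_drug_interactions(conn, "druga", ["DrugB", "DrugC"])])
# ['DrugC', 'DrugB']
```

The second interaction is stored as (DrugC, DrugA) yet is still found for target DrugA because the view's `UNION ALL` branch swaps the columns, so each pair needs to be imported only once.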
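A self-contained sketch of Tool 6, `calculate_mic_trend()`, as it might appear in `src/tools/resistance_tools.py`, follows the thresholds in the Tool 6 logic (4-fold or more is HIGH, 2-fold is MODERATE). It assumes readings carry ISO-8601 date strings (which sort chronologically as plain strings) and treats `resistance_velocity` as the fold change, as that logic does. The `dilution_steps` field (log2 of the fold change, so one step equals one doubling dilution) and the `INSUFFICIENT_DATA` label are hypothetical additions, not part of the documented return value; the demo MICs are illustrative.

```python
import math


def calculate_mic_trend(
    patient_id: str,
    pathogen: str,
    antibiotic: str,
    historical_mics: list[dict],  # [{"date": "YYYY-MM-DD", "mic_value": float}, ...]
) -> dict:
    """Fold change from the earliest to the latest MIC, mapped to a risk level."""
    # ISO-8601 date strings sort chronologically, so a plain string sort works
    readings = sorted(historical_mics, key=lambda r: r["date"])
    result = {
        "patient_id": patient_id,
        "pathogen": pathogen,
        "antibiotic": antibiotic,
        "resistance_velocity": None,
        "dilution_steps": None,  # hypothetical extra field: log2 of the fold change
        "risk_level": "INSUFFICIENT_DATA",  # hypothetical label for < 2 readings
        "alert": None,
    }
    if len(readings) < 2:
        return result

    baseline_mic = readings[0]["mic_value"]
    current_mic = readings[-1]["mic_value"]
    ratio = current_mic / baseline_mic  # assumes MIC values are > 0
    result["resistance_velocity"] = ratio
    result["dilution_steps"] = math.log2(ratio)  # 1 step = one doubling dilution

    if ratio >= 4:  # two or more doubling dilutions
        result["risk_level"] = "HIGH"
        result["alert"] = "MIC Creep Detected - Risk of Treatment Failure"
    elif ratio >= 2:  # one doubling dilution
        result["risk_level"] = "MODERATE"
        result["alert"] = "MIC Trending Upward - Monitor Closely"
    else:
        result["risk_level"] = "LOW"
    return result


trend = calculate_mic_trend(
    "patient-001", "E. coli", "example_antibiotic",
    [{"date": "2024-03-01", "mic_value": 2.0}, {"date": "2024-01-01", "mic_value": 0.5}],
)
print(trend["risk_level"], trend["dilution_steps"])
# HIGH 2.0
```

Because MICs are reported on a doubling-dilution scale, expressing the change in dilution steps makes the thresholds read naturally: a rise from 0.5 to 2 mg/L is two steps, which crosses the HIGH threshold.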