	# AMR-Guard Knowledge Storage Strategy

	## Overview

	This document defines how each document in the `docs/` folder will be stored and queried to support the AMR-Guard: Infection Lifecycle Orchestrator workflow.

	---

	## Document Classification Summary

	| Document | Type | Storage | Purpose in Workflow |
	|----------|------|---------|---------------------|
	| EML exports (ACCESS/RESERVE/WATCH) | XLSX | SQLite | Antibiotic classification & stewardship |
	| ATLAS Susceptibility Data | XLSX | SQLite | Pathogen resistance patterns |
	| MIC Breakpoint Tables | XLSX | SQLite | Susceptibility interpretation |
	| Drug Interactions | CSV | SQLite | Drug safety screening |
	| IDSA Guidance (ciae403.pdf) | PDF | ChromaDB | Clinical treatment guidelines |
	| MIC Breakpoint Tables (PDF) | PDF | ChromaDB | Reference documentation |

	---

	## Part 1: Structured Data (SQLite)

	### 1.1 EML Antibiotic Classification Tables

	Source Files:
	- `antibiotic_guidelines/EML export ACCESS group.xlsx`
	- `antibiotic_guidelines/EML export RESERVE group.xlsx`
	- `antibiotic_guidelines/EML export WATCH group.xlsx`

	Database Table: `eml_antibiotics`

	```sql
	CREATE TABLE eml_antibiotics (
	    id INTEGER PRIMARY KEY AUTOINCREMENT,
	    medicine_name TEXT NOT NULL,
	    who_category TEXT NOT NULL, -- 'ACCESS', 'RESERVE', 'WATCH'
	    eml_section TEXT,
	    formulations TEXT,
	    indication TEXT,
	    atc_codes TEXT,
	    combined_with TEXT,
	    status TEXT,
	    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
	);

	CREATE INDEX idx_medicine_name ON eml_antibiotics(medicine_name);
	CREATE INDEX idx_who_category ON eml_antibiotics(who_category);
	CREATE INDEX idx_atc_codes ON eml_antibiotics(atc_codes);
	```

	Usage in Workflow:
	- Agent 1 (Intake Historian): Query to identify antibiotic stewardship category
	- Agent 4 (Clinical Pharmacologist): Suggest ACCESS antibiotics first, escalate to WATCH/RESERVE only when necessary
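The "ACCESS first, then WATCH/RESERVE" rule implies an explicit ranking of the three categories, which plain alphabetical sorting does not produce. The sketch below shows one way to populate `who_category` and rank it. It assumes the category is taken from the export file name rather than from a column inside the spreadsheet; `category_from_filename` and `stewardship_rank` are hypothetical helpers, not part of the project.

```python
import re
from pathlib import Path

# Escalation order used for sorting: lower rank is preferred.
STEWARDSHIP_RANK = {"ACCESS": 1, "WATCH": 2, "RESERVE": 3}


def category_from_filename(path: str) -> str:
    """Derive the WHO category from a name like 'EML export ACCESS group.xlsx'."""
    match = re.search(r"EML export (\w+) group", Path(path).name)
    if match is None or match.group(1).upper() not in STEWARDSHIP_RANK:
        raise ValueError(f"Cannot derive WHO category from {path!r}")
    return match.group(1).upper()


def stewardship_rank(category: str) -> int:
    """Return the sort key for a category (ACCESS before WATCH before RESERVE)."""
    return STEWARDSHIP_RANK[category]


source_files = [
    "antibiotic_guidelines/EML export RESERVE group.xlsx",
    "antibiotic_guidelines/EML export ACCESS group.xlsx",
    "antibiotic_guidelines/EML export WATCH group.xlsx",
]
categories = sorted(
    (category_from_filename(f) for f in source_files), key=stewardship_rank
)
print(categories)  # ['ACCESS', 'WATCH', 'RESERVE']
```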

	---

	### 1.2 ATLAS Pathogen Susceptibility Data

	Source File: `pathogen_resistance/ATLAS Susceptibility Data Export.xlsx`

	Database Tables:

	```sql
	CREATE TABLE atlas_susceptibility_percent (
	    id INTEGER PRIMARY KEY AUTOINCREMENT,
	    pathogen TEXT NOT NULL,
	    antibiotic TEXT NOT NULL,
	    region TEXT,
	    year INTEGER,
	    susceptibility_percent REAL,
	    sample_size INTEGER,
	    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
	);

	CREATE TABLE atlas_susceptibility_absolute (
	    id INTEGER PRIMARY KEY AUTOINCREMENT,
	    pathogen TEXT NOT NULL,
	    antibiotic TEXT NOT NULL,
	    region TEXT,
	    year INTEGER,
	    susceptible_count INTEGER,
	    intermediate_count INTEGER,
	    resistant_count INTEGER,
	    total_isolates INTEGER,
	    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
	);

	CREATE INDEX idx_pathogen ON atlas_susceptibility_percent(pathogen);
	CREATE INDEX idx_antibiotic ON atlas_susceptibility_percent(antibiotic);
	CREATE INDEX idx_pathogen_abs ON atlas_susceptibility_absolute(pathogen);
	```

	Usage in Workflow:
	- Agent 1 (Empirical Phase): Retrieve local/regional resistance patterns for empirical therapy
	- Agent 3 (Trend Analyst): Compare current MIC with population-level trends

	---

	### 1.3 MIC Breakpoint Tables

	Source File: `mic_breakpoints/v_16.0__BreakpointTables.xlsx`

	Database Tables:

	```sql
	CREATE TABLE mic_breakpoints (
	    id INTEGER PRIMARY KEY AUTOINCREMENT,
	    pathogen_group TEXT NOT NULL, -- e.g., 'Enterobacterales', 'Staphylococcus'
	    antibiotic TEXT NOT NULL,
	    route TEXT, -- 'IV', 'Oral', 'Topical'
	    mic_susceptible REAL, -- S breakpoint (mg/L)
	    mic_resistant REAL, -- R breakpoint (mg/L)
	    disk_susceptible REAL, -- Zone diameter (mm)
	    disk_resistant REAL, -- Zone diameter (mm)
	    notes TEXT,
	    eucast_version TEXT DEFAULT '16.0',
	    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
	);

	CREATE TABLE dosage_guidance (
	    id INTEGER PRIMARY KEY AUTOINCREMENT,
	    antibiotic TEXT NOT NULL,
	    standard_dose TEXT,
	    high_dose TEXT,
	    renal_adjustment TEXT,
	    notes TEXT
	);

	CREATE INDEX idx_bp_pathogen ON mic_breakpoints(pathogen_group);
	CREATE INDEX idx_bp_antibiotic ON mic_breakpoints(antibiotic);
	```

	Usage in Workflow:
	- Agent 2 (Vision Specialist): Validate extracted MIC values against breakpoints
	- Agent 3 (Trend Analyst): Interpret S/I/R classification from MIC values
	- Agent 4 (Clinical Pharmacologist): Use dosage guidance for prescriptions

	---

	### 1.4 Drug Interactions Database

	Source File: `drug_safety/db_drug_interactions.csv`

	Database Table:

	```sql
	CREATE TABLE drug_interactions (
	    id INTEGER PRIMARY KEY AUTOINCREMENT,
	    drug_1 TEXT NOT NULL,
	    drug_2 TEXT NOT NULL,
	    interaction_description TEXT,
	    severity TEXT, -- Derived: 'major', 'moderate', 'minor'
	    mechanism TEXT, -- Derived from description
	    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
	);

	CREATE INDEX idx_drug_1 ON drug_interactions(drug_1);
	CREATE INDEX idx_drug_2 ON drug_interactions(drug_2);
	CREATE INDEX idx_severity ON drug_interactions(severity);

	-- View for bidirectional lookup
	CREATE VIEW drug_interaction_lookup AS
	SELECT drug_1, drug_2, interaction_description, severity FROM drug_interactions
	UNION ALL
	SELECT drug_2, drug_1, interaction_description, severity FROM drug_interactions;
	```

	Usage in Workflow:
	- Agent 4 (Clinical Pharmacologist): Check for interactions with patient's current medications
	- Safety Alerts: Flag potential toxicity issues

	---

	## Part 2: Unstructured Data (ChromaDB)

	### 2.1 IDSA Clinical Guidelines

	Source File: `antibiotic_guidelines/ciae403.pdf`

	ChromaDB Collection: `idsa_treatment_guidelines`

	```python
	collection_config = {
	    "name": "idsa_treatment_guidelines",
	    "metadata": {
	        "source": "IDSA 2024 Guidance",
	        "doi": "10.1093/cid/ciae403",
	        "version": "2024"
	    },
	    "embedding_function": "sentence-transformers/all-MiniLM-L6-v2"
	}

	# Document chunking strategy
	chunk_config = {
	    "chunk_size": 1000,      # characters per chunk
	    "chunk_overlap": 200,    # characters shared between adjacent chunks
	    "separators": ["\n\n", "\n", ". "],
	    # Field names match the per-chunk metadata schema
	    "metadata_fields": [
	        "section", "pathogen_type", "recommendation_strength",
	        "evidence_quality", "page_number"
	    ]
	}
	```
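The chunking parameters above can be illustrated with a small sliding-window splitter. This is a minimal sketch only: the source does not say which splitter the project uses, `chunk_text` is a hypothetical helper, and sizes are assumed to be measured in characters.

```python
def chunk_text(
    text: str,
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
    separators: tuple[str, ...] = ("\n\n", "\n", ". "),
) -> list[str]:
    """Split text into overlapping chunks, preferring to cut at a separator."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Look for a separator in the second half of the window, trying
            # coarse separators (paragraphs) before fine ones (sentences).
            for sep in separators:
                cut = text.rfind(sep, start + chunk_size // 2, end)
                if cut != -1:
                    end = cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by the overlap so adjacent chunks share context,
        # while always moving forward by at least one character.
        start = max(end - chunk_overlap, start + 1)
    return chunks


# Separator-free sample text, so every cut falls on the hard size limit.
sample = "".join(chr(97 + i % 26) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With these defaults each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.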

	Metadata Schema per Chunk:
	```python
	{
	    "section": "Treatment Recommendations",
	    "pathogen_type": "ESBL-E | CRE | CRAB | DTR-PA | S.maltophilia",
	    "recommendation_strength": "Strong | Conditional",
	    "evidence_quality": "High | Moderate | Low",
	    "page_number": int
	}
	```

	Usage in Workflow:
	- Agent 1 (Empirical Phase): Retrieve treatment recommendations for suspected pathogens
	- Agent 4 (Clinical Pharmacologist): Provide evidence-based justification for antibiotic selection

	---

	### 2.2 MIC Breakpoint Reference (PDF)

	Source File: `mic_breakpoints/v_16.0_Breakpoint_Tables.pdf`

	ChromaDB Collection: `mic_reference_docs`

	```python
	collection_config = {
	    "name": "mic_reference_docs",
	    "metadata": {
	        "source": "EUCAST Breakpoint Tables",
	        "version": "16.0"
	    },
	    "embedding_function": "sentence-transformers/all-MiniLM-L6-v2"
	}
	```

	Usage in Workflow:
	- Supplementary Context: Provide detailed explanations for breakpoint interpretations
	- Edge Cases: Handle unusual pathogens or antibiotic combinations not in structured tables

	---

	## Part 3: Query Tools Definition

	### Tool 1: `query_antibiotic_info`

	Purpose: Retrieve antibiotic classification and formulation details

	```python
	def query_antibiotic_info(
	    antibiotic_name: str,
	    include_category: bool = True,
	    include_formulations: bool = True
	) -> dict:
	    """
	    Query EML antibiotic database for classification and details.

	    Args:
	        antibiotic_name: Name of the antibiotic (partial match supported)
	        include_category: Include WHO stewardship category
	        include_formulations: Include available formulations

	    Returns:
	        dict with antibiotic details, category, indications

	    Used by: Agent 1, Agent 4
	    """
	```

	SQL Query:
	```sql
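	SELECT medicine_name, who_category, formulations, indication, combined_with
	FROM eml_antibiotics
	WHERE LOWER(medicine_name) LIKE LOWER(?)
	-- Rank explicitly: a plain ORDER BY who_category sorts alphabetically
	-- (ACCESS, RESERVE, WATCH), which puts RESERVE ahead of WATCH.
	ORDER BY CASE who_category
	    WHEN 'ACCESS' THEN 1
	    WHEN 'WATCH' THEN 2
	    WHEN 'RESERVE' THEN 3
	    ELSE 4
	END;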
	```
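A minimal sketch of how this lookup could be wired up with Python's built-in `sqlite3` module follows. The explicit `conn` parameter, the `%` wildcards added around the search term, and the trimmed-down table are assumptions made for illustration. The three sample rows are only there to show the ordering.

```python
import sqlite3

# Explicit ranking so that ACCESS sorts before WATCH, and WATCH before RESERVE.
QUERY = """
SELECT medicine_name, who_category, formulations, indication, combined_with
FROM eml_antibiotics
WHERE LOWER(medicine_name) LIKE LOWER(?)
ORDER BY CASE who_category
    WHEN 'ACCESS' THEN 1 WHEN 'WATCH' THEN 2 WHEN 'RESERVE' THEN 3 ELSE 4
END
"""


def query_antibiotic_info(conn: sqlite3.Connection, antibiotic_name: str) -> list[dict]:
    """Return EML rows whose name contains antibiotic_name, ACCESS first."""
    rows = conn.execute(QUERY, (f"%{antibiotic_name}%",)).fetchall()
    return [dict(row) for row in rows]


# In-memory demo database with a reduced version of the eml_antibiotics table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute(
    "CREATE TABLE eml_antibiotics (medicine_name TEXT, who_category TEXT, "
    "formulations TEXT, indication TEXT, combined_with TEXT)"
)
conn.executemany(
    "INSERT INTO eml_antibiotics (medicine_name, who_category) VALUES (?, ?)",
    [("cefiderocol", "RESERVE"), ("ceftriaxone", "WATCH"), ("cefalexin", "ACCESS")],
)

matches = query_antibiotic_info(conn, "cef")
print([m["medicine_name"] for m in matches])
# ['cefalexin', 'ceftriaxone', 'cefiderocol']
```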

	---

	### Tool 2: `query_resistance_pattern`

	Purpose: Get susceptibility data for pathogen-antibiotic combinations

	```python
	def query_resistance_pattern(
	    pathogen: str,
	    antibiotic: str = None,
	    region: str = None,
	    year: int = None
	) -> dict:
	    """
	    Query ATLAS susceptibility data for resistance patterns.

	    Args:
	        pathogen: Pathogen name (e.g., "E. coli", "K. pneumoniae")
	        antibiotic: Optional specific antibiotic to check
	        region: Optional geographic region filter
	        year: Optional year filter (defaults to most recent)

	    Returns:
	        dict with susceptibility percentages and trends

	    Used by: Agent 1 (Empirical), Agent 3 (Trend Analysis)
	    """
	```

	SQL Query:
	```sql
	SELECT antibiotic, susceptibility_percent, sample_size, year
	FROM atlas_susceptibility_percent
	WHERE LOWER(pathogen) LIKE LOWER(?)
	  AND (antibiotic = ? OR ? IS NULL)
	  AND (region = ? OR ? IS NULL)
	  AND (year = ? OR ? IS NULL) -- optional year filter from the function signature
	ORDER BY year DESC, susceptibility_percent DESC;
	```
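One detail of this query style is easy to miss: every optional filter uses two placeholders (`col = ? OR ? IS NULL`), so each optional argument has to be bound twice. The sketch below shows that binding with `sqlite3`. It assumes an explicit `conn` parameter, `%` wildcards around the pathogen name, and a `year` filter matching the function signature. All numbers in the demo rows are synthetic placeholders, not ATLAS data.

```python
import sqlite3
from typing import Optional

QUERY = """
SELECT antibiotic, susceptibility_percent, sample_size, year
FROM atlas_susceptibility_percent
WHERE LOWER(pathogen) LIKE LOWER(?)
  AND (antibiotic = ? OR ? IS NULL)
  AND (region = ? OR ? IS NULL)
  AND (year = ? OR ? IS NULL)
ORDER BY year DESC, susceptibility_percent DESC
"""


def query_resistance_pattern(
    conn: sqlite3.Connection,
    pathogen: str,
    antibiotic: Optional[str] = None,
    region: Optional[str] = None,
    year: Optional[int] = None,
) -> list[tuple]:
    """Return (antibiotic, percent, sample_size, year) rows, newest first."""
    # Each optional value is passed twice: once for "=", once for "IS NULL".
    params = (f"%{pathogen}%", antibiotic, antibiotic, region, region, year, year)
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atlas_susceptibility_percent (pathogen TEXT, antibiotic TEXT, "
    "region TEXT, year INTEGER, susceptibility_percent REAL, sample_size INTEGER)"
)
conn.executemany(
    "INSERT INTO atlas_susceptibility_percent VALUES (?, ?, ?, ?, ?, ?)",
    [  # synthetic values for illustration only
        ("Escherichia coli", "ciprofloxacin", "Europe", 2021, 70.0, 400),
        ("Escherichia coli", "meropenem", "Europe", 2022, 99.0, 500),
        ("Escherichia coli", "ciprofloxacin", "Europe", 2022, 68.0, 450),
        ("Klebsiella pneumoniae", "meropenem", "Europe", 2022, 85.0, 300),
    ],
)

all_coli = query_resistance_pattern(conn, "coli")
cipro_2021 = query_resistance_pattern(conn, "coli", antibiotic="ciprofloxacin", year=2021)
print(len(all_coli), cipro_2021)  # 3 [('ciprofloxacin', 70.0, 400, 2021)]
```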

	---

	### Tool 3: `interpret_mic_value`

	Purpose: Classify MIC as S/I/R based on EUCAST breakpoints

	```python
	def interpret_mic_value(
	    pathogen: str,
	    antibiotic: str,
	    mic_value: float,
	    route: str = "IV"
	) -> dict:
	    """
	    Interpret MIC value against EUCAST breakpoints.

	    Args:
	        pathogen: Pathogen name or group
	        antibiotic: Antibiotic name
	        mic_value: MIC value in mg/L
	        route: Administration route (IV, Oral)

	    Returns:
	        dict with interpretation (S/I/R), breakpoint values, dosing notes

	    Used by: Agent 2, Agent 3
	    """
	```

	SQL Query:
	```sql
	SELECT mic_susceptible, mic_resistant, notes
	FROM mic_breakpoints
	WHERE LOWER(pathogen_group) LIKE LOWER(?)
	AND LOWER(antibiotic) LIKE LOWER(?)
	AND (route = ? OR route IS NULL);
	```

	Interpretation Logic:
	```python
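	# Wrapped in a function so the returns are valid; the helper name is illustrative.
	def classify_mic(mic_value: float, mic_susceptible: float, mic_resistant: float) -> str:
	    if mic_value <= mic_susceptible:
	        return "Susceptible"
	    elif mic_value > mic_resistant:
	        return "Resistant"
	    else:
	        # Values above the S breakpoint but not above the R breakpoint
	        return "Intermediate (Susceptible, Increased Exposure)"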
	```
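Putting the breakpoint query and the interpretation logic together, an end-to-end lookup might look like the sketch below. The `conn` parameter, the short `S`/`I`/`R` labels, and the handling of a missing breakpoint are assumptions. The breakpoint row (`DrugX`, S ≤ 2 mg/L, R > 8 mg/L) is a hypothetical placeholder and not a EUCAST value.

```python
import sqlite3

QUERY = """
SELECT mic_susceptible, mic_resistant, notes
FROM mic_breakpoints
WHERE LOWER(pathogen_group) LIKE LOWER(?)
  AND LOWER(antibiotic) LIKE LOWER(?)
  AND (route = ? OR route IS NULL)
"""


def interpret_mic_value(conn, pathogen, antibiotic, mic_value, route="IV") -> dict:
    """Look up the breakpoints and classify mic_value as S, I, or R."""
    row = conn.execute(QUERY, (pathogen, antibiotic, route)).fetchone()
    if row is None or row[0] is None or row[1] is None:
        # No usable breakpoint: report that rather than guessing a category.
        return {"interpretation": None, "reason": "no breakpoint found"}
    s_bp, r_bp, notes = row
    if mic_value <= s_bp:
        label = "S"
    elif mic_value > r_bp:
        label = "R"
    else:
        label = "I"
    return {
        "interpretation": label,
        "mic_susceptible": s_bp,
        "mic_resistant": r_bp,
        "notes": notes,
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mic_breakpoints (pathogen_group TEXT, antibiotic TEXT, "
    "route TEXT, mic_susceptible REAL, mic_resistant REAL, notes TEXT)"
)
# Hypothetical breakpoints for illustration only.
conn.execute(
    "INSERT INTO mic_breakpoints VALUES ('Enterobacterales', 'DrugX', NULL, 2.0, 8.0, NULL)"
)

labels = [
    interpret_mic_value(conn, "Enterobacterales", "DrugX", mic)["interpretation"]
    for mic in (1.0, 2.0, 4.0, 8.0, 16.0)
]
print(labels)  # ['S', 'S', 'I', 'I', 'R']
```

The boundary cases are the point of the example: a MIC equal to the S breakpoint is still `S`, and a MIC equal to the R breakpoint is `I`, because only values strictly above it count as resistant.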

	---

	### Tool 4: `check_drug_interactions`

	Purpose: Screen for drug-drug interactions

	```python
	def check_drug_interactions(
	    target_drug: str,
	    patient_medications: list[str],
	    severity_filter: str = None
	) -> list[dict]:
	    """
	    Check for interactions between target drug and patient's medications.

	    Args:
	        target_drug: Antibiotic being considered
	        patient_medications: List of patient's current medications
	        severity_filter: Optional filter ('major', 'moderate', 'minor')

	    Returns:
	        list of interaction dicts with severity and description

	    Used by: Agent 4 (Safety Check)
	    """
	```

	SQL Query:
	```sql
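	SELECT drug_1, drug_2, interaction_description, severity
	FROM drug_interaction_lookup
	WHERE LOWER(drug_1) LIKE LOWER(?)
	  AND LOWER(drug_2) IN (SELECT LOWER(value) FROM json_each(?))
	  AND (severity = ? OR ? IS NULL)
	-- Rank explicitly: ORDER BY severity DESC sorts alphabetically
	-- (moderate, minor, major), which puts 'major' last.
	ORDER BY CASE severity
	    WHEN 'major' THEN 1
	    WHEN 'moderate' THEN 2
	    WHEN 'minor' THEN 3
	    ELSE 4
	END;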
	```
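The `json_each(?)` placeholder expects the medication list as a single JSON string, so the Python side has to serialize it first. A minimal sketch of that call follows, assuming an explicit `conn` parameter and an SQLite build with the JSON functions available. Drug names and descriptions in the demo are placeholders, not real interaction data.

```python
import json
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE drug_interactions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    drug_1 TEXT NOT NULL,
    drug_2 TEXT NOT NULL,
    interaction_description TEXT,
    severity TEXT
);
CREATE VIEW drug_interaction_lookup AS
SELECT drug_1, drug_2, interaction_description, severity FROM drug_interactions
UNION ALL
SELECT drug_2, drug_1, interaction_description, severity FROM drug_interactions;
"""

QUERY = """
SELECT drug_1, drug_2, interaction_description, severity
FROM drug_interaction_lookup
WHERE LOWER(drug_1) LIKE LOWER(?)
  AND LOWER(drug_2) IN (SELECT LOWER(value) FROM json_each(?))
  AND (severity = ? OR ? IS NULL)
ORDER BY CASE severity
    WHEN 'major' THEN 1 WHEN 'moderate' THEN 2 WHEN 'minor' THEN 3 ELSE 4
END
"""


def check_drug_interactions(
    conn: sqlite3.Connection,
    target_drug: str,
    patient_medications: list[str],
    severity_filter: Optional[str] = None,
) -> list[tuple]:
    """Return interactions between target_drug and the given medications."""
    # json_each() unpacks the JSON array into one row per medication.
    meds_json = json.dumps(patient_medications)
    params = (target_drug, meds_json, severity_filter, severity_filter)
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO drug_interactions (drug_1, drug_2, interaction_description, severity) "
    "VALUES (?, ?, ?, ?)",
    [  # placeholder rows for illustration only
        ("DrugA", "DrugB", "placeholder description", "minor"),
        ("DrugC", "DrugA", "placeholder description", "major"),
    ],
)

hits = check_drug_interactions(conn, "DrugA", ["drugb", "DRUGC"])
print([(h[1], h[3]) for h in hits])  # [('DrugC', 'major'), ('DrugB', 'minor')]
```

The second row is stored as (`DrugC`, `DrugA`) but is still found when `DrugA` is the target, which is what the bidirectional view is for.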

	---

	### Tool 5: `search_clinical_guidelines`

	Purpose: RAG search over IDSA guidelines for treatment recommendations

	```python
	def search_clinical_guidelines(
	    query: str,
	    pathogen_filter: str = None,
	    n_results: int = 5
	) -> list[dict]:
	    """
	    Semantic search over IDSA clinical guidelines.

	    Args:
	        query: Natural language query about treatment
	        pathogen_filter: Optional pathogen type filter
	        n_results: Number of results to return

	    Returns:
	        list of relevant guideline excerpts with metadata

	    Used by: Agent 1 (Empirical), Agent 4 (Justification)
	    """
	```

	ChromaDB Query:
	```python
	results = collection.query(
	    query_texts=[query],
	    n_results=n_results,
	    where={"pathogen_type": pathogen_filter} if pathogen_filter else None,
	    include=["documents", "metadatas", "distances"]
	)
	```
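The query result is a dict of nested lists with one inner list per query text, so it needs to be flattened into the `list[dict]` that the tool signature promises. The sketch below shows that step on a hand-built dict with the same shape; `flatten_query_results` is a hypothetical helper and the sample excerpts are placeholders, not guideline text.

```python
def flatten_query_results(results: dict) -> list[dict]:
    """Turn a single-query result into a list of excerpt dicts."""
    # Index 0 selects the results for the first (and only) query text.
    documents = results["documents"][0]
    metadatas = results["metadatas"][0]
    distances = results["distances"][0]
    return [
        {"text": doc, "metadata": meta, "distance": dist}
        for doc, meta, dist in zip(documents, metadatas, distances)
    ]


# Hand-built stand-in for a collection.query() result with one query text.
fake_results = {
    "ids": [["chunk-1", "chunk-2"]],
    "documents": [["placeholder excerpt one", "placeholder excerpt two"]],
    "metadatas": [[
        {"pathogen_type": "CRE", "page_number": 12},
        {"pathogen_type": "CRE", "page_number": 13},
    ]],
    "distances": [[0.21, 0.34]],
}

excerpts = flatten_query_results(fake_results)
print(excerpts[0]["metadata"]["page_number"], excerpts[0]["distance"])  # 12 0.21
```

Because `where={"pathogen_type": pathogen_filter}` is an exact-match filter, this only works if each chunk stores a single pathogen type (for example `"CRE"`) in its metadata rather than a combined string.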

	---

	### Tool 6: `calculate_mic_trend`

	Purpose: Analyze MIC creep over time

	```python
	def calculate_mic_trend(
	    patient_id: str,
	    pathogen: str,
	    antibiotic: str,
	    historical_mics: list[dict]  # [{date, mic_value}, ...]
	) -> dict:
	    """
	    Calculate resistance velocity and MIC trend.

	    Args:
	        patient_id: Patient identifier
	        pathogen: Identified pathogen
	        antibiotic: Target antibiotic
	        historical_mics: List of historical MIC readings

	    Returns:
	        dict with trend analysis, resistance_velocity, risk_level

	    Used by: Agent 3 (Trend Analyst)
	    """
	```

	Logic:
	```python
	# Calculate resistance velocity.
	# Both values stay None when fewer than two readings are available.
	risk_level = None
	alert = None
	if len(historical_mics) >= 2:
	    # Assumes readings are ordered oldest to newest and the baseline MIC is > 0
	    baseline_mic = historical_mics[0]["mic_value"]
	    current_mic = historical_mics[-1]["mic_value"]

	    ratio = current_mic / baseline_mic

	    if ratio >= 4:  # Two-step dilution increase
	        risk_level = "HIGH"
	        alert = "MIC Creep Detected - Risk of Treatment Failure"
	    elif ratio >= 2:  # One-step dilution increase
	        risk_level = "MODERATE"
	        alert = "MIC Trending Upward - Monitor Closely"
	    else:
	        risk_level = "LOW"
	        alert = None
	```
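As a worked example, the thresholds above can be wrapped in a small function that also sorts the readings by date and guards against a zero baseline. This is a sketch under assumptions: the reduced signature, the `None` result for fewer than two readings, and the `ValueError` are illustrative choices that the source does not specify.

```python
from datetime import date


def calculate_mic_trend(historical_mics: list[dict]) -> dict:
    """Compare the latest MIC with the earliest one and grade the fold change."""
    if len(historical_mics) < 2:
        # Not enough data for a trend; the source does not define this case.
        return {"ratio": None, "risk_level": None, "alert": None}

    readings = sorted(historical_mics, key=lambda r: r["date"])
    baseline_mic = readings[0]["mic_value"]
    current_mic = readings[-1]["mic_value"]
    if baseline_mic <= 0:
        raise ValueError("baseline MIC must be positive")

    ratio = current_mic / baseline_mic
    if ratio >= 4:  # two doubling dilutions or more
        risk_level, alert = "HIGH", "MIC Creep Detected - Risk of Treatment Failure"
    elif ratio >= 2:  # one doubling dilution
        risk_level, alert = "MODERATE", "MIC Trending Upward - Monitor Closely"
    else:
        risk_level, alert = "LOW", None
    return {"ratio": ratio, "risk_level": risk_level, "alert": alert}


# Readings are deliberately out of order to show the sort.
history = [
    {"date": date(2024, 6, 1), "mic_value": 2.0},
    {"date": date(2024, 1, 1), "mic_value": 0.5},
    {"date": date(2024, 3, 1), "mic_value": 1.0},
]
trend = calculate_mic_trend(history)
print(trend["ratio"], trend["risk_level"])  # 4.0 HIGH
```

Here the MIC rises from 0.5 to 2.0 mg/L, a fourfold change (two doubling dilutions), so the result is `HIGH`.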

	---

	## Part 4: Workflow Integration

	### Stage 1: Empirical Phase (Before Lab Results)

	```
	Input: Patient history, symptoms, infection site
	                             │
	                             ▼
	┌─────────────────────────────────────────────────────────┐
	│  Agent 1: Intake Historian (MedGemma 1.5)               │
	│  ├── Tool: search_clinical_guidelines()                 │
	│  │   └── ChromaDB: idsa_treatment_guidelines            │
	│  ├── Tool: query_resistance_pattern()                   │
	│  │   └── SQLite: atlas_susceptibility_percent           │
	│  └── Tool: query_antibiotic_info()                      │
	│      └── SQLite: eml_antibiotics                        │
	└─────────────────────────────────────────────────────────┘
	                             │
	                             ▼
	┌─────────────────────────────────────────────────────────┐
	│  Agent 4: Clinical Pharmacologist (TxGemma)             │
	│  ├── Tool: check_drug_interactions()                    │
	│  │   └── SQLite: drug_interactions                      │
	│  └── Tool: query_antibiotic_info() [dosing]             │
	│      └── SQLite: eml_antibiotics + dosage_guidance      │
	└─────────────────────────────────────────────────────────┘
	                             │
	                             ▼
	Output: Empirical therapy recommendation with safety check
	```

	### Stage 2: Targeted Phase (After Lab Results)

	```
	Input: Lab report (antibiogram image/PDF)
	                             │
	                             ▼
	┌─────────────────────────────────────────────────────────┐
	│  Agent 2: Vision Specialist (MedGemma 4B)               │
	│  ├── Extract: Pathogen name, MIC values                 │
	│  └── Tool: interpret_mic_value()                        │
	│      └── SQLite: mic_breakpoints                        │
	└─────────────────────────────────────────────────────────┘
	                             │
	                             ▼
	┌─────────────────────────────────────────────────────────┐
	│  Agent 3: Trend Analyst (MedGemma 27B)                  │
	│  ├── Tool: calculate_mic_trend()                        │
	│  │   └── Patient historical data + current MIC          │
	│  └── Tool: query_resistance_pattern()                   │
	│      └── SQLite: atlas_susceptibility (population data) │
	└─────────────────────────────────────────────────────────┘
	                             │
	                             ▼
	┌─────────────────────────────────────────────────────────┐
	│  Agent 4: Clinical Pharmacologist (TxGemma)             │
	│  ├── Tool: search_clinical_guidelines()                 │
	│  │   └── ChromaDB: idsa_treatment_guidelines            │
	│  ├── Tool: check_drug_interactions()                    │
	│  │   └── SQLite: drug_interactions                      │
	│  └── Generate: Final prescription with justification    │
	└─────────────────────────────────────────────────────────┘
	                             │
	                             ▼
	Output: Targeted therapy with MIC trend analysis & safety alerts
	```

	---

	## Part 5: Implementation Checklist

	### SQLite Setup
	- [ ] Create database schema with all tables
	- [ ] Import EML Excel files (ACCESS, RESERVE, WATCH)
	- [ ] Import ATLAS susceptibility data (both sheets)
	- [ ] Import MIC breakpoint tables (41 sheets)
	- [ ] Import drug interactions CSV
	- [ ] Add severity classification to interactions
	- [ ] Create indexes for efficient queries

	### ChromaDB Setup
	- [ ] Initialize ChromaDB persistent storage
	- [ ] Process ciae403.pdf with chunking strategy
	- [ ] Process MIC breakpoint PDF
	- [ ] Add metadata to all chunks
	- [ ] Test semantic search queries

	### Tool Implementation
	- [ ] Implement `query_antibiotic_info()`
	- [ ] Implement `query_resistance_pattern()`
	- [ ] Implement `interpret_mic_value()`
	- [ ] Implement `check_drug_interactions()`
	- [ ] Implement `search_clinical_guidelines()`
	- [ ] Implement `calculate_mic_trend()`
	- [ ] Create unified tool interface for LangGraph

	---

	## File Structure

	```
	AMR-Guard/
	├── docs/                               # Source documents
	├── data/
	│   ├── medic.db                        # SQLite database
	│   └── chroma/                         # ChromaDB persistent storage
	├── src/
	│   ├── db/
	│   │   ├── schema.sql                  # Database schema
	│   │   └── import_data.py              # Data import scripts
	│   ├── tools/
	│   │   ├── antibiotic_tools.py         # query_antibiotic_info, interpret_mic_value
	│   │   ├── resistance_tools.py         # query_resistance_pattern, calculate_mic_trend
	│   │   ├── safety_tools.py             # check_drug_interactions
	│   │   └── rag_tools.py                # search_clinical_guidelines
	│   └── agents/
	│       ├── intake_historian.py         # Agent 1
	│       ├── vision_specialist.py        # Agent 2
	│       ├── trend_analyst.py            # Agent 3
	│       └── clinical_pharmacologist.py  # Agent 4
	└── KNOWLEDGE_STORAGE_STRATEGY.md       # This document
	```