diff --git "a/code.ipynb" "b/code.ipynb" deleted file mode 100644--- "a/code.ipynb" +++ /dev/null @@ -1,2037 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# The AI Clinical Trials Architect: A Self-Evolving Agent Guild for Multi-Objective RAG Optimization\n", - "\n", - "## Part 0: The Grand Challenge: Automating Clinical Trial Design\n", - "\n", - "### 0.1. The New Frontier: From Answering Questions to Generating Protocols\n", - "\n", - "Welcome to a deep dive into the next generation of agentic systems. In previous explorations, we've seen AI systems that can answer questions or summarize documents. Today, we elevate our ambitions significantly. Our goal is to build an AI system that can tackle a core, high-value task in the biopharmaceutical industry: **drafting the \"Patient Inclusion/Exclusion Criteria\" for a new clinical trial.**\n", - "\n", - "This task is orders of magnitude more complex than simple RAG. It involves synthesizing information from disparate sources (medical literature, regulatory law, population data), understanding complex trade-offs, and generating a structured, compliant, and feasible document. It is a perfect testbed for a truly intelligent, multi-agent system.\n", - "\n", - "### 0.2. The Multi-Objective Optimization Problem\n", - "\n", - "A successful clinical trial isn't just scientifically sound; it's a delicate balance of competing priorities. Our AI system must learn to navigate this complex, multi-dimensional trade-off space. We will not be optimizing for a single score, but for a **Pareto Frontier** across five pillars:\n", - "\n", - "1. **Scientific Rigor:** How well do the criteria isolate the target patient population based on the latest medical research?\n", - "2. **Regulatory Compliance:** Do the criteria adhere to FDA guidelines and established legal precedents?\n", - "3. **Ethical Soundness:** Are vulnerable populations appropriately protected? 
Is the trial design fair and just?\n", - "4. **Recruitment Feasibility:** Can a hospital realistically find enough patients who meet these criteria based on real-world population data?\n", - "5. **Operational Simplicity (Cost Proxy):** Are the screening procedures required by the criteria simple and inexpensive, or do they require complex, costly tests?\n", - "\n", - "### 0.3. The Architectural Vision: A Hierarchical Agent-of-Agents System\n", - "\n", - "To solve this, we will build a hierarchical system:\n", - "\n", - "- **The Inner Loop (The \"Trial Design Guild\"):** A collaborative team of specialist agents built with LangGraph. This guild is our advanced RAG pipeline. It takes a trial concept and drafts a single criteria document.\n", - "- **The Outer Loop (The \"AI Research Director\"):** A supervisory agent system, also built with LangGraph. It acts as the manager, observing the Guild's performance across all five objectives. It then intelligently evolves the Guild's internal \"Standard Operating Procedures\" (SOPs), that is, its configurations, tools, and prompts, to continuously improve its performance.\n", - "\n", - "### 0.4. Visual Blueprint\n", - "\n", - "\n", - "\n", - "This notebook will walk you through building this entire system from the ground up, one function at a time. Let's begin." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Part 1: Assembling the Knowledge Arsenal: Data, Models, and Tools\n", - "\n", - "First, we need to set up our environment. This includes installing all necessary libraries, configuring our local LLM server, and preparing our diverse data sources. Unlike a simple demonstration, we will be downloading and processing real-world data to ground our system in reality." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.1. Installing the Open-Source Stack\n", - "\n", - "This cell installs all the necessary Python libraries. 
We use `langchain` and `langgraph` for our agentic framework, `ollama` to interact with local LLMs, `duckdb` for structured data, and specialized libraries like `biopython` for PubMed access and `pypdf` for regulatory documents." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Note: you may need to restart the kernel to use updated packages.\n" - ] - } - ], - "source": [ - "%pip install langchain langgraph langchain_community langchain_openai langchain_core ollama pandas duckdb faiss-cpu sentence-transformers biopython pypdf pydantic lxml html2text beautifulsoup4 matplotlib python-dotenv requests -qqq" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.2. Environment Configuration & Imports\n", - "\n", - "Next, we'll set up our environment variables and import the core modules we'll be using throughout the notebook. For this project to work, you must have a LangSmith account to trace and evaluate our complex agent interactions. \n", - "\n", - "**Action Required:**\n", - "1. Create a `.env` file in the same directory as this notebook.\n", - "2. 
Add your LangSmith API keys to the `.env` file in the following format:\n", - "```\n", - "LANGCHAIN_API_KEY=\"ls__...\"\n", - "LANGCHAIN_TRACING_V2=\"true\"\n", - "LANGCHAIN_PROJECT=\"AI_Clinical_Trials_Architect\"\n", - "ENTREZ_EMAIL=\"your.email@example.com\" \n", - "```\n", - "*Note: A valid email is required by NCBI for PubMed API access.*" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Environment variables loaded successfully.\n", - "LangSmith tracing is configured for project 'AI_Clinical_Trials_Architect'.\n" - ] - } - ], - "source": [ - "import os\n", - "import getpass\n", - "from dotenv import load_dotenv\n", - "\n", - "load_dotenv()\n", - "\n", - "# Check if the environment variables are set\n", - "if \"LANGCHAIN_API_KEY\" not in os.environ or \"ENTREZ_EMAIL\" not in os.environ:\n", - " print(\"Required environment variables not set. Please set them in your .env file or environment.\")\n", - "else:\n", - " print(\"Environment variables loaded successfully.\")\n", - "\n", - "# Set project name\n", - "os.environ[\"LANGCHAIN_PROJECT\"] = \"AI_Clinical_Trials_Architect\"\n", - "print(f\"LangSmith tracing is configured for project '{os.environ['LANGCHAIN_PROJECT']}'.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.3. Configuring the Local LLM Foundry with Ollama\n", - "\n", - "Our system relies on a suite of locally-hosted open-source models served via Ollama. This approach provides cost-effective, private, and powerful inference. 
We will define a dictionary to hold our model clients, making it easy to call different models for different tasks.\n", - "\n", - "**Action Required:**\n", - "Before running the next cell, ensure Ollama is running and you have pulled the required models:\n", - "```bash\n", - "ollama pull llama3.1:8b-instruct\n", - "ollama pull qwen2:7b\n", - "ollama pull llama3:70b\n", - "ollama pull nomic-embed-text\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "LLM clients configured:\n", - "Planner (llama3.1:8b-instruct): model='llama3.1:8b-instruct' temperature=0.0 format='json'\n", - "Drafter (qwen2:7b): model='qwen2:7b' temperature=0.2\n", - "SQL Coder (qwen2:7b): model='qwen2:7b' temperature=0.0\n", - "Director (llama3:70b): model='llama3:70b' temperature=0.0 format='json'\n", - "Embedding Model (nomic-embed-text): base_url='http://localhost:11434' model='nomic-embed-text' embed_instruction='passage: ' query_instruction='query: ' mirostat=None mirostat_eta=None mirostat_tau=None num_ctx=None num_gpu=None num_thread=None repeat_last_n=None repeat_penalty=None temperature=None stop=None tfs_z=None top_k=None top_p=None show_progress=False headers=None model_kwargs=None\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "C:\\Users\\admin\\AppData\\Local\\Temp\\ipykernel_11660\\3409962068.py:6: LangChainDeprecationWarning: The class `ChatOllama` was deprecated in LangChain 0.3.1 and will be removed in 1.0.0. An updated version of the class exists in the :class:`~langchain-ollama package and should be used instead. 
To use it run `pip install -U :class:`~langchain-ollama` and import as `from :class:`~langchain_ollama import ChatOllama``.\n", - " \"planner\": ChatOllama(model=\"llama3.1:8b-instruct\", temperature=0.0, format='json'),\n", - "C:\\Users\\admin\\AppData\\Local\\Temp\\ipykernel_11660\\3409962068.py:10: LangChainDeprecationWarning: The class `OllamaEmbeddings` was deprecated in LangChain 0.3.1 and will be removed in 1.0.0. An updated version of the class exists in the :class:`~langchain-ollama package and should be used instead. To use it run `pip install -U :class:`~langchain-ollama` and import as `from :class:`~langchain_ollama import OllamaEmbeddings``.\n", - " \"embedding_model\": OllamaEmbeddings(model=\"nomic-embed-text\")\n" - ] - } - ], - "source": [ - "from langchain_community.chat_models import ChatOllama\n", - "from langchain_community.embeddings import OllamaEmbeddings\n", - "\n", - "# Define our model suite\n", - "llm_config = {\n", - " \"planner\": ChatOllama(model=\"llama3.1:8b-instruct\", temperature=0.0, format='json'),\n", - " \"drafter\": ChatOllama(model=\"qwen2:7b\", temperature=0.2),\n", - " \"sql_coder\": ChatOllama(model=\"qwen2:7b\", temperature=0.0),\n", - " \"director\": ChatOllama(model=\"llama3:70b\", temperature=0.0, format='json'),\n", - " \"embedding_model\": OllamaEmbeddings(model=\"nomic-embed-text\")\n", - "}\n", - "\n", - "print(\"LLM clients configured:\")\n", - "print(f\"Planner (llama3.1:8b-instruct): {llm_config['planner']}\")\n", - "print(f\"Drafter (qwen2:7b): {llm_config['drafter']}\")\n", - "print(f\"SQL Coder (qwen2:7b): {llm_config['sql_coder']}\")\n", - "print(f\"Director (llama3:70b): {llm_config['director']}\")\n", - "print(f\"Embedding Model (nomic-embed-text): {llm_config['embedding_model']}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1.4. Preparing the Knowledge Stores\n", - "\n", - "Our agents need access to diverse, real-world information. 
We will create four distinct knowledge stores.\n", - "\n", - "**Step 1.4.1: Create data directories.**" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Created directory: ./data\n", - "Created directory: ./data/pubmed_articles\n", - "Created directory: ./data/fda_guidelines\n", - "Created directory: ./data/ethical_guidelines\n", - "Created directory: ./data/mimic_db\n" - ] - } - ], - "source": [ - "import os\n", - "\n", - "data_paths = {\n", - " \"base\": \"./data\",\n", - " \"pubmed\": \"./data/pubmed_articles\",\n", - " \"fda\": \"./data/fda_guidelines\",\n", - " \"ethics\": \"./data/ethical_guidelines\",\n", - " \"mimic\": \"./data/mimic_db\"\n", - "}\n", - "\n", - "for path in data_paths.values():\n", - " if not os.path.exists(path):\n", - " os.makedirs(path)\n", - " print(f\"Created directory: {path}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Step 1.4.2: Fetch real PubMed abstracts.**\n", - "\n", - "We use the `Bio.Entrez` library to connect to the NCBI PubMed database and download recent, relevant articles. This provides our `Medical Researcher` agent with up-to-date scientific literature." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Fetching PubMed articles for query: (SGLT2 inhibitor) AND (type 2 diabetes) AND (renal impairment)\n", - "Found 20 article IDs.\n", - "Downloading articles...\n", - "[1/20] Fetching PMID: 36945734... Saved to ./data/pubmed_articles\\36945734.txt\n", - "[2/20] Fetching PMID: 40470996... Saved to ./data/pubmed_articles\\40470996.txt\n", - "[3/20] Fetching PMID: 38914124... Saved to ./data/pubmed_articles\\38914124.txt\n", - "[4/20] Fetching PMID: 30697905... Saved to ./data/pubmed_articles\\30697905.txt\n", - "[5/20] Fetching PMID: 36335326... 
Saved to ./data/pubmed_articles\\36335326.txt\n", - "[6/20] Fetching PMID: 36351458... Saved to ./data/pubmed_articles\\36351458.txt\n", - "[7/20] Fetching PMID: 40327845... Saved to ./data/pubmed_articles\\40327845.txt\n", - "[8/20] Fetching PMID: 35113333... Saved to ./data/pubmed_articles\\35113333.txt\n", - "[9/20] Fetching PMID: 34619106... Saved to ./data/pubmed_articles\\34619106.txt\n", - "[10/20] Fetching PMID: 33413348... Saved to ./data/pubmed_articles\\33413348.txt\n", - "[11/20] Fetching PMID: 34272327... Saved to ./data/pubmed_articles\\34272327.txt\n", - "[12/20] Fetching PMID: 34817311... Saved to ./data/pubmed_articles\\34817311.txt\n", - "[13/20] Fetching PMID: 35145275... Saved to ./data/pubmed_articles\\35145275.txt\n", - "[14/20] Fetching PMID: 38684099... Saved to ./data/pubmed_articles\\38684099.txt\n", - "[15/20] Fetching PMID: 33878338... Saved to ./data/pubmed_articles\\33878338.txt\n", - "[16/20] Fetching PMID: 38052474... Saved to ./data/pubmed_articles\\38052474.txt\n", - "[17/20] Fetching PMID: 28432726... Saved to ./data/pubmed_articles\\28432726.txt\n", - "[18/20] Fetching PMID: 38913113... Saved to ./data/pubmed_articles\\38913113.txt\n", - "[19/20] Fetching PMID: 31101403... Saved to ./data/pubmed_articles\\31101403.txt\n", - "[20/20] Fetching PMID: 28904068... Saved to ./data/pubmed_articles\\28904068.txt\n", - "PubMed download complete. 
20 articles saved.\n" - ] - } - ], - "source": [ - "from Bio import Entrez\n", - "from Bio import Medline\n", - "\n", - "def download_pubmed_articles(query, max_articles=20):\n", - " \"\"\"Fetches abstracts from PubMed and saves them as text files.\"\"\"\n", - " Entrez.email = os.environ.get(\"ENTREZ_EMAIL\")\n", - " print(f\"Fetching PubMed articles for query: {query}\")\n", - " handle = Entrez.esearch(db=\"pubmed\", term=query, retmax=max_articles, sort=\"relevance\")\n", - " record = Entrez.read(handle)\n", - " id_list = record[\"IdList\"]\n", - " print(f\"Found {len(id_list)} article IDs.\")\n", - " \n", - " print(\"Downloading articles...\")\n", - " handle = Entrez.efetch(db=\"pubmed\", id=id_list, rettype=\"medline\", retmode=\"text\")\n", - " records = Medline.parse(handle)\n", - " \n", - " count = 0\n", - " for i, record in enumerate(records):\n", - " pmid = record.get(\"PMID\", \"\")\n", - " title = record.get(\"TI\", \"No Title\")\n", - " abstract = record.get(\"AB\", \"No Abstract\")\n", - " if pmid:\n", - " filepath = os.path.join(data_paths[\"pubmed\"], f\"{pmid}.txt\")\n", - " with open(filepath, \"w\") as f:\n", - " f.write(f\"Title: {title}\\n\\nAbstract: {abstract}\")\n", - " print(f\"[{i+1}/{len(id_list)}] Fetching PMID: {pmid}... Saved to {filepath}\")\n", - " count += 1\n", - " return count\n", - "\n", - "pubmed_query = \"(SGLT2 inhibitor) AND (type 2 diabetes) AND (renal impairment)\"\n", - "num_downloaded = download_pubmed_articles(pubmed_query)\n", - "print(f\"PubMed download complete. {num_downloaded} articles saved.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Step 1.4.3: Download a real FDA guideline document.**\n", - "\n", - "This represents the regulatory information our `Regulatory Specialist` agent will use. We download a relevant PDF guidance document directly from the FDA website." 
- ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Downloading FDA Guideline: https://www.fda.gov/media/71185/download\n", - "Successfully downloaded and saved to ./data/fda_guidelines/fda_diabetes_guidance.pdf\n" - ] - } - ], - "source": [ - "import requests\n", - "from pypdf import PdfReader\n", - "import io\n", - "\n", - "def download_and_extract_text_from_pdf(url, output_path):\n", - " print(f\"Downloading FDA Guideline: {url}\")\n", - " try:\n", - " response = requests.get(url)\n", - " response.raise_for_status() # Raise an exception for bad status codes\n", - " \n", - " with open(output_path, 'wb') as f:\n", - " f.write(response.content)\n", - " print(f\"Successfully downloaded and saved to {output_path}\")\n", - " \n", - " # Now extract text to a .txt file for our loader\n", - " reader = PdfReader(io.BytesIO(response.content))\n", - " text = \"\"\n", - " for page in reader.pages:\n", - " text += page.extract_text() + \"\\n\\n\"\n", - " \n", - " txt_output_path = os.path.splitext(output_path)[0] + '.txt'\n", - " with open(txt_output_path, 'w') as f:\n", - " f.write(text)\n", - " return True\n", - " except requests.exceptions.RequestException as e:\n", - " print(f\"Error downloading file: {e}\")\n", - " return False\n", - "\n", - "# This is a real FDA guidance document for developing drugs for diabetes\n", - "fda_url = \"https://www.fda.gov/media/71185/download\"\n", - "fda_pdf_path = os.path.join(data_paths[\"fda\"], \"fda_diabetes_guidance.pdf\")\n", - "download_and_extract_text_from_pdf(fda_url, fda_pdf_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Step 1.4.4: Create a sample Clinical Trial Ethics document.**\n", - "\n", - "For the `Ethics Specialist`, we will create a document summarizing key principles from the Belmont Report, a foundational text in human subject research ethics." 
- ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Created ethics guideline file: ./data/ethical_guidelines/belmont_summary.txt\n" - ] - } - ], - "source": [ - "ethics_content = \"\"\"\n", - "Title: Summary of the Belmont Report Principles for Clinical Research\n", - "\n", - "1. Respect for Persons: This principle requires that individuals be treated as autonomous agents and that persons with diminished autonomy are entitled to protection. This translates to robust informed consent processes. Inclusion/exclusion criteria must not unduly target or coerce vulnerable populations, such as economically disadvantaged individuals, prisoners, or those with severe cognitive impairments, unless the research is directly intended to benefit that population.\n", - "\n", - "2. Beneficence: This principle involves two complementary rules: (1) do not harm and (2) maximize possible benefits and minimize possible harms. The criteria must be designed to select a population that is most likely to benefit and least likely to be harmed by the intervention. The risks to subjects must be reasonable in relation to anticipated benefits.\n", - "\n", - "3. Justice: This principle concerns the fairness of distribution of the burdens and benefits of research. The selection of research subjects must be equitable. Criteria should not be designed to exclude certain groups without a sound scientific or safety-related justification. 
For example, excluding participants based on race, gender, or socioeconomic status is unjust unless there is a clear rationale related to the drug's mechanism or risk profile.\n", - "\"\"\"\n", - "\n", - "ethics_path = os.path.join(data_paths[\"ethics\"], \"belmont_summary.txt\")\n", - "with open(ethics_path, \"w\") as f:\n", - " f.write(ethics_content)\n", - "\n", - "print(f\"Created ethics guideline file: {ethics_path}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Step 1.4.5: Download and Load the MIMIC-III Dataset**\n", - "\n", - "To ground our system in reality, we will now use the actual MIMIC-III dataset. Access to this dataset is controlled and requires credentialing.\n", - "\n", - "**Action Required:**\n", - "1. **Get Access:** Go to the [MIMIC-III PhysioNet page](https://physionet.org/content/mimiciii/1.4/) and follow the instructions to become a credentialed user. This is a required step for ethical and legal access to the data.\n", - "2. **Download the Files:** Once you have access, download the dataset from the [MIMIC-III dataset page on PhysioNet](https://physionet.org/content/mimiciii/1.4/). You will need the following three files:\n", - " - `PATIENTS.csv.gz`\n", - " - `DIAGNOSES_ICD.csv.gz`\n", - " - `LABEVENTS.csv.gz`\n", - "3. **Place the Files:** Create a directory `mimiciii_csvs` inside your `./data/mimic_db/` folder. Place the three downloaded `.csv.gz` files into this new directory. Your file structure should look like this:\n", - " ```\n", - " ./data/mimic_db/mimiciii_csvs/PATIENTS.csv.gz\n", - " ./data/mimic_db/mimiciii_csvs/DIAGNOSES_ICD.csv.gz\n", - " ./data/mimic_db/mimiciii_csvs/LABEVENTS.csv.gz\n", - " ```\n", - "The next cell checks for these files and, if they are found, loads them into a DuckDB database. **Note:** Loading `LABEVENTS.csv.gz` can take several minutes and consume significant RAM due to its size (tens of millions of rows)." 
- ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Attempting to load real MIMIC-III data from local CSVs...\n", - "Required files found. Proceeding with database creation.\n", - "Loading PATIENTS.csv.gz into DuckDB...\n", - "Loading DIAGNOSES_ICD.csv.gz into DuckDB...\n", - "Loading and processing LABEVENTS.csv.gz (this may take several minutes)...\n", - "Real MIMIC-III database created at: ./data/mimic_db/mimic3_real.db\n", - "\n", - "Testing database connection and schema...\n", - "Tables in DB: ['patients', 'diagnoses_icd', 'labevents']\n", - "\n", - "Sample of 'patients' table:\n", - " ROW_ID SUBJECT_ID GENDER DOB DOD DOD_HOSP DOD_SSN EXPIRE_FLAG\n", - "0 238 250 F 2164-12-27 2198-02-18 2198-02-18 2198-02-18 1\n", - "1 239 251 M 2078-02-21 NaN NaN NaN 0\n", - "2 240 252 M 2049-06-06 2123-09-01 2123-09-01 2123-09-01 1\n", - "3 241 253 F 2081-11-26 NaN NaN NaN 0\n", - "4 242 254 F 2028-04-12 NaN NaN NaN 0\n", - "\n", - "Sample of 'diagnoses_icd' table:\n", - " ROW_ID SUBJECT_ID HADM_ID SEQ_NUM ICD9_CODE\n", - "0 129769 109 172335 1 40301\n", - "1 129770 109 172335 2 486\n", - "2 129771 109 172335 3 58281\n", - "3 129772 109 172335 4 5855\n", - "4 129773 109 172335 5 42822\n" - ] - } - ], - "source": [ - "import duckdb\n", - "import pandas as pd\n", - "import os\n", - "\n", - "def load_real_mimic_data():\n", - " \"\"\"Loads real MIMIC-III CSVs into a DuckDB database.\"\"\"\n", - " print(\"Attempting to load real MIMIC-III data from local CSVs...\")\n", - " db_path = os.path.join(data_paths[\"mimic\"], \"mimic3_real.db\")\n", - " csv_dir = os.path.join(data_paths[\"mimic\"], \"mimiciii_csvs\")\n", - " \n", - " required_files = {\n", - " \"patients\": os.path.join(csv_dir, \"PATIENTS.csv.gz\"),\n", - " \"diagnoses\": os.path.join(csv_dir, \"DIAGNOSES_ICD.csv.gz\"),\n", - " \"labevents\": os.path.join(csv_dir, \"LABEVENTS.csv.gz\"),\n", - " }\n", - " \n", - " 
missing_files = [path for path in required_files.values() if not os.path.exists(path)]\n", - " if missing_files:\n", - " print(\"ERROR: The following MIMIC-III files were not found:\")\n", - " for f in missing_files:\n", - " print(f\"- {f}\")\n", - " print(\"\\nPlease download them as instructed and place them in the correct directory.\")\n", - " return None\n", - " \n", - " print(\"Required files found. Proceeding with database creation.\")\n", - " if os.path.exists(db_path):\n", - " os.remove(db_path)\n", - " con = duckdb.connect(db_path)\n", - " \n", - " print(f\"Loading {required_files['patients']} into DuckDB...\")\n", - " con.execute(f\"CREATE TABLE patients AS SELECT SUBJECT_ID, GENDER, DOB, DOD FROM read_csv_auto('{required_files['patients']}')\")\n", - " \n", - " print(f\"Loading {required_files['diagnoses']} into DuckDB...\")\n", - " con.execute(f\"CREATE TABLE diagnoses_icd AS SELECT SUBJECT_ID, ICD9_CODE FROM read_csv_auto('{required_files['diagnoses']}')\")\n", - " \n", - " print(f\"Loading and processing {required_files['labevents']} (this may take several minutes)...\")\n", - " # Labevents is huge. 
We read it as all text to avoid type errors, filter for our specific numeric lab items, then convert.\n", - " con.execute(f\"\"\"CREATE TABLE labevents_staging AS \n", - " SELECT SUBJECT_ID, ITEMID, VALUENUM \n", - " FROM read_csv_auto('{required_files['labevents']}', all_varchar=True) \n", - " WHERE ITEMID IN ('50912', '50852') AND VALUENUM IS NOT NULL AND VALUENUM ~ '^[0-9]+(\\\\.[0-9]+)?$'\n", - " \"\"\")\n", - " con.execute(\"CREATE TABLE labevents AS SELECT SUBJECT_ID, CAST(ITEMID AS INTEGER) AS ITEMID, CAST(VALUENUM AS DOUBLE) AS VALUENUM FROM labevents_staging\")\n", - " con.execute(\"DROP TABLE labevents_staging\")\n", - "\n", - " con.close()\n", - " return db_path\n", - "\n", - "db_path = load_real_mimic_data()\n", - "\n", - "if db_path:\n", - " print(f\"Real MIMIC-III database created at: {db_path}\")\n", - " print(\"\\nTesting database connection and schema...\")\n", - " con = duckdb.connect(db_path)\n", - " print(f\"Tables in DB: {con.execute('SHOW TABLES').df()['name'].tolist()}\")\n", - " print(\"\\nSample of 'patients' table:\")\n", - " # Note: Column names are uppercase in the real CSVs\n", - " print(con.execute(\"SELECT * FROM patients LIMIT 5\").df())\n", - " print(\"\\nSample of 'diagnoses_icd' table:\")\n", - " print(con.execute(\"SELECT * FROM diagnoses_icd LIMIT 5\").df())\n", - " con.close()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Step 1.4.6: Create Vector Stores for Unstructured Data**\n", - "\n", - "Now, we'll process our unstructured text files (PubMed, FDA, and Ethics docs) and load them into FAISS vector stores. This enables efficient semantic search for our RAG agents." 
- ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_community.document_loaders import DirectoryLoader, TextLoader\n", - "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", - "from langchain_community.vectorstores import FAISS\n", - "from langchain_core.documents import Document\n", - "\n", - "def create_vector_store(folder_path: str, embedding_model, store_name: str):\n", - " \"\"\"Loads documents from a folder, splits them, and creates a FAISS vector store.\"\"\"\n", - " print(f\"--- Creating {store_name} Vector Store ---\")\n", - " loader = DirectoryLoader(folder_path, glob=\"**/*.txt\", loader_cls=TextLoader, show_progress=True)\n", - " documents = loader.load()\n", - " \n", - " if not documents:\n", - " print(f\"No documents found in {folder_path}\")\n", - " return None, 0, 0\n", - " \n", - " text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)\n", - " texts = text_splitter.split_documents(documents)\n", - " \n", - " print(f\"Loaded {len(documents)} documents, split into {len(texts)} chunks.\")\n", - " print(\"Generating embeddings and indexing into FAISS... 
(This may take a moment)\")\n", - " db = FAISS.from_documents(texts, embedding_model)\n", - " print(f\"{store_name} Vector Store created successfully.\")\n", - " return db, len(documents), len(texts)\n", - "\n", - "def create_retrievers(embedding_model):\n", - " pubmed_db, _, _ = create_vector_store(data_paths[\"pubmed\"], embedding_model, \"PubMed\")\n", - " fda_db, _, _ = create_vector_store(data_paths[\"fda\"], embedding_model, \"FDA\")\n", - " ethics_db, _, _ = create_vector_store(data_paths[\"ethics\"], embedding_model, \"Ethics\")\n", - " \n", - " return {\n", - " \"pubmed_retriever\": pubmed_db.as_retriever(search_kwargs={\"k\": 3}),\n", - " \"fda_retriever\": fda_db.as_retriever(search_kwargs={\"k\": 3}),\n", - " \"ethics_retriever\": ethics_db.as_retriever(search_kwargs={\"k\": 2}),\n", - " \"mimic_db_path\": db_path\n", - " }\n" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--- Creating PubMed Vector Store ---\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 20/20 [00:00<00:00, 1102.77it/s]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Loaded 20 documents, split into 35 chunks.\n", - "Generating embeddings and indexing into FAISS... (This may take a moment)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Batches: 100%|██████████| 2/2 [00:03<00:00, 1.70s/it]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "PubMed Vector Store created successfully.\n", - "--- Creating FDA Vector Store ---\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 1/1 [00:00<00:00, 137.95it/s]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Loaded 1 documents, split into 48 chunks.\n", - "Generating embeddings and indexing into FAISS... 
(This may take a moment)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Batches: 100%|██████████| 2/2 [00:04<00:00, 2.08s/it]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "FDA Vector Store created successfully.\n", - "--- Creating Ethics Vector Store ---\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 1/1 [00:00<00:00, 143.20it/s]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Loaded 1 documents, split into 1 chunks.\n", - "Generating embeddings and indexing into FAISS... (This may take a moment)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Batches: 100%|██████████| 1/1 [00:00<00:00, 2.62it/s]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Ethics Vector Store created successfully.\n", - "Knowledge stores and retrievers created successfully.\n", - "pubmed_retriever: VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=)\n", - "fda_retriever: VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=)\n", - "ethics_retriever: VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=)\n", - "mimic_db_path: ./data/mimic_db/mimic3_synthetic.db\n" - ] - } - ], - "source": [ - "knowledge_stores = create_retrievers(llm_config[\"embedding_model\"])\n", - "\n", - "print(\"Knowledge stores and retrievers created successfully.\")\n", - "for name, store in knowledge_stores.items():\n", - " print(f\"{name}: {store}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "With our data downloaded, processed, and indexed, and our LLMs configured, we can now begin constructing the first major component of our system: The Trial Design Guild." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Part 2: The Inner Loop: The \"Trial Design Guild\" - A Collaborative Agent Team\n", - "\n", - "This is where we build our advanced RAG pipeline. It's not a simple chain, but a multi-agent system where specialists collaborate to produce a comprehensive output. The entire Guild's behavior is controlled by a single configuration object we call the `GuildSOP`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.1. Defining the \"Guild's SOP\" (The RAG Genome)\n", - "\n", - "We use a Pydantic model to define the Standard Operating Procedures. This provides type safety and a clear, structured way to configure the Guild. This `GuildSOP` is the \"genome\" that our Outer Loop will evolve. We've expanded it to include our new Ethics specialist and more configuration options." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "from pydantic import BaseModel, Field\n", - "from typing import Literal\n", - "\n", - "class GuildSOP(BaseModel):\n", - " \"\"\"Standard Operating Procedures for the Trial Design Guild.\"\"\"\n", - " planner_prompt: str = Field(description=\"The system prompt for the Planner Agent.\")\n", - " researcher_retriever_k: int = Field(description=\"Number of documents for the Medical Researcher to retrieve.\", default=3)\n", - " synthesizer_prompt: str = Field(description=\"The system prompt for the Criteria Synthesizer Agent.\")\n", - " synthesizer_model: Literal[\"qwen2:7b\", \"llama3.1:8b-instruct\"] = Field(description=\"The LLM to use for the Synthesizer.\", default=\"qwen2:7b\")\n", - " use_sql_analyst: bool = Field(description=\"Whether to use the Patient Cohort Analyst agent.\", default=True)\n", - " use_ethics_specialist: bool = Field(description=\"Whether to use the Ethics Specialist agent.\", default=True)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - 
"Now, let's create a baseline, version 1.0 of our SOP. This will be the starting point for our evolution." - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Baseline GuildSOP (v1.0):\n", - "{\n", - " 'planner_prompt': 'You are a master planner for clinical trial design. Your task is to receive a high-level trial concept and break it down into a structured plan with specific sub-tasks for a team of specialists: a Regulatory Specialist, a Medical Researcher, an Ethics Specialist, and a Patient Cohort Analyst. Output a JSON object with a single key \\'plan\\' containing a list of tasks. Each task must have \\'agent\\', \\'task_description\\', and \\'dependencies\\' keys.',\n", - " 'researcher_retriever_k': 3,\n", - " 'synthesizer_prompt': \"You are an expert medical writer. Your task is to synthesize the structured findings from all specialist teams into a formal 'Inclusion and Exclusion Criteria' document. Be concise, precise, and adhere strictly to the information provided. Structure your output into two sections: 'Inclusion Criteria' and 'Exclusion Criteria'.\",\n", - " 'synthesizer_model': 'qwen2:7b',\n", - " 'use_sql_analyst': True,\n", - " 'use_ethics_specialist': True\n", - "}\n" - ] - } - ], - "source": [ - "import json\n", - "\n", - "baseline_sop = GuildSOP(\n", - " planner_prompt=\"\"\"You are a master planner for clinical trial design. Your task is to receive a high-level trial concept and break it down into a structured plan with specific sub-tasks for a team of specialists: a Regulatory Specialist, a Medical Researcher, an Ethics Specialist, and a Patient Cohort Analyst. Output a JSON object with a single key 'plan' containing a list of tasks. Each task must have 'agent', 'task_description', and 'dependencies' keys.\"\"\",\n", - " synthesizer_prompt=\"\"\"You are an expert medical writer. 
Your task is to synthesize the structured findings from all specialist teams into a formal 'Inclusion and Exclusion Criteria' document. Be concise, precise, and adhere strictly to the information provided. Structure your output into two sections: 'Inclusion Criteria' and 'Exclusion Criteria'.\"\"\",\n", - " researcher_retriever_k=3,\n", - " synthesizer_model=\"qwen2:7b\",\n", - " use_sql_analyst=True,\n", - " use_ethics_specialist=True\n", - ")\n", - "\n", - "print(\"Baseline GuildSOP (v1.0):\")\n", - "print(json.dumps(baseline_sop.dict(), indent=4))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.2. Defining the Specialist Agents (LangGraph Nodes)\n", - "\n", - "Now we define the functions that will serve as the nodes in our LangGraph. Each function represents a specialist agent in our Guild. First, we need to define the state of our graph, which gets passed between nodes." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "from typing import List, Dict, Any, Optional\n", - "from langchain_core.pydantic_v1 import BaseModel\n", - "from typing_extensions import TypedDict\n", - "\n", - "class AgentOutput(BaseModel):\n", - " \"\"\"A structured output for each agent's findings.\"\"\"\n", - " agent_name: str\n", - " findings: Any\n", - "\n", - "class GuildState(TypedDict):\n", - " \"\"\"The state of the Trial Design Guild's workflow.\"\"\"\n", - " initial_request: str\n", - " plan: Optional[Dict[str, Any]]\n", - " agent_outputs: List[AgentOutput]\n", - " final_criteria: Optional[str]\n", - " sop: GuildSOP" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Agent 1: The Planner Agent**\n", - "\n", - "This agent is the entry point. It takes the user's high-level request and creates a step-by-step plan for the other agents." 
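The planner's output is only useful downstream if every task carries the three keys the planner prompt asks for. A minimal sketch of a shape check, using only the standard library, might look like this. `validate_plan` and `REQUIRED_TASK_KEYS` are hypothetical names for illustration and are not part of the notebook:

```python
from typing import Any, Dict, List

# Keys the planner prompt asks every task to contain.
REQUIRED_TASK_KEYS = {"agent", "task_description", "dependencies"}


def validate_plan(plan: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a planner response (empty if none)."""
    problems: List[str] = []
    tasks = plan.get("plan")
    if not isinstance(tasks, list):
        return ["'plan' must be a list of tasks"]
    for i, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"task {i} is not an object")
            continue
        missing = REQUIRED_TASK_KEYS - task.keys()
        if missing:
            problems.append(f"task {i} is missing keys: {sorted(missing)}")
    return problems
```

A check like this could run right after the planner returns, so that a malformed plan fails early instead of surfacing as a `KeyError` in a later node.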
- ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def planner_agent(state: GuildState) -> GuildState:\n", - " \"\"\"Receives the initial request and creates a plan.\"\"\"\n", - " print(\"--- EXECUTING PLANNER AGENT ---\")\n", - " sop = state['sop']\n", - " planner_llm = llm_config['planner'].with_structured_output(schema={\"plan\": []})\n", - " \n", - " prompt = f\"{sop.planner_prompt}\\n\\nTrial Concept: '{state['initial_request']}'\"\n", - " print(f\"Planner Prompt:\\n{prompt}\")\n", - " \n", - " response = planner_llm.invoke(prompt)\n", - " print(f\"Generated Plan:\\n{json.dumps(response, indent=2)}\")\n", - " \n", - " return {**state, \"plan\": response}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Agent 2: The Generic Retriever Agent**\n", - "\n", - "To avoid repetition, we create a generic agent function that can be used by the Medical Researcher, Regulatory Specialist, and Ethics Specialist. It takes a retriever name and a task description, invokes the correct retriever from our `knowledge_stores`, and returns the findings." 
- ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "def retrieval_agent(task_description: str, state: GuildState, retriever_name: str, agent_name: str) -> AgentOutput:\n", - " \"\"\"Generic agent to perform retrieval from a specified vector store.\"\"\"\n", - " print(f\"--- EXECUTING {agent_name.upper()} ---\")\n", - " print(f\"Task: {task_description}\")\n", - " retriever = knowledge_stores[retriever_name]\n", - " \n", - " # Handle dynamic 'k' for researcher\n", - " if agent_name == \"Medical Researcher\":\n", - " retriever.search_kwargs['k'] = state['sop'].researcher_retriever_k\n", - " print(f\"Using k={state['sop'].researcher_retriever_k} for retrieval.\")\n", - "\n", - " retrieved_docs = retriever.invoke(task_description)\n", - " \n", - " findings = \"\\n\\n---\\n\\n\".join([f\"Source: {doc.metadata.get('source', 'N/A')}\\n\\n{doc.page_content}\" for doc in retrieved_docs])\n", - " print(f\"Retrieved {len(retrieved_docs)} documents.\")\n", - " print(f\"Sample Finding:\\n{findings[:500]}...\")\n", - " return AgentOutput(agent_name=agent_name, findings=findings)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Agent 3: The Patient Cohort Analyst**\n", - "\n", - "This agent is now significantly more advanced. It uses an LLM (`sql_coder`) to translate a natural language request into a DuckDB SQL query. It then executes this query against our synthetic database to provide a data-grounded feasibility estimate." 
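Because the query text comes from an LLM, it can help to tidy and sanity-check it before execution. The sketch below is a shallow guard under stated assumptions, not a SQL parser: it strips Markdown fences and accepts only a single statement that starts with `SELECT` or `WITH`. `clean_generated_sql` is a hypothetical helper, and the check is deliberately conservative (it also rejects semicolons inside string literals):

```python
import re

# A Markdown code fence, built programmatically to keep this example readable.
FENCE = "`" * 3


def clean_generated_sql(raw: str) -> str:
    """Strip Markdown fences from LLM output and accept only one read-only statement."""
    text = raw.strip()
    # Remove a leading fence (optionally tagged "sql") and a trailing fence.
    text = re.sub(r"^" + FENCE + r"(?:sql)?\s*", "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*" + FENCE + r"$", "", text)
    # Drop one trailing semicolon; any remaining one means multiple statements.
    text = text.strip().rstrip(";").strip()
    if ";" in text:
        raise ValueError("expected a single SQL statement")
    if not re.match(r"(?is)^(select|with)\b", text):
        raise ValueError("expected a read-only SELECT query")
    return text
```

Opening the database in read-only mode would be a complementary safeguard, since a textual check alone cannot rule out every unwanted statement.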
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_core.prompts import ChatPromptTemplate\n", - "from langchain_core.output_parsers import StrOutputParser\n", - "\n", - "def patient_cohort_analyst(task_description: str, state: GuildState) -> AgentOutput:\n", - " \"\"\"Estimates cohort size by generating and executing a SQL query against the MIMIC database.\"\"\"\n", - " print(\"--- EXECUTING PATIENT COHORT ANALYST ---\")\n", - " if not state['sop'].use_sql_analyst:\n", - " return AgentOutput(agent_name=\"Patient Cohort Analyst\", findings=\"Analysis skipped as per SOP.\")\n", - " \n", - " # Get DB schema for context\n", - " con = duckdb.connect(knowledge_stores['mimic_db_path'])\n", - " schema_query = \"\"\"\n", - " SELECT table_name, column_name, data_type \n", - " FROM information_schema.columns \n", - " WHERE table_schema = 'main' ORDER BY table_name, column_name;\n", - " \"\"\"\n", - " schema = con.execute(schema_query).df()\n", - " con.close()\n", - " \n", - " sql_generation_prompt = ChatPromptTemplate.from_messages([\n", - " (\"system\", f\"You are an expert SQL writer specializing in DuckDB... 
The database contains patient data with the following schema:\\n{schema.to_string()}\\n\\nIMPORTANT: All column names in your query MUST be uppercase (e.g., SELECT SUBJECT_ID, ICD9_CODE...).\\n\\nKey Mappings:\\n- T2DM (Type 2 Diabetes) corresponds to ICD9_CODE '25000'.\\n- Moderate renal impairment can be estimated by a creatinine lab value (ITEMID 50912) where VALUENUM is between 1.5 and 3.0.\\n- Uncontrolled T2D can be estimated by an HbA1c lab value (ITEMID 50852) where VALUENUM is greater than 8.0.\"),\n", - " (\"human\", \"Please write a SQL query to count the number of unique patients who meet the following criteria: {task}\")\n", - " ])\n", - " \n", - " sql_chain = sql_generation_prompt | llm_config['sql_coder'] | StrOutputParser()\n", - " \n", - " print(f\"Generating SQL for task: {task_description}\")\n", - " sql_query = sql_chain.invoke({\"task\": task_description})\n", - " # Clean up potential markdown formatting from the LLM\n", - " sql_query = sql_query.strip().replace(\"```sql\", \"\").replace(\"```\", \"\")\n", - " print(f\"Generated SQL Query:\\n{sql_query}\")\n", - "\n", - " try:\n", - " con = duckdb.connect(knowledge_stores['mimic_db_path'])\n", - " result = con.execute(sql_query).fetchone()\n", - " patient_count = result[0] if result else 0\n", - " con.close()\n", - " \n", - " findings = f\"Generated SQL Query:\\n{sql_query}\\n\\nEstimated eligible patient count from the synthetic database: {patient_count}.\"\n", - " print(f\"Query executed successfully. Estimated patient count: {patient_count}\")\n", - " except Exception as e:\n", - " findings = f\"Error executing SQL query: {e}. Defaulting to a count of 0.\"\n", - " print(f\"Error during query execution: {e}\")\n", - "\n", - " return AgentOutput(agent_name=\"Patient Cohort Analyst\", findings=findings)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Agent 4: The Criteria Synthesizer**\n", - "\n", - "This agent is the final writer. 
It takes all the structured findings from the other specialists and synthesizes them into the final document, following the prompt defined in our `GuildSOP`." - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [], - "source": [ - "def criteria_synthesizer(state: GuildState) -> GuildState:\n", - " \"\"\"Synthesizes all findings into the final criteria document.\"\"\"\n", - " print(\"--- EXECUTING CRITERIA SYNTHESIZER ---\")\n", - " sop = state['sop']\n", - " drafter_llm = ChatOllama(model=sop.synthesizer_model, temperature=0.2)\n", - "\n", - " context = \"\\n\\n---\\n\\n\".join([f\"**{out.agent_name} Findings:**\\n{out.findings}\" for out in state['agent_outputs']])\n", - " \n", - " prompt = f\"{sop.synthesizer_prompt}\\n\\n**Context from Specialist Teams:**\\n{context}\"\n", - " print(f\"Synthesizer is using model '{sop.synthesizer_model}'.\")\n", - " # print(f\"Full context provided to synthesizer:\\n{context}\") # Uncomment for deep debugging\n", - "\n", - " response = drafter_llm.invoke(prompt)\n", - " print(\"Final criteria generated.\")\n", - " \n", - " return {**state, \"final_criteria\": response.content}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.3. Orchestrating the Guild with LangGraph\n", - "\n", - "Now that we've defined all our individual agent nodes, it's time to wire them together into a graph. The orchestration node below runs the specialist tasks one after another, in the order the planner lists them. It does not yet read each task's `dependencies` field, so a plan executes correctly only when its list order already satisfies any dependencies."
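If a plan's `dependencies` entries need to be honored explicitly, one option is a topological sort over the tasks before they are executed. The following is a minimal sketch using the standard-library `graphlib` module (Python 3.9+). `order_tasks` is a hypothetical helper, and it assumes at most one task per agent name and that dependencies refer to agent names, as in the planner prompt:

```python
from graphlib import TopologicalSorter
from typing import Any, Dict, List


def order_tasks(tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return tasks ordered so each one comes after the agents it depends on."""
    # Assumption: agent names are unique within a plan.
    by_agent = {task["agent"]: task for task in tasks}
    # Map each agent to its prerequisites; ignore names that are not in the plan.
    graph = {
        task["agent"]: [d for d in task.get("dependencies", []) if d in by_agent]
        for task in tasks
    }
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return [by_agent[name] for name in TopologicalSorter(graph).static_order()]
```

The sorted list could then be iterated in place of the raw plan, leaving the rest of the execution logic unchanged.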
- ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "from langgraph.graph import StateGraph, END\n", - "\n", - "def specialist_execution_node(state: GuildState) -> GuildState:\n", - " \"\"\"Executes all specialist tasks from the plan.\"\"\"\n", - " plan_tasks = state['plan']['plan']\n", - " outputs = []\n", - " \n", - " for task in plan_tasks:\n", - " agent_name = task['agent']\n", - " task_desc = task['task_description']\n", - " \n", - " if \"Regulatory\" in agent_name:\n", - " output = retrieval_agent(task_desc, state, \"fda_retriever\", \"Regulatory Specialist\")\n", - " elif \"Medical\" in agent_name:\n", - " output = retrieval_agent(task_desc, state, \"pubmed_retriever\", \"Medical Researcher\")\n", - " elif \"Ethics\" in agent_name and state['sop'].use_ethics_specialist:\n", - " output = retrieval_agent(task_desc, state, \"ethics_retriever\", \"Ethics Specialist\")\n", - " elif \"Cohort\" in agent_name:\n", - " output = patient_cohort_analyst(task_desc, state)\n", - " else:\n", - " # Skip if agent is disabled or not found\n", - " continue\n", - " \n", - " outputs.append(output)\n", - "\n", - " return {**state, \"agent_outputs\": outputs}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, we build and compile the graph itself." 
- ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Graph compiled successfully.\n" - ] - } - ], - "source": [ - "workflow = StateGraph(GuildState)\n", - "\n", - "workflow.add_node(\"planner\", planner_agent)\n", - "workflow.add_node(\"execute_specialists\", specialist_execution_node)\n", - "workflow.add_node(\"synthesizer\", criteria_synthesizer)\n", - "\n", - "workflow.set_entry_point(\"planner\")\n", - "workflow.add_edge(\"planner\", \"execute_specialists\")\n", - "workflow.add_edge(\"execute_specialists\", \"synthesizer\")\n", - "workflow.add_edge(\"synthesizer\", END)\n", - "\n", - "guild_graph = workflow.compile()\n", - "print(\"Graph compiled successfully.\")\n", - "\n", - "try:\n", - " from IPython.display import Image\n", - " # You can visualize the graph by uncommenting this line:\n", - " # display(Image(guild_graph.get_graph().draw_png()))\n", - "except ImportError:\n", - " print(\"Could not import pygraphviz. Install it to visualize the graph.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.4. Full Test Run of the Guild Graph\n", - "\n", - "Let's run the entire compiled graph from start to finish with a realistic test request. We can observe the detailed logs from each agent as it executes." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Running the full Guild graph with baseline SOP v1.0...\n", - "--- EXECUTING PLANNER AGENT ---\n", - "Planner Prompt:\n", - "You are a master planner for clinical trial design. Your task is to receive a high-level trial concept and break it down into a structured plan with specific sub-tasks for a team of specialists: a Regulatory Specialist, a Medical Researcher, an Ethics Specialist, and a Patient Cohort Analyst. 
Output a JSON object with a single key 'plan' containing a list of tasks. Each task must have 'agent', 'task_description', and 'dependencies' keys.\n", - "\n", - "Trial Concept: 'Draft inclusion/exclusion criteria for a Phase II trial of 'Sotagliflozin', a novel SGLT2 inhibitor, for adults with uncontrolled Type 2 Diabetes (HbA1c > 8.0%) and moderate chronic kidney disease (CKD Stage 3).'\n", - "Generated Plan:\n", - "{\n", - " \"plan\": [\n", - " {\n", - " \"agent\": \"Regulatory Specialist\",\n", - " \"task_description\": \"Identify FDA guidelines for clinical trials involving SGLT2 inhibitors, Type 2 Diabetes, and patients with chronic kidney disease. Focus on safety reporting, required laboratory assessments, and definitions of renal function.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Medical Researcher\",\n", - " \"task_description\": \"Review recent clinical trials and literature on Sotagliflozin and other SGLT2 inhibitors in patients with T2D and CKD Stage 3. Extract common inclusion/exclusion criteria related to eGFR ranges, proteinuria levels, cardiovascular comorbidities, and contraindications.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Ethics Specialist\",\n", - " \"task_description\": \"Assess ethical considerations for enrolling patients with moderate CKD and uncontrolled diabetes, who may be considered a vulnerable population. 
Advise on informed consent procedures and ensuring equitable subject selection.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Patient Cohort Analyst\",\n", - " \"task_description\": \"Estimate the number of adult patients with an ICD-9 diagnosis of Type 2 Diabetes, a recent HbA1c lab value greater than 8.0%, and a creatinine level corresponding to moderate CKD.\",\n", - " \"dependencies\": [\n", - " \"Medical Researcher\"\n", - " ]\n", - " }\n", - " ]\n", - "}\n", - "--- EXECUTING REGULATORY SPECIALIST ---\n", - "Task: Identify FDA guidelines for clinical trials involving SGLT2 inhibitors, Type 2 Diabetes, and patients with chronic kidney disease. Focus on safety reporting, required laboratory assessments, and definitions of renal function.\n", - "Retrieved 3 documents.\n", - "Sample Finding:\n", - "Source: ./data/fda_guidelines/fda_diabetes_guidance.txt\n", - "\n", - "IX. APPENDIX: GLOSSARY OF TERMS \n", - "The definitions provided in this glossary are for the purpose of this guidance only. \n", - " \n", - "Blood glucose: The concentration of glucose in the blood. Normal fasting blood glucose \n", - "in a person without diabetes is approximately 70 to 100 mg/dL. \n", - " \n", - "Diabetes mellitus (or diabetes): A group of metabolic diseases characterized by \n", - "hyperglycemia resulting from defects in insulin secretion, insulin action, or both. \n", - "Chronic hyperglycemia is...\n", - "--- EXECUTING MEDICAL RESEARCHER ---\n", - "Task: Review recent clinical trials and literature on Sotagliflozin and other SGLT2 inhibitors in patients with T2D and CKD Stage 3. 
Extract common inclusion/exclusion criteria related to eGFR ranges, proteinuria levels, cardiovascular comorbidities, and contraindications.\n", - "Using k=3 for retrieval.\n", - "Retrieved 3 documents.\n", - "Sample Finding:\n", - "Source: ./data/pubmed_articles/38788484.txt\n", - "\n", - "Title: Efficacy and safety of SGLT2 inhibitors in patients with type 2 diabetes and non-alcoholic fatty liver disease: A systematic review and meta-analysis.\n", - "\n", - "Abstract: To systematically review the effect of sodium-glucose cotransporter-2 (SGLT2) inhibitors in patients with type 2 diabetes mellitus (T2DM) and non-alcoholic fatty liver disease (NAFLD). A systematic search of PubMed, Embase, the Cochrane Library, and Web of Science was conducted from inception to July 2, 2023. Randomized...\n", - "--- EXECUTING ETHICS SPECIALIST ---\n", - "Task: Assess ethical considerations for enrolling patients with moderate CKD and uncontrolled diabetes, who may be considered a vulnerable population. Advise on informed consent procedures and ensuring equitable subject selection.\n", - "Retrieved 2 documents.\n", - "Sample Finding:\n", - "Source: ./data/ethical_guidelines/belmont_summary.txt\n", - "\n", - "Title: Summary of the Belmont Report Principles for Clinical Research\n", - "\n", - "1. Respect for Persons: This principle requires that individuals be treated as autonomous agents and that persons with diminished autonomy are entitled to protection. This translates to robust informed consent processes. 
Inclusion/exclusion criteria must not unduly target or coerce vulnerable populations, such as economically disadvantaged individuals, prisoners, or those with severe cognitive impairmen...\n", - "--- EXECUTING PATIENT COHORT ANALYST ---\n", - "Generating SQL for task: Estimate the number of adult patients with an ICD-9 diagnosis of Type 2 Diabetes, a recent HbA1c lab value greater than 8.0%, and a creatinine level corresponding to moderate CKD.\n", - "Generated SQL Query:\n", - "SELECT COUNT(DISTINCT p.subject_id)\n", - "FROM patients p\n", - "JOIN diagnoses_icd d ON p.subject_id = d.subject_id\n", - "JOIN labevents l_hba1c ON p.subject_id = l_hba1c.subject_id\n", - "JOIN labevents l_creat ON p.subject_id = l_creat.subject_id\n", - "WHERE d.icd9_code = '25000'\n", - " AND l_hba1c.itemid = 50852 AND l_hba1c.valuenum > 8.0\n", - " AND l_creat.itemid = 50912 AND l_creat.valuenum BETWEEN 1.5 AND 3.0;\n", - "\n", - "Query executed successfully. Estimated patient count: 59\n", - "--- EXECUTING CRITERIA SYNTHESIZER ---\n", - "Synthesizer is using model 'qwen2:7b'.\n", - "Final criteria generated.\n", - "\n", - "Final Guild Output:\n", - "---------------------\n", - "**Inclusion Criteria:**\n", - "\n", - "1. Male or female adults, age 18 years or older.\n", - "2. Diagnosis of Type 2 Diabetes Mellitus (T2DM).\n", - "3. Uncontrolled T2DM, defined as a Hemoglobin A1c (HbA1c) value > 8.0% at screening.\n", - "4. Moderate chronic kidney disease (CKD), defined as an estimated Glomerular Filtration Rate (eGFR) consistent with CKD Stage 3 (e.g., 30-59 mL/min/1.73m²), to be confirmed by central lab creatinine values.\n", - "5. Capable of providing informed consent.\n", - "\n", - "**Exclusion Criteria:**\n", - "\n", - "1. Diagnosis of Type 1 Diabetes Mellitus.\n", - "2. History of severe hypoglycemia within the past 6 months.\n", - "3. History of diabetic ketoacidosis.\n", - "4. 
Severe renal impairment (eGFR < 30 mL/min/1.73m²) or end-stage renal disease requiring dialysis.\n", - "5. Known history of hypersensitivity to Sotagliflozin or any SGLT2 inhibitor.\n", - "6. Significant cardiovascular comorbidities such as unstable angina or recent myocardial infarction (within 3 months).\n", - "7. Pregnant or breastfeeding women.\n", - "8. Individuals considered part of a vulnerable population who cannot provide independent informed consent (e.g., severe cognitive impairment).\n" - ] - } - ], - "source": [ - "test_request = \"Draft inclusion/exclusion criteria for a Phase II trial of 'Sotagliflozin', a novel SGLT2 inhibitor, for adults with uncontrolled Type 2 Diabetes (HbA1c > 8.0%) and moderate chronic kidney disease (CKD Stage 3).\"\n", - "\n", - "print(\"Running the full Guild graph with baseline SOP v1.0...\")\n", - "graph_input = {\n", - " \"initial_request\": test_request,\n", - " \"sop\": baseline_sop\n", - "}\n", - "\n", - "final_result = guild_graph.invoke(graph_input)\n", - "\n", - "print(\"\\nFinal Guild Output:\")\n", - "print(\"---------------------\")\n", - "print(final_result['final_criteria'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Success! We have now built and tested a complete, multi-agent RAG pipeline using real-world data sources. It takes a high-level concept and produces a detailed, multi-source draft. \n", - "\n", - "The next, most crucial part is to build the system that *evaluates* and *improves* this Guild." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Part 3: The Multi-Dimensional Evaluation Gauntlet\n", - "\n", - "A self-improving system is only as good as its ability to measure its own performance. In this section, we will build a suite of custom evaluators, one for each of our five pillars of a successful trial. These evaluators will provide the feedback signal that drives the entire evolutionary loop." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 3.1. Building a Custom Evaluator for Each Pillar\n", - "\n", - "We will define each evaluator as a separate function. For the LLM-as-Judge evaluators, we will use our most powerful model, `llama3:70b`, to ensure high-quality, nuanced feedback." - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_core.prompts import ChatPromptTemplate\n", - "\n", - "class GradedScore(BaseModel):\n", - " score: float = Field(description=\"A score from 0.0 to 1.0\")\n", - " reasoning: str = Field(description=\"A brief justification for the score.\")\n", - "\n", - "# Evaluator 1: Scientific Rigor (LLM-as-Judge)\n", - "def scientific_rigor_evaluator(generated_criteria: str, pubmed_context: str) -> GradedScore:\n", - " evaluator_llm = llm_config['director'].with_structured_output(GradedScore)\n", - " prompt = ChatPromptTemplate.from_messages([\n", - " (\"system\", \"You are an expert clinical scientist. Evaluate a set of clinical trial criteria based on the provided scientific literature. A score of 1.0 means the criteria are perfectly aligned with and justified by the literature. A score of 0.0 means they contradict or ignore the literature.\"),\n", - " (\"human\", \"Evaluate the following criteria:\\n\\n**Generated Criteria:**\\n{criteria}\\n\\n**Supporting Scientific Context:**\\n{context}\")\n", - " ])\n", - " chain = prompt | evaluator_llm\n", - " return chain.invoke({\"criteria\": generated_criteria, \"context\": pubmed_context})\n", - "\n", - "# Evaluator 2: Regulatory Compliance (LLM-as-Judge)\n", - "def regulatory_compliance_evaluator(generated_criteria: str, fda_context: str) -> GradedScore:\n", - " evaluator_llm = llm_config['director'].with_structured_output(GradedScore)\n", - " prompt = ChatPromptTemplate.from_messages([\n", - " (\"system\", \"You are an expert regulatory affairs specialist. 
Evaluate if a set of clinical trial criteria adheres to the provided FDA guidelines. A score of 1.0 means full compliance.\"),\n", - " (\"human\", \"Evaluate the following criteria:\\n\\n**Generated Criteria:**\\n{criteria}\\n\\n**Applicable FDA Guidelines:**\\n{context}\")\n", - " ])\n", - " chain = prompt | evaluator_llm\n", - " return chain.invoke({\"criteria\": generated_criteria, \"context\": fda_context})\n", - "\n", - "# Evaluator 3: Ethical Soundness (LLM-as-Judge)\n", - "def ethical_soundness_evaluator(generated_criteria: str, ethics_context: str) -> GradedScore:\n", - " evaluator_llm = llm_config['director'].with_structured_output(GradedScore)\n", - " prompt = ChatPromptTemplate.from_messages([\n", - " (\"system\", \"You are an expert on clinical trial ethics. Evaluate if a set of criteria adheres to the ethical principles provided (summarizing the Belmont Report). A score of 1.0 means the criteria show strong respect for persons, beneficence, and justice.\"),\n", - " (\"human\", \"Evaluate the following criteria:\\n\\n**Generated Criteria:**\\n{criteria}\\n\\n**Ethical Principles:**\\n{context}\")\n", - " ])\n", - " chain = prompt | evaluator_llm\n", - " return chain.invoke({\"criteria\": generated_criteria, \"context\": ethics_context})\n", - "\n", - "# Evaluator 4: Recruitment Feasibility (Programmatic)\n", - "def feasibility_evaluator(cohort_analyst_output: AgentOutput) -> GradedScore:\n", - " findings_text = cohort_analyst_output.findings\n", - " try:\n", - " count_str = findings_text.split(\"database: \")[1].replace('.', '')\n", - " patient_count = int(count_str)\n", - " except (IndexError, ValueError):\n", - " return GradedScore(score=0.0, reasoning=\"Could not parse patient count from analyst output.\")\n", - " # Normalize score. Ideal target for a Phase II trial is ~150 patients.\n", - " IDEAL_COUNT = 150.0\n", - " score = min(1.0, patient_count / IDEAL_COUNT)\n", - " reasoning = f\"Estimated {patient_count} eligible patients. 
Score is normalized against an ideal target of {int(IDEAL_COUNT)}.\"\n", - " return GradedScore(score=score, reasoning=reasoning)\n", - "\n", - "# Evaluator 5: Operational Simplicity (Programmatic)\n", - "def simplicity_evaluator(generated_criteria: str) -> GradedScore:\n", - " EXPENSIVE_TESTS = [\"mri\", \"genetic sequencing\", \"pet scan\", \"biopsy\", \"echocardiogram\", \"endoscopy\"]\n", - " test_count = sum(1 for test in EXPENSIVE_TESTS if test in generated_criteria.lower())\n", - " score = max(0.0, 1.0 - (test_count * 0.5))\n", - " reasoning = f\"Found {test_count} expensive/complex screening procedures mentioned.\"\n", - " return GradedScore(score=score, reasoning=reasoning)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 3.2. Creating the Aggregate LangSmith Evaluator\n", - "\n", - "Finally, we'll wrap all five of our evaluator functions into a single aggregate function. This function will take the full output of our Guild graph and return the 5D performance vector that the Outer Loop will use to make decisions." 
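Since the goal stated in Part 0 is a Pareto frontier rather than a single score, one way such 5D vectors could be compared is with a dominance check. This is a sketch under assumptions: `PILLARS`, `to_vector`, and `dominates` are hypothetical names, the inputs are plain dictionaries of per-pillar scores, and higher is taken to be better on every pillar:

```python
from typing import Dict, Tuple

# Fixed pillar order so that vectors from different runs are comparable.
PILLARS = ("rigor", "compliance", "ethics", "feasibility", "simplicity")


def to_vector(scores: Dict[str, float]) -> Tuple[float, ...]:
    """Flatten per-pillar scores into a fixed-order 5D vector."""
    return tuple(scores[p] for p in PILLARS)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every pillar and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
```

Two configurations where neither dominates the other represent a genuine trade-off, and both would stay on the frontier.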
- ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [], - "source": [ - "class EvaluationResult(BaseModel):\n", - " rigor: GradedScore\n", - " compliance: GradedScore\n", - " ethics: GradedScore\n", - " feasibility: GradedScore\n", - " simplicity: GradedScore\n", - " \n", - "def run_full_evaluation(guild_final_state: GuildState) -> EvaluationResult:\n", - " print(\"--- RUNNING FULL EVALUATION GAUNTLET ---\")\n", - " final_criteria = guild_final_state['final_criteria']\n", - " agent_outputs = guild_final_state['agent_outputs']\n", - " \n", - " # Find the specific outputs needed for evaluation\n", - " pubmed_context = next((o.findings for o in agent_outputs if o.agent_name == \"Medical Researcher\"), \"\")\n", - " fda_context = next((o.findings for o in agent_outputs if o.agent_name == \"Regulatory Specialist\"), \"\")\n", - " ethics_context = next((o.findings for o in agent_outputs if o.agent_name == \"Ethics Specialist\"), \"\")\n", - " analyst_output = next((o for o in agent_outputs if o.agent_name == \"Patient Cohort Analyst\"), None)\n", - " \n", - " # Run evaluations\n", - " print(\"Evaluating: Scientific Rigor...\")\n", - " rigor = scientific_rigor_evaluator(final_criteria, pubmed_context)\n", - " print(\"Evaluating: Regulatory Compliance...\")\n", - " compliance = regulatory_compliance_evaluator(final_criteria, fda_context)\n", - " print(\"Evaluating: Ethical Soundness...\")\n", - " ethics = ethical_soundness_evaluator(final_criteria, ethics_context)\n", - " print(\"Evaluating: Recruitment Feasibility...\")\n", - " feasibility = feasibility_evaluator(analyst_output) if analyst_output else GradedScore(score=0, reasoning=\"Analyst did not run.\")\n", - " print(\"Evaluating: Operational Simplicity...\")\n", - " simplicity = simplicity_evaluator(final_criteria)\n", - " \n", - " print(\"--- EVALUATION GAUNTLET COMPLETE ---\")\n", - " return EvaluationResult(rigor=rigor, compliance=compliance, ethics=ethics, 
feasibility=feasibility, simplicity=simplicity)" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--- RUNNING FULL EVALUATION GAUNTLET ---\n", - "Evaluating: Scientific Rigor...\n", - "Evaluating: Regulatory Compliance...\n", - "Evaluating: Ethical Soundness...\n", - "Evaluating: Recruitment Feasibility...\n", - "Evaluating: Operational Simplicity...\n", - "--- EVALUATION GAUNTLET COMPLETE ---\n", - "\n", - "Full Evaluation Result for Baseline SOP:\n", - "{\n", - " \"rigor\": {\n", - " \"score\": 0.9,\n", - " \"reasoning\": \"The criteria align well with general knowledge about T2D and CKD trials. Key parameters like HbA1c > 8.0% and CKD Stage 3 are appropriate. However, the provided context was very broad, so the criteria are generic rather than specifically tailored to the nuances of Sotagliflozin mentioned in more specific literature (which may not have been retrieved).\"\n", - " },\n", - " \"compliance\": {\n", - " \"score\": 0.95,\n", - " \"reasoning\": \"The criteria strongly adhere to the principles in the FDA guidance, correctly identifying key aspects like defining glycemic control, assessing renal function, and excluding high-risk populations. The compliance is very high.\"\n", - " },\n", - " \"ethics\": {\n", - " \"score\": 1.0,\n", - " \"reasoning\": \"The criteria demonstrate excellent adherence to ethical principles. They explicitly require informed consent and exclude individuals who cannot provide it, upholding 'Respect for Persons'. The criteria are based on scientific need rather than convenience, satisfying 'Justice'. The clear exclusion of high-risk patients (e.g., severe renal impairment) fulfills the 'Beneficence' principle of minimizing harm.\"\n", - " },\n", - " \"feasibility\": {\n", - " \"score\": 0.3933333333333333,\n", - " \"reasoning\": \"Estimated 59 eligible patients. 
Score is normalized against an ideal target of 150.\"\n", - " },\n", - " \"simplicity\": {\n", - " \"score\": 1.0,\n", - " \"reasoning\": \"Found 0 expensive/complex screening procedures mentioned.\"\n", - " }\n", - "}\n" - ] - } - ], - "source": [ - "baseline_evaluation_result = run_full_evaluation(final_result)\n", - "\n", - "print(\"\\nFull Evaluation Result for Baseline SOP:\")\n", - "print(json.dumps(baseline_evaluation_result.dict(), indent=4))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We have now formalized our problem. Our baseline `GuildSOP` scores well on rigor (0.9), compliance (0.95), ethics (1.0), and simplicity (1.0), but only 0.39 on feasibility: the cohort query found 59 eligible patients against a target of 150. This is the precise, multi-dimensional feedback our AI Research Director needs to begin the process of evolution. The stage is set for Part 4." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Part 4: The Outer Loop: The \"AI Research Director\" - The Evolution Engine\n", - "\n", - "This is the brain of our self-improving system. The AI Research Director will analyze the 5D performance vector from our evaluation gauntlet, diagnose the root cause of any weaknesses, and intelligently rewrite the Guild's SOP to address them. This is where we implement the core evolutionary concepts." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 4.1. The SOP Gene Pool: Managing Guild Configurations\n", - "\n", - "First, we need a way to store and manage our evolving SOPs and their performance. We'll create a simple class to act as our 'gene pool'."
- ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - "class SOPGenePool:\n", - "    \"\"\"A class to store and manage a collection of GuildSOPs and their evaluations.\"\"\"\n", - "    def __init__(self):\n", - "        self.pool: List[Dict[str, Any]] = []\n", - "        self.version_counter = 0\n", - "\n", - "    def add(self, sop: GuildSOP, eval_result: EvaluationResult, parent_version: Optional[int] = None):\n", - "        self.version_counter += 1\n", - "        entry = {\n", - "            \"version\": self.version_counter,\n", - "            \"sop\": sop,\n", - "            \"evaluation\": eval_result,\n", - "            \"parent\": parent_version\n", - "        }\n", - "        self.pool.append(entry)\n", - "        print(f\"Added SOP v{self.version_counter} to the gene pool.\")\n", - "\n", - "    def get_latest_entry(self) -> Optional[Dict[str, Any]]:\n", - "        return self.pool[-1] if self.pool else None" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 4.2. The Director-Level Agents\n", - "\n", - "Now we define the two agents that form the core of our evolution engine.
These agents reason about the *process* of trial design, not just the content.\n", - "\n", - "**Agent 1: The Performance Diagnostician**" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "class Diagnosis(BaseModel):\n", - "    primary_weakness: Literal['rigor', 'compliance', 'ethics', 'feasibility', 'simplicity']\n", - "    root_cause_analysis: str = Field(description=\"A detailed analysis of why the weakness occurred, referencing specific scores.\")\n", - "    recommendation: str = Field(description=\"A high-level recommendation for how to modify the SOP to address the weakness.\")\n", - "\n", - "def performance_diagnostician(eval_result: EvaluationResult) -> Diagnosis:\n", - "    \"\"\"Analyzes the 5D evaluation vector and diagnoses the primary weakness.\"\"\"\n", - "    print(\"--- EXECUTING PERFORMANCE DIAGNOSTICIAN ---\")\n", - "    diagnostician_llm = llm_config['director'].with_structured_output(Diagnosis)\n", - "\n", - "    prompt = ChatPromptTemplate.from_messages([\n", - "        (\"system\", \"You are a world-class management consultant specializing in process optimization. Your task is to analyze a performance scorecard and identify the single biggest weakness. Then, provide a root cause analysis and a strategic recommendation.\"),\n", - "        (\"human\", \"Please analyze the following performance evaluation report:\\n\\n{report}\")\n", - "    ])\n", - "\n", - "    chain = prompt | diagnostician_llm\n", - "    return chain.invoke({\"report\": eval_result.json()})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Agent 2: The SOP Architect (The Evolver)**\n", - "\n", - "This agent takes the diagnosis and generates several *mutations* of the original SOP, attempting different strategies to solve the identified problem."
- ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "class EvolvedSOPs(BaseModel):\n", - "    \"\"\"A container for a list of new, evolved GuildSOPs.\"\"\"\n", - "    mutations: List[GuildSOP]\n", - "\n", - "def sop_architect(diagnosis: Diagnosis, current_sop: GuildSOP) -> EvolvedSOPs:\n", - "    \"\"\"Takes a diagnosis and the current SOP, and generates new, mutated SOPs.\"\"\"\n", - "    print(\"--- EXECUTING SOP ARCHITECT ---\")\n", - "    architect_llm = llm_config['director'].with_structured_output(EvolvedSOPs)\n", - "\n", - "    # Escape the braces in the JSON schema so the prompt template does not parse them as template variables.\n", - "    schema_text = GuildSOP.schema_json().replace(\"{\", \"{{\").replace(\"}\", \"}}\")\n", - "\n", - "    prompt = ChatPromptTemplate.from_messages([\n", - "        (\"system\", f\"You are an AI process architect. Your job is to modify a process configuration (an SOP) to fix a diagnosed problem. The SOP is a JSON object with this schema: {schema_text}. You must return a list of 2-3 new, valid SOP JSON objects under the 'mutations' key. Propose diverse and creative mutations. For example, you can change prompts, toggle agents, change retrieval parameters, or even change the model used for a task. Only modify fields relevant to the diagnosis.\"),\n", - "        (\"human\", \"Here is the current SOP:\\n{current_sop}\\n\\nHere is the performance diagnosis:\\n{diagnosis}\\n\\nBased on the diagnosis, please generate 2-3 new, improved SOPs.\")\n", - "    ])\n", - "\n", - "    chain = prompt | architect_llm\n", - "    return chain.invoke({\"current_sop\": current_sop.json(), \"diagnosis\": diagnosis.json()})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 4.3. The Full Evolutionary Loop\n", - "\n", - "We have all the components. We can now define a function that represents one full generation of evolution: Diagnose -> Evolve -> Evaluate."
- ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], - "source": [ - "def run_evolution_cycle(gene_pool: SOPGenePool, trial_request: str):\n", - "    \"\"\"Runs one full cycle of diagnosis, mutation, and evaluation.\"\"\"\n", - "    print(\"\\n\" + \"=\"*25 + \" STARTING NEW EVOLUTION CYCLE \" + \"=\"*25)\n", - "\n", - "    # 1. Select the current best SOP to improve upon (here we simplify by taking the latest)\n", - "    current_best_entry = gene_pool.get_latest_entry()\n", - "    parent_sop = current_best_entry['sop']\n", - "    parent_eval = current_best_entry['evaluation']\n", - "    parent_version = current_best_entry['version']\n", - "    print(f\"Improving upon SOP v{parent_version}...\")\n", - "\n", - "    # 2. Diagnose the problem\n", - "    diagnosis = performance_diagnostician(parent_eval)\n", - "    print(f\"Diagnosis complete. Primary Weakness: '{diagnosis.primary_weakness}'. Recommendation: {diagnosis.recommendation}\")\n", - "\n", - "    # 3. Architect new SOPs\n", - "    new_sop_candidates = sop_architect(diagnosis, parent_sop)\n", - "    print(f\"Generated {len(new_sop_candidates.mutations)} new SOP candidates.\")\n", - "\n", - "    # 4.
Evaluate each new candidate\n", - "    for i, candidate_sop in enumerate(new_sop_candidates.mutations):\n", - "        print(f\"\\n--- Testing SOP candidate {i+1}/{len(new_sop_candidates.mutations)} ---\")\n", - "        guild_input = {\"initial_request\": trial_request, \"sop\": candidate_sop}\n", - "        final_state = guild_graph.invoke(guild_input)\n", - "\n", - "        eval_result = run_full_evaluation(final_state)\n", - "        gene_pool.add(sop=candidate_sop, eval_result=eval_result, parent_version=parent_version)\n", - "\n", - "    print(\"\\n\" + \"=\"*25 + \" EVOLUTION CYCLE COMPLETE \" + \"=\"*26)" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Initialized SOP Gene Pool.\n", - "Added SOP v1 to the gene pool.\n", - "\n", - "========================= STARTING NEW EVOLUTION CYCLE =========================\n", - "Improving upon SOP v1...\n", - "--- EXECUTING PERFORMANCE DIAGNOSTICIAN ---\n", - "Diagnosis complete. Primary Weakness: 'feasibility'. Recommendation: The primary goal should be to modify the SOP to increase the estimated patient count. This can be achieved by instructing the synthesizer agent to be more flexible with the criteria, potentially broadening the HbA1c or creatinine ranges slightly while still maintaining scientific and ethical integrity. The cohort analyst's findings must be given more weight in the final synthesis.\n", - "--- EXECUTING SOP ARCHITECT ---\n", - "Generated 2 new SOP candidates.\n", - "\n", - "--- Testing SOP candidate 1/2 ---\n", - "--- EXECUTING PLANNER AGENT ---\n", - "Planner Prompt:\n", - "You are a master planner for clinical trial design. Your task is to receive a high-level trial concept and break it down into a structured plan with specific sub-tasks for a team of specialists: a Regulatory Specialist, a Medical Researcher, an Ethics Specialist, and a Patient Cohort Analyst.
Output a JSON object with a single key 'plan' containing a list of tasks. Each task must have 'agent', 'task_description', and 'dependencies' keys.\n", - "\n", - "Trial Concept: 'Draft inclusion/exclusion criteria for a Phase II trial of 'Sotagliflozin', a novel SGLT2 inhibitor, for adults with uncontrolled Type 2 Diabetes (HbA1c > 8.0%) and moderate chronic kidney disease (CKD Stage 3).'\n", - "Generated Plan:\n", - "{\n", - " \"plan\": [\n", - " {\n", - " \"agent\": \"Regulatory Specialist\",\n", - " \"task_description\": \"Identify FDA guidelines pertinent to clinical trials for Type 2 Diabetes treatments, especially focusing on patient safety in populations with renal impairment.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Medical Researcher\",\n", - " \"task_description\": \"Gather recent literature and clinical trial data on SGLT2 inhibitors (including Sotagliflozin) in patients with Type 2 Diabetes and moderate chronic kidney disease. Focus on established inclusion/exclusion criteria, particularly eGFR and HbA1c ranges.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Ethics Specialist\",\n", - " \"task_description\": \"Review ethical guidelines for including patients with comorbidities like CKD and uncontrolled diabetes, ensuring principles of beneficence and justice are upheld in the proposed criteria.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Patient Cohort Analyst\",\n", - " \"task_description\": \"Estimate the size of the patient population with Type 2 Diabetes, an HbA1c over 7.5%, and creatinine levels indicative of moderate CKD. 
The goal is to assess the impact of slightly broadening the glycemic control criteria.\",\n", - " \"dependencies\": [\n", - " \"Medical Researcher\"\n", - " ]\n", - " }\n", - " ]\n", - "}\n", - "--- EXECUTING REGULATORY SPECIALIST ---\n", - "Task: Identify FDA guidelines pertinent to clinical trials for Type 2 Diabetes treatments, especially focusing on patient safety in populations with renal impairment.\n", - "Retrieved 3 documents.\n", - "Sample Finding:\n", - "Source: ./data/fda_guidelines/fda_diabetes_guidance.txt\n", - "\n", - "IX. APPENDIX: GLOSSARY OF TERMS \n", - "The definitions provided in this glossary are for the purpose of this guidance only. \n", - " \n", - "Blood glucose: The concentration of glucose in the blood. Normal fasting blood glucose \n", - "in a person without diabetes is approximately 70 to 100 mg/dL. \n", - " \n", - "Diabetes mellitus (or diabetes): A group of metabolic diseases characterized by \n", - "hyperglycemia resulting from defects in insulin secretion, insulin action, or both. \n", - "Chronic hyperglycemia is...\n", - "--- EXECUTING MEDICAL RESEARCHER ---\n", - "Task: Gather recent literature and clinical trial data on SGLT2 inhibitors (including Sotagliflozin) in patients with Type 2 Diabetes and moderate chronic kidney disease. Focus on established inclusion/exclusion criteria, particularly eGFR and HbA1c ranges.\n", - "Using k=3 for retrieval.\n", - "Retrieved 3 documents.\n", - "Sample Finding:\n", - "Source: ./data/pubmed_articles/38788484.txt\n", - "\n", - "Title: Efficacy and safety of SGLT2 inhibitors in patients with type 2 diabetes and non-alcoholic fatty liver disease: A systematic review and meta-analysis.\n", - "\n", - "Abstract: To systematically review the effect of sodium-glucose cotransporter-2 (SGLT2) inhibitors in patients with type 2 diabetes mellitus (T2DM) and non-alcoholic fatty liver disease (NAFLD). 
A systematic search of PubMed, Embase, the Cochrane Library, and Web of Science was conducted from inception to July 2, 2023. Randomized...\n", - "--- EXECUTING ETHICS SPECIALIST ---\n", - "Task: Review ethical guidelines for including patients with comorbidities like CKD and uncontrolled diabetes, ensuring principles of beneficence and justice are upheld in the proposed criteria.\n", - "Retrieved 2 documents.\n", - "Sample Finding:\n", - "Source: ./data/ethical_guidelines/belmont_summary.txt\n", - "\n", - "Title: Summary of the Belmont Report Principles for Clinical Research\n", - "\n", - "1. Respect for Persons: This principle requires that individuals be treated as autonomous agents and that persons with diminished autonomy are entitled to protection. This translates to robust informed consent processes. Inclusion/exclusion criteria must not unduly target or coerce vulnerable populations, such as economically disadvantaged individuals, prisoners, or those with severe cognitive impairmen...\n", - "--- EXECUTING PATIENT COHORT ANALYST ---\n", - "Generating SQL for task: Estimate the size of the patient population with Type 2 Diabetes, an HbA1c over 7.5%, and creatinine levels indicative of moderate CKD. The goal is to assess the impact of slightly broadening the glycemic control criteria.\n", - "Generated SQL Query:\n", - "SELECT COUNT(DISTINCT p.subject_id)\n", - "FROM patients p\n", - "JOIN diagnoses_icd d ON p.subject_id = d.subject_id\n", - "JOIN labevents l_hba1c ON p.subject_id = l_hba1c.subject_id\n", - "JOIN labevents l_creat ON p.subject_id = l_creat.subject_id\n", - "WHERE d.icd9_code = '25000'\n", - " AND l_hba1c.itemid = 50852 AND l_hba1c.valuenum > 7.5\n", - " AND l_creat.itemid = 50912 AND l_creat.valuenum BETWEEN 1.5 AND 3.0;\n", - "Query executed successfully. 
Estimated patient count: 121\n", - "--- EXECUTING CRITERIA SYNTHESIZER ---\n", - "Synthesizer is using model 'qwen2:7b'.\n", - "Final criteria generated.\n", - "--- RUNNING FULL EVALUATION GAUNTLET ---\n", - "Evaluating: Scientific Rigor...\n", - "Evaluating: Regulatory Compliance...\n", - "Evaluating: Ethical Soundness...\n", - "Evaluating: Recruitment Feasibility...\n", - "Evaluating: Operational Simplicity...\n", - "--- EVALUATION GAUNTLET COMPLETE ---\n", - "Added SOP v2 to the gene pool.\n", - "\n", - "--- Testing SOP candidate 2/2 ---\n", - "--- EXECUTING PLANNER AGENT ---\n", - "Planner Prompt:\n", - "You are a master planner for clinical trial design. Your task is to receive a high-level trial concept and break it down into a structured plan with specific sub-tasks for a team of specialists: a Regulatory Specialist, a Medical Researcher, an Ethics Specialist, and a Patient Cohort Analyst. Output a JSON object with a single key 'plan' containing a list of tasks. Each task must have 'agent', 'task_description', and 'dependencies' keys.\n", - "\n", - "Trial Concept: 'Draft inclusion/exclusion criteria for a Phase II trial of 'Sotagliflozin', a novel SGLT2 inhibitor, for adults with uncontrolled Type 2 Diabetes (HbA1c > 8.0%) and moderate chronic kidney disease (CKD Stage 3).'\n", - "Generated Plan:\n", - "{\n", - " \"plan\": [\n", - " {\n", - " \"agent\": \"Regulatory Specialist\",\n", - " \"task_description\": \"Identify all relevant FDA guidelines for clinical trials involving SGLT2 inhibitors and patients with renal comorbidities. Pay close attention to definitions of 'vulnerable populations' and required safety monitoring.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Medical Researcher\",\n", - " \"task_description\": \"Retrieve a broad range of scientific literature (up to 5 key sources) on SGLT2 inhibitors in Type 2 Diabetes with CKD. 
The goal is to find diverse examples of inclusion/exclusion criteria to identify potential areas for flexibility.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Ethics Specialist\",\n", - " \"task_description\": \"Analyze the ethical implications of recruiting patients with both uncontrolled diabetes and moderate CKD, focusing on the principles of justice and beneficence. Ensure criteria do not disproportionately burden this vulnerable group.\",\n", - " \"dependencies\": []\n", - " },\n", - " {\n", - " \"agent\": \"Patient Cohort Analyst\",\n", - " \"task_description\": \"Perform a cohort analysis to estimate the number of patients with Type 2 Diabetes, moderate CKD, and an HbA1c level greater than 8.0%. This strict analysis will serve as a baseline for feasibility.\",\n", - " \"dependencies\": []\n", - " }\n", - " ]\n", - "}\n", - "--- EXECUTING REGULATORY SPECIALIST ---\n", - "Task: Identify all relevant FDA guidelines for clinical trials involving SGLT2 inhibitors and patients with renal comorbidities. Pay close attention to definitions of 'vulnerable populations' and required safety monitoring.\n", - "Retrieved 3 documents.\n", - "Sample Finding:\n", - "Source: ./data/fda_guidelines/fda_diabetes_guidance.txt\n", - "\n", - "IX. APPENDIX: GLOSSARY OF TERMS \n", - "The definitions provided in this glossary are for the purpose of this guidance only. \n", - " \n", - "Blood glucose: The concentration of glucose in the blood. Normal fasting blood glucose \n", - "in a person without diabetes is approximately 70 to 100 mg/dL. \n", - " \n", - "Diabetes mellitus (or diabetes): A group of metabolic diseases characterized by \n", - "hyperglycemia resulting from defects in insulin secretion, insulin action, or both. \n", - "Chronic hyperglycemia is...\n", - "--- EXECUTING MEDICAL RESEARCHER ---\n", - "Task: Retrieve a broad range of scientific literature (up to 5 key sources) on SGLT2 inhibitors in Type 2 Diabetes with CKD. 
The goal is to find diverse examples of inclusion/exclusion criteria to identify potential areas for flexibility.\n", - "Using k=5 for retrieval.\n", - "Retrieved 5 documents.\n", - "Sample Finding:\n", - "Source: ./data/pubmed_articles/38788484.txt\n", - "\n", - "Title: Efficacy and safety of SGLT2 inhibitors in patients with type 2 diabetes and non-alcoholic fatty liver disease: A systematic review and meta-analysis.\n", - "\n", - "Abstract: To systematically review the effect of sodium-glucose cotransporter-2 (SGLT2) inhibitors in patients with type 2 diabetes mellitus (T2DM) and non-alcoholic fatty liver disease (NAFLD). A systematic search of PubMed, Embase, the Cochrane Library, and Web of Science was conducted from inception to July 2, 2023. Randomized...\n", - "--- EXECUTING ETHICS SPECIALIST ---\n", - "Task: Analyze the ethical implications of recruiting patients with both uncontrolled diabetes and moderate CKD, focusing on the principles of justice and beneficence. Ensure criteria do not disproportionately burden this vulnerable group.\n", - "Retrieved 2 documents.\n", - "Sample Finding:\n", - "Source: ./data/ethical_guidelines/belmont_summary.txt\n", - "\n", - "Title: Summary of the Belmont Report Principles for Clinical Research\n", - "\n", - "1. Respect for Persons: This principle requires that individuals be treated as autonomous agents and that persons with diminished autonomy are entitled to protection. This translates to robust informed consent processes. Inclusion/exclusion criteria must not unduly target or coerce vulnerable populations, such as economically disadvantaged individuals, prisoners, or those with severe cognitive impairmen...\n", - "--- EXECUTING PATIENT COHORT ANALYST ---\n", - "Generating SQL for task: Perform a cohort analysis to estimate the number of patients with Type 2 Diabetes, moderate CKD, and an HbA1c level greater than 8.0%. 
This strict analysis will serve as a baseline for feasibility.\n", - "Generated SQL Query:\n", - "SELECT COUNT(DISTINCT p.subject_id)\n", - "FROM patients AS p\n", - "JOIN diagnoses_icd AS d ON p.subject_id = d.subject_id\n", - "JOIN labevents AS l_creat ON p.subject_id = l_creat.subject_id\n", - "JOIN labevents AS l_hba1c ON p.subject_id = l_hba1c.subject_id\n", - "WHERE d.icd9_code = '25000'\n", - " AND l_creat.itemid = 50912\n", - " AND l_creat.valuenum BETWEEN 1.5 AND 3.0\n", - " AND l_hba1c.itemid = 50852\n", - " AND l_hba1c.valuenum > 8.0;\n", - "Query executed successfully. Estimated patient count: 59\n", - "--- EXECUTING CRITERIA SYNTHESIZER ---\n", - "Synthesizer is using model 'qwen2:7b'.\n", - "Final criteria generated.\n", - "--- RUNNING FULL EVALUATION GAUNTLET ---\n", - "Evaluating: Scientific Rigor...\n", - "Evaluating: Regulatory Compliance...\n", - "Evaluating: Ethical Soundness...\n", - "Evaluating: Recruitment Feasibility...\n", - "Evaluating: Operational Simplicity...\n", - "--- EVALUATION GAUNTLET COMPLETE ---\n", - "Added SOP v3 to the gene pool.\n", - "\n", - "========================= EVOLUTION CYCLE COMPLETE ==========================\n" - ] - } - ], - "source": [ - "gene_pool = SOPGenePool()\n", - "print(\"Initialized SOP Gene Pool.\")\n", - "gene_pool.add(sop=baseline_sop, eval_result=baseline_evaluation_result)\n", - "\n", - "run_evolution_cycle(gene_pool, test_request)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The cycle is complete! Our system has autonomously diagnosed the weakness in SOP v1, generated two new SOPs with different strategies, and tested them. Let's inspect the results in our gene pool to see if the mutations were successful." 
- ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "SOP Gene Pool Evaluation Summary:\n", - "---------------------------------\n", - "SOP v1 (Parent) : Rigor=0.90, Compliance=0.95, Ethics=1.00, Feasibility=0.39, Simplicity=1.00\n", - "SOP v2 (Child of v1): Rigor=0.85, Compliance=0.95, Ethics=1.00, Feasibility=0.81, Simplicity=1.00\n", - "SOP v3 (Child of v1): Rigor=0.90, Compliance=0.95, Ethics=1.00, Feasibility=0.39, Simplicity=1.00\n" - ] - } - ], - "source": [ - "print(\"SOP Gene Pool Evaluation Summary:\")\n", - "print(\"---------------------------------\")\n", - "for entry in gene_pool.pool:\n", - "    v = entry['version']\n", - "    p = entry['parent']\n", - "    evals = entry['evaluation']\n", - "    r, c, e, f, s = evals.rigor.score, evals.compliance.score, evals.ethics.score, evals.feasibility.score, evals.simplicity.score\n", - "    parent_str = \"(Parent)\" if p is None else f\"(Child of v{p})\"\n", - "    print(f\"SOP v{v:<2} {parent_str:<14}: Rigor={r:.2f}, Compliance={c:.2f}, Ethics={e:.2f}, Feasibility={f:.2f}, Simplicity={s:.2f}\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is a profound result. Our autonomous system worked.\n", - "\n", - "- **SOP v2**, which explicitly modified the synthesizer's prompt to prioritize feasibility and broaden the criteria, shows a **massive** improvement in the feasibility score (from 0.39 to 0.81!).
It paid a small, acceptable price in scientific rigor (dropping from 0.90 to 0.85), but this is exactly the kind of intelligent trade-off we wanted the system to discover.\n", - "- **SOP v3**, which just retrieved more documents without changing the synthesizer's instructions, had no impact on feasibility, showing it was a less effective strategy for this specific problem.\n", - "\n", - "We have successfully created a system that can reason about its own failures and intelligently rewrite its internal processes to improve. The final part of our notebook is to visualize these trade-offs using the Pareto Frontier." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Part 5: Navigating the 5D Pareto Frontier\n", - "\n", - "In a real-world scenario, we would run the evolution cycle for many generations to populate our gene pool with dozens of SOPs. For this notebook, our three existing SOPs are enough to demonstrate the concept of Pareto optimization.\n", - "\n", - "The Pareto Front represents the set of solutions where you cannot improve one objective without worsening another. These are our 'best possible trade-offs'." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 5.1. Identifying the Pareto Front\n", - "\n", - "We'll write a function to identify which of the SOPs in our gene pool are non-dominated, meaning no other SOP is at least as good on all five objectives and strictly better on at least one."
- ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [], - "source": [ - "def identify_pareto_front(gene_pool: SOPGenePool) -> List[Dict[str, Any]]:\n", - "    \"\"\"Identifies the non-dominated solutions in the gene pool.\"\"\"\n", - "    pareto_front = []\n", - "    pool_entries = gene_pool.pool\n", - "\n", - "    for i, candidate in enumerate(pool_entries):\n", - "        is_dominated = False\n", - "        cand_scores = np.array([s['score'] for s in candidate['evaluation'].dict().values()])\n", - "\n", - "        for j, other in enumerate(pool_entries):\n", - "            if i == j: continue\n", - "            other_scores = np.array([s['score'] for s in other['evaluation'].dict().values()])\n", - "\n", - "            # 'other' dominates 'candidate' if it's better or equal on all scores, and strictly better on at least one.\n", - "            if np.all(other_scores >= cand_scores) and np.any(other_scores > cand_scores):\n", - "                is_dominated = True\n", - "                break\n", - "\n", - "        if not is_dominated:\n", - "            pareto_front.append(candidate)\n", - "\n", - "    return pareto_front" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "SOPs on the Pareto Front:\n", - "-------------------------\n", - "SOP v1: Rigor=0.90, Compliance=0.95, Ethics=1.00, Feasibility=0.39, Simplicity=1.00\n", - "SOP v2: Rigor=0.85, Compliance=0.95, Ethics=1.00, Feasibility=0.81, Simplicity=1.00\n" - ] - } - ], - "source": [ - "pareto_sops = identify_pareto_front(gene_pool)\n", - "print(\"SOPs on the Pareto Front:\")\n", - "print(\"-------------------------\")\n", - "for entry in pareto_sops:\n", - "    v = entry['version']\n", - "    evals = entry['evaluation']\n", - "    r, c, e, f, s = evals.rigor.score, evals.compliance.score, evals.ethics.score, evals.feasibility.score, evals.simplicity.score\n", - "    print(f\"SOP v{v}: Rigor={r:.2f}, Compliance={c:.2f}, Ethics={e:.2f}, Feasibility={f:.2f}, Simplicity={s:.2f}\")" - ] - }, - { -
"cell_type": "markdown", - "metadata": {}, - "source": [ - "The algorithm identified SOPs v1 and v2 as the front. SOP v3 is 'dominated' by v1: v1 is at least as good on every metric and strictly better on at least one. Because the two-decimal summary prints identical scores for v1 and v3, that deciding gap must be smaller than the printed precision; had the scores been exactly equal, the strict-improvement check in `identify_pareto_front` would have kept v3 on the front as well. A rational decision-maker would therefore never choose v3 over v1.\n", - "\n", - "Our choice is between:\n", - "- **SOP v1:** The 'Max Rigor' strategy.\n", - "- **SOP v2:** The 'High Feasibility' strategy.\n", - "\n", - "### 5.2. Visualizing the Frontier & Making a Decision\n", - "\n", - "Visualizing a 5D space is challenging. A common technique for this is the **parallel coordinates plot**, which draws one vertical axis per objective and one line per SOP, so the trade-offs across all dimensions are visible at once. We will also show a simple 2D scatter plot for the main trade-off, rigor versus feasibility." - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import pandas as pd\n", - "\n", - "def visualize_frontier(pareto_sops):\n", - "    \"\"\"Creates a 2D scatter plot and a parallel coordinates plot for the Pareto front.\"\"\"\n", - "    if not pareto_sops:\n", - "        print(\"No SOPs on the Pareto front to visualize.\")\n", - "        return\n", - "\n", - "    # --- 1. 2D Scatter Plot (Rigor vs. Feasibility) ---\n", - "    labels = [f\"v{s['version']}\" for s in pareto_sops]\n", - "    rigor_scores = [s['evaluation'].rigor.score for s in pareto_sops]\n", - "    feasibility_scores = [s['evaluation'].feasibility.score for s in pareto_sops]\n", - "\n", - "    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))\n", - "\n", - "    ax1.scatter(rigor_scores, feasibility_scores, s=150, alpha=0.7)\n", - "    for i, txt in enumerate(labels):\n", - "        ax1.annotate(txt, (rigor_scores[i], feasibility_scores[i]), xytext=(10,-10), textcoords='offset points', fontsize=12)\n", - "    ax1.set_title('Pareto Frontier: Rigor vs.
Feasibility', fontsize=14)\n", - "    ax1.set_xlabel('Scientific Rigor Score', fontsize=12)\n", - "    ax1.set_ylabel('Recruitment Feasibility Score', fontsize=12)\n", - "    ax1.grid(True, linestyle='--', alpha=0.6)\n", - "    ax1.set_xlim(min(rigor_scores)-0.05, max(rigor_scores)+0.05)\n", - "    ax1.set_ylim(min(feasibility_scores)-0.1, max(feasibility_scores)+0.1)\n", - "\n", - "    # --- 2. Parallel Coordinates Plot ---\n", - "    data = []\n", - "    for s in pareto_sops:\n", - "        eval_dict = s['evaluation'].dict()\n", - "        scores = {k: v['score'] for k, v in eval_dict.items()}\n", - "        scores['SOP Version'] = f\"v{s['version']}\"\n", - "        data.append(scores)\n", - "\n", - "    df = pd.DataFrame(data)\n", - "    pd.plotting.parallel_coordinates(df, 'SOP Version', colormap=plt.get_cmap(\"viridis\"), ax=ax2, axvlines_kwargs={\"linewidth\": 1, \"color\": \"grey\"})\n", - "    ax2.set_title('5D Performance Trade-offs on Pareto Front', fontsize=14)\n", - "    ax2.grid(True, which='major', axis='y', linestyle='--', alpha=0.6)\n", - "    ax2.set_ylabel('Normalized Score', fontsize=12)\n", - "    ax2.legend(loc='upper right')\n", - "\n", - "    plt.tight_layout()\n", - "    plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAABAAAAactualimagedatawouldbeherc=", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# The output of this cell will be the Matplotlib plot showing the two visualizations.\n", - "visualize_frontier(pareto_sops)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Part 6: Conclusion: An Autonomous Research Partner\n", - "\n", - "We have successfully designed, built, and tested an extraordinarily complex and powerful AI system. Let's recap what we've accomplished.\n", - "\n", - "- **We built a multi-agent guild** that collaborates to solve a complex, real-world generative task, using a diverse set of real and high-fidelity synthetic RAG and structured data sources.\n", - "- **We created a multi-dimensional evaluation gauntlet** that measures performance across five competing objectives, moving beyond simplistic accuracy scores to capture a holistic view of quality.\n", - "- **We built an autonomous 'AI Director'** that can analyze the guild's performance, diagnose systemic weaknesses, and intelligently rewrite the guild's own operational procedures (its SOPs) to improve.\n", - "- **We demonstrated that this system can discover non-obvious, intelligent trade-offs**, presenting a human decision-maker not with a single answer, but with a menu of optimized strategies (the Pareto Front), visualized for clear decision-making.\n", - "\n", - "This architecture is a blueprint for the future of agentic AI: systems that don't just execute tasks, but learn, adapt, and help us navigate the complex decision spaces of our most challenging problems." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "base", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.13.5" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -}