ohmygaugh committed
Commit 86cbe3c · 1 Parent(s): 398a370

First pass at merging the code bases (does not run yet)
.gitignore CHANGED
@@ -3,3 +3,6 @@
  /neo4j/data
  /postgres/data
  /neo4j/logs
+ .env
+ mcp/__pycache__
+ mcp/core/__pycache__
Makefile CHANGED
@@ -1,4 +1,4 @@
- .PHONY: up down logs seed clean health test demo
+ .PHONY: up down logs seed seed-db ingest clean health test demo

  # Start all services
  up:
@@ -19,12 +19,19 @@ logs:
  seed:
  	docker-compose exec mcp python /app/ops/scripts/seed.py

+ # Seed the SQLite databases from scratch
+ seed-db:
+ 	python ops/scripts/generate_sample_databases.py
+
+ # Ingest SQLite database schemas into Neo4j
+ ingest:
+ 	docker-compose run --rm mcp python /app/ops/scripts/ingest.py
+
  # Clean everything (including volumes)
  clean:
  	docker-compose down -v
  	docker system prune -f
  	@if [ -d "neo4j/data" ]; then rm -rf neo4j/data; fi
- 	@if [ -d "postgres/data" ]; then rm -rf postgres/data; fi
  	@if [ -d "frontend/.next" ]; then rm -rf frontend/.next; fi
  	@if [ -d "frontend/node_modules" ]; then rm -rf frontend/node_modules; fi

@@ -32,7 +39,6 @@ clean:
  health:
  	@echo "Checking service health..."
  	@docker-compose exec neo4j cypher-shell -u neo4j -p password "MATCH (n) RETURN count(n) LIMIT 1" > /dev/null 2>&1 && echo "✅ Neo4j: Healthy" || echo "❌ Neo4j: Unhealthy"
- 	@docker-compose exec postgres pg_isready -U postgres > /dev/null 2>&1 && echo "✅ PostgreSQL: Healthy" || echo "❌ PostgreSQL: Unhealthy"
  	@curl -s http://localhost:8000/health > /dev/null && echo "✅ MCP Server: Healthy" || echo "❌ MCP Server: Unhealthy"
  	@curl -s http://localhost:3000 > /dev/null && echo "✅ Frontend: Healthy" || echo "❌ Frontend: Unhealthy"
  	@curl -s http://localhost:8501 > /dev/null && echo "✅ Streamlit: Healthy" || echo "❌ Streamlit: Unhealthy"
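The new `ingest` target runs `ops/scripts/ingest.py`, which is not shown in this commit. As a rough, stdlib-only sketch of the schema-extraction half of such a script (the Neo4j write is omitted, and the table names below are illustrative, not the project's real schemas):

```python
import sqlite3

def extract_schema(conn: sqlite3.Connection) -> dict:
    """Map each table in a SQLite database to its (column, declared type) pairs."""
    schema = {}
    for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ):
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [(c[1], c[2]) for c in cols]
    return schema

# Illustrative stand-in for one of the real .db files
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (id INTEGER PRIMARY KEY, phase TEXT)")
print(extract_schema(conn))  # {'trials': [('id', 'INTEGER'), ('phase', 'TEXT')]}
```

The real script would then emit `Table`/`Column` nodes into Neo4j from this mapping.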
README.md CHANGED
@@ -1,74 +1,63 @@
- # Graph-Driven Agentic System MVP
- "Keep your data where it is but we will treat it like a graph for you and solve these problems for you"

  ## Overview
- An intelligent agent system that reads instructions from Neo4j, queries PostgreSQL databases, pauses for human review, and maintains a complete audit trail. The system demonstrates agentic workflow orchestration with human-in-the-loop controls.

  ## Key Features

- 🤖 **Autonomous Agent**: Processes instructions sequentially with configurable pause durations
- 📊 **Graph Database**: All workflow metadata stored in Neo4j for complete traceability
- 🔍 **Natural Language SQL**: Converts questions to SQL using LLM integration
- ⏸️ **Human-in-the-Loop**: 5-minute pauses allow instruction editing during execution
- 🎯 **Single API Gateway**: All Neo4j operations routed through MCP server
- 📈 **Real-time Visualization**: Live workflow progress in browser interface
- 🔄 **Complete Audit Trail**: Every action logged with timestamps and relationships

  ## Architecture

  ```
- ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
- │  Frontend   │────│ MCP Server  │────│    Neo4j    │
- │  (Next.js)  │    │  (FastAPI)  │    │   (Graph)   │
- └─────────────┘    └─────────────┘    └─────────────┘
-
- ┌─────────────┐    ┌─────────────┐
- │    Agent    │────│ PostgreSQL  │
- │  (Python)   │    │   (Data)    │
- └─────────────┘    └─────────────┘
- ```
-
- ###### Any MCP application/API/agent can be both a client and a server. Clients and servers are a logical separation only, not a physical one. There is a natural idea of chaining/composability between clients and servers, like a fire-bucket chain of context slosh. Use a pydantic graph here as the engine for the orchestrator? I think the point is to create a co-pilot for the analyst that uses GraphRAG to inform itself, given the user's request, to think in GraphRAG before determining how to navigate the MCP tools.
-
- Actually, you aren't immediately writing data to Neo4j from the relational DB; instead it's about doing GraphRAG to curate the proper SQL statements and tool calls to make... there may be tools to do this.
-
- Use MCP Inspector, and also have an MCP server that automatically checks the logs in the inspector: https://modelcontextprotocol.io/docs/tools/inspector
-
- pydantic is key here, and not worrying about the frontend until demo time
41
  ### Components
- - **Neo4j**: Graph database storing workflows, instructions, and execution metadata
- - **MCP Server**: Single gateway for all Neo4j operations with parameter fixing
- - **Agent**: Executes instructions with configurable pause periods for human review
- - **PostgreSQL**: Sample data source for testing SQL generation
- - **Frontend**: React/Next.js chat interface with real-time workflow visualization

  ## Quick Start

  ### Prerequisites
  - Docker & Docker Compose
- - OpenAI or Anthropic API key (for LLM integration)

  ### Setup
  1. **Clone and configure**:
  ```bash
  git clone <repository-url>
  cd <repository-name>
- cp .env.example .env
  ```

- 2. **Add your LLM API key** to `.env`:
- ```bash
- # For OpenAI
- LLM_API_KEY=sk-your-openai-key-here
- LLM_MODEL=gpt-4
-
- # For Anthropic
- LLM_API_KEY=your-anthropic-key-here
- LLM_MODEL=claude-3-sonnet-20240229
  ```

  3. **Start the system**:
@@ -76,235 +65,63 @@ pydantic is key here, and not worrying about the frontend until demo time
  make up
  ```

- 4. **Seed demo data**:
  ```bash
- make seed
  ```

  5. **Open the interface**:
- - Frontend: http://localhost:3000
  - Neo4j Browser: http://localhost:7474 (neo4j/password)

  ## Usage
-
- ### Basic Workflow
- 1. **Ask a question** in the chat interface:
-    - "How many customers do we have?"
-    - "Show me all customers who have placed orders"
-    - "What's the total revenue?"
-
- 2. **Watch the agent process**:
-    - Creates workflow with 3 instructions
-    - Discovers database schema
-    - Generates SQL from your question
-    - Reviews and formats results
-
- 3. **Human intervention** (during 5-minute pauses):
-    - Edit instructions in Neo4j Browser
-    - Change parameters or questions
-    - Stop workflows if needed
-
- ### Editing Instructions During Pause
-
- When the agent pauses, you can modify instructions in Neo4j Browser:
-
- ```cypher
- // Change the question being asked
- MATCH (i:Instruction {status: 'pending'})
- SET i.parameters = '{"question": "Show me customers from the last month"}'
-
- // Stop the entire workflow
- MATCH (w:Workflow {status: 'active'})
- SET w.status = 'stopped'
- ```
-
- ### Monitoring
-
- Check system health:
- ```bash
- make health
- ```
-
- View real-time logs:
- ```bash
- make logs
- ```
-
- Check specific service:
- ```bash
- make debug-agent     # Agent logs
- make debug-mcp       # MCP server logs
- make debug-frontend  # Frontend logs
- ```
-
140
- ## Commands Reference
-
- | Command | Description |
- |---------|-------------|
- | `make up` | Start all services |
- | `make down` | Stop all services |
- | `make clean` | Remove all data and containers |
- | `make health` | Check service health |
- | `make seed` | Create demo data |
- | `make logs` | View all logs |
- | `make demo` | Full clean + setup + seed |
- | `make test` | Run integration test |
-
- ## Configuration
-
- ### Environment Variables
-
- All configuration is in the `.env` file:
-
- - **Neo4j**: Database connection and auth
- - **PostgreSQL**: Sample data source
- - **MCP**: API keys and server settings
- - **Agent**: Polling interval and pause duration
- - **LLM**: API key and model selection
-
- ### Pause Duration
-
- Default: 5 minutes (300 seconds).
- Configurable via `PAUSE_DURATION` in `.env`.
-
- ### Polling Interval
-
- Default: 30 seconds.
- Configurable via `AGENT_POLL_INTERVAL` in `.env`.

  ## Development
  ### File Structure
  ```
- ├── agent/      # Python agent that executes instructions
- ├── frontend/   # Next.js chat interface
- ├── mcp/        # FastAPI server for Neo4j operations
- ├── neo4j/      # Neo4j configuration and data
- ├── postgres/   # PostgreSQL setup and sample data
- ├── ops/        # Operational scripts (seeding, etc.)
  ├── docker-compose.yml
  ├── Makefile
  └── README.md
- ```
-
- ### Adding New Instruction Types
-
- 1. **Define handler in agent**:
- ```python
- def handle_new_instruction_type(instruction):
-     # Implementation
-     return {"status": "success", "result": "..."}
- ```
-
- 2. **Add to agent main loop**:
- ```python
- elif instruction['type'] == 'new_instruction_type':
-     exec_result = handle_new_instruction_type(instruction)
- ```
-
- 3. **Update frontend** to create new instruction types in workflows.
-
- ### Database Schema
-
- The system uses two databases:
-
- **Neo4j** (Workflow metadata):
- - `Workflow` nodes with status and metadata
- - `Instruction` nodes with type, sequence, parameters
- - `Execution` nodes with results and timestamps
- - Relationships: `HAS_INSTRUCTION`, `EXECUTED_AS`, `NEXT_INSTRUCTION`
-
- **PostgreSQL** (Sample data):
- - `customers` table
- - `orders` table
- - Sample data for testing SQL generation
-
- ## Troubleshooting
-
- ### Common Issues
-
- **Services not starting**:
- ```bash
- make down
- make clean
- make up
- ```
-
- **Agent not processing**:
- ```bash
- make restart-agent
- make debug-agent
- ```
-
- **Frontend not loading**:
- ```bash
- make restart-frontend
- make debug-frontend
- ```
-
- **Database connection issues**:
- ```bash
- make health
- # Check .env configuration
- ```
-
- ### Debug Mode
-
- For detailed logging, check individual service logs:
- ```bash
- docker-compose logs -f agent
- docker-compose logs -f mcp
- docker-compose logs -f frontend
- ```
-
- ### Reset Everything
-
- Complete clean slate:
- ```bash
- make clean
- cp .env.example .env
- # Edit .env with your API key
- make demo
- ```
-
- ## Contributing
-
- 1. Fork the repository
- 2. Create a feature branch
- 3. Test with `make demo`
- 4. Submit a pull request
-
- ## License
-
- MIT License - see LICENSE file for details.
-
- ---
-
- ## Quick Demo
-
- Want to see it in action immediately?
-
- ```bash
- # 1. Clone repo
- git clone <repo-url> && cd <repo-name>
-
- # 2. Add your API key
- cp .env.example .env
- # Edit .env: LLM_API_KEY=your-key-here
-
- # 3. Start everything
- make demo
-
- # 4. Open http://localhost:3000
- # 5. Ask: "How many records are in the database?"
- # 6. Watch the magic happen! ✨
- ```
-
- The system will:
- - Create a workflow with 3 instructions
- - Pause for 5 minutes before each step (editable in Neo4j Browser)
- - Generate SQL from your natural language question
- - Execute the query and return formatted results
- - Show the entire process in a visual graph
-
- 🎉 **Welcome to the future of human-AI collaboration!**

+ # GraphRAG Agentic System

  ## Overview
+ This project implements an intelligent, multi-step, GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.

  ## Key Features

+ 🤖 **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution.
+ 🕸️ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.
+ 🔬 **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results.
+ 💬 **Conversational UI**: A Streamlit-based chat interface for interacting with the agent.
+ 🔌 **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server.

  ## Architecture

  ```
+ ┌────────────────┐    ┌─────────────┐    ┌─────────────┐
+ │ Streamlit Chat │────│    Agent    │────│ MCP Server  │
+ │      (UI)      │    │ (LangChain) │    │  (FastAPI)  │
+ └────────────────┘    └─────────────┘    └─────────────┘
+                                                 │
+        ┌──────────────────┬─────────────────────┘
+        │                  │
+ ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
+ │    Neo4j    │    │  clinical_  │    │ laboratory  │
+ │ (Schema KG) │    │  trials.db  │    │     .db     │
+ └─────────────┘    └─────────────┘    └─────────────┘
+
+                    ┌─────────────┐
+                    │    drug_    │
+                    │ discovery.db│
+                    └─────────────┘
+ ```

  ### Components

+ - **Streamlit**: Provides a conversational chat interface for users to ask questions.
+ - **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
+ - **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
+ - **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases.
+ - **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources.
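The "federated query execution" idea can be illustrated with plain SQLite: one connection can `ATTACH` additional database files and join across them in a single statement. A minimal sketch with made-up tables (the real schemas live in `clinical_trials.db`, `drug_discovery.db`, and `laboratory.db`):

```python
import sqlite3

# One "main" connection, with a second database attached under the alias `lab`.
# In the real system these would be file paths to the .db files above.
main = sqlite3.connect(":memory:")
main.execute("ATTACH DATABASE ':memory:' AS lab")

main.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
main.execute("CREATE TABLE lab.results (patient_id INTEGER, test TEXT, value REAL)")

main.execute("INSERT INTO patients VALUES (123, 'Ada')")
main.execute("INSERT INTO lab.results VALUES (123, 'CRP', 4.2)")

# A single query joining tables that live in two different databases
rows = main.execute(
    """
    SELECT p.name, r.test, r.value
    FROM patients AS p
    JOIN lab.results AS r ON r.patient_id = p.id
    """
).fetchall()
print(rows)  # [('Ada', 'CRP', 4.2)]
```

The MCP server's query executor presumably does something similar, using the Neo4j schema graph to decide which databases and join keys to use.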

  ## Quick Start

  ### Prerequisites
  - Docker & Docker Compose
+ - OpenAI API key

  ### Setup
  1. **Clone and configure**:
  ```bash
  git clone <repository-url>
  cd <repository-name>
+ touch .env
  ```

+ 2. **Add your OpenAI API key** to the `.env` file. This is the only secret you need to provide.
+ ```
+ OPENAI_API_KEY="sk-your-openai-key-here"
  ```

  3. **Start the system**:
  make up
  ```

+ 4. **Seed the databases and ingest schema**:
  ```bash
+ make seed-db
+ make ingest
  ```

  5. **Open the interface**:
+ - Streamlit UI: http://localhost:8501
  - Neo4j Browser: http://localhost:7474 (neo4j/password)

  ## Usage
+ Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:
+ - "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
+ - "Find all drugs with 'Aspirin' in their name."
+ - "Show me lab results for patient '123'."
+
+ The agent will then:
+ 1. Use the `SchemaSearchTool` to find relevant tables.
+ 2. Use the `JoinPathFinderTool` to determine how to join them.
+ 3. Construct a SQL query.
+ 4. Execute the query using the `QueryExecutorTool`.
+ 5. Return the final answer to the UI.

  ## Development

+ ### Running the Agent Manually
+ To test the agent's logic directly without the full Docker stack, you can run it from your terminal.
+
+ 1. **Set up the environment**:
+    Make sure the MCP and Neo4j services are running (`make up`).
+    Create a Python virtual environment and install dependencies:
+    ```bash
+    python -m venv venv
+    source venv/bin/activate
+    pip install -r agent/requirements.txt
+    ```
+
+ 2. **Set your API key**:
+    ```bash
+    export OPENAI_API_KEY="sk-your-openai-key-here"
+    ```
+
+ 3. **Run the agent**:
+    ```bash
+    python agent/main.py
+    ```
+    The agent will run with the hardcoded example question and print the execution trace and final answer to your console.
+
  ### File Structure
  ```
+ ├── agent/      # The LangChain agent and its tools
+ ├── streamlit/  # The Streamlit conversational UI
+ ├── mcp/        # FastAPI server with core logic
+ ├── neo4j/      # Neo4j configuration and data
+ ├── data/       # SQLite databases
+ ├── ops/        # Operational scripts (seeding, ingestion, etc.)
  ├── docker-compose.yml
  ├── Makefile
  └── README.md
+ ```
agent/main.py CHANGED
@@ -1,435 +1,134 @@
1
  import os
2
- import time
3
- import json
4
- import requests
5
- import signal
6
  import sys
7
- from datetime import datetime
8
- import openai
9
- from anthropic import Anthropic
10
-
11
- MCP_URL = os.getenv("MCP_URL", "http://mcp:8000/mcp")
12
- API_KEY = os.getenv("MCP_API_KEY", "dev-key-123")
13
- POLL_INTERVAL = int(os.getenv("AGENT_POLL_INTERVAL", "30"))
14
-
15
- # Configure LLM
16
- LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4")
17
- LLM_API_KEY = os.getenv("LLM_API_KEY")
18
-
19
- if "gpt" in LLM_MODEL:
20
- openai.api_key = LLM_API_KEY
21
- llm_client = None
22
- else:
23
- llm_client = Anthropic(api_key=LLM_API_KEY)
24
-
25
- # Defining the agents
26
- ## Data Procurement Agent
27
- ## Graph Analysis Agent
28
- ## x Agent etc...
29
 
30
- # Create Orchestrato with all the agents and their tasks per orchestrator.py file
31
- orchestrator = Orchestrator(
32
- llm_factory=EastridgeAugmentedLLM,
33
- available_agents=[
34
- DataProcurementAgent(),
35
- GraphAnalysisAgent(),
36
- xAgent()
37
- ],
38
- plan_type="full",
39
- plan_output_path=Path("output/execution_plan.md"),
40
- )
41
 
 
42
 
43
- # Global flag for interrupt handling
44
- interrupted = False
 
45
 
46
- def signal_handler(sig, frame):
47
- global interrupted
48
- interrupted = True
49
- print(f"\n[{datetime.now()}] Interrupt received, will stop after current instruction")
50
 
51
- signal.signal(signal.SIGINT, signal_handler)
52
- signal.signal(signal.SIGTERM, signal_handler)
 
53
 
54
- def call_mcp(tool, params=None):
55
- response = requests.post(
56
- MCP_URL,
57
- headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
58
- json={"tool": tool, "params": params or {}}
59
- )
60
- return response.json()
61
 
62
- def get_llm_response(prompt):
63
- """Get response from configured LLM"""
64
- if "gpt" in LLM_MODEL:
65
- response = openai.ChatCompletion.create(
66
- model=LLM_MODEL,
67
- messages=[
68
- {"role": "system", "content": "You are a SQL expert. Generate only valid PostgreSQL queries."},
69
- {"role": "user", "content": prompt}
70
- ],
71
- temperature=0,
72
- max_tokens=500
73
- )
74
- return response.choices[0].message.content
75
- else:
76
- response = llm_client.messages.create(
77
- model=LLM_MODEL,
78
- max_tokens=500,
79
- temperature=0,
80
- messages=[{"role": "user", "content": prompt}]
81
- )
82
- return response.content[0].text
83
 
84
- def handle_discover_schema(instruction):
85
- """Discover PostgreSQL schema and store in Neo4j"""
86
- print(f"[{datetime.now()}] Discovering PostgreSQL schema...")
87
-
88
- # Call MCP to discover schema
89
- schema_result = call_mcp("discover_postgres_schema")
90
-
91
- if "error" in schema_result:
92
- return {"status": "failed", "error": schema_result["error"]}
93
-
94
- schema = schema_result["schema"]
95
-
96
- # Create SourceSystem node
97
- call_mcp("write_graph", {
98
- "action": "create_node",
99
- "label": "SourceSystem",
100
- "properties": {
101
- "id": "postgres-main",
102
- "name": "Main PostgreSQL Database",
103
- "type": "postgresql",
104
- "discovered_at": datetime.now().isoformat()
105
- }
106
- })
107
-
108
- # For each table, create nodes
109
- for table_name, columns in schema.items():
110
- # Create Table node
111
- table_result = call_mcp("write_graph", {
112
- "action": "create_node",
113
- "label": "Table",
114
- "properties": {
115
- "name": table_name,
116
- "schema": "public",
117
- "column_count": len(columns)
118
- }
119
- })
120
 
121
- # Link Table to SourceSystem
122
- call_mcp("query_graph", {
123
- "query": """
124
- MATCH (s:SourceSystem {id: 'postgres-main'}),
125
- (t:Table {name: $table_name})
126
- MERGE (s)-[:HAS_TABLE]->(t)
127
- """,
128
- "parameters": {"table_name": table_name}
129
- })
130
-
131
- # Create Column nodes
132
- for col in columns:
133
- col_result = call_mcp("write_graph", {
134
- "action": "create_node",
135
- "label": "Column",
136
- "properties": {
137
- "name": col['column_name'],
138
- "data_type": col['data_type'],
139
- "nullable": col['is_nullable'] == 'YES',
140
- "table_name": table_name
141
- }
142
- })
143
-
144
- # Link Column to Table
145
- call_mcp("query_graph", {
146
- "query": """
147
- MATCH (t:Table {name: $table_name}),
148
- (c:Column {name: $col_name, table_name: $table_name})
149
- MERGE (t)-[:HAS_COLUMN]->(c)
150
- """,
151
- "parameters": {
152
- "table_name": table_name,
153
- "col_name": col['column_name']
154
- }
155
- })
156
-
157
- # Generate sample queries
158
- for table_name in schema.keys():
159
- sample_queries = [
160
- f"SELECT * FROM {table_name} LIMIT 10",
161
- f"SELECT COUNT(*) FROM {table_name}",
162
- f"SELECT * FROM {table_name} WHERE id = 1"
163
  ]
164
 
165
- for idx, query in enumerate(sample_queries):
166
- call_mcp("write_graph", {
167
- "action": "create_node",
168
- "label": "QueryTemplate",
169
- "properties": {
170
- "id": f"template-{table_name}-{idx}",
171
- "table_name": table_name,
172
- "query": query,
173
- "description": f"Sample query {idx+1} for {table_name}"
174
- }
175
- })
176
-
177
- return {
178
- "status": "success",
179
- "tables_discovered": len(schema),
180
- "columns_discovered": sum(len(cols) for cols in schema.values())
181
- }
182
-
183
- def handle_generate_sql(instruction):
184
- """Generate SQL from natural language using LLM"""
185
- print(f"[{datetime.now()}] Generating SQL from natural language...")
186
-
187
- # Get the user question from instruction parameters
188
- params = json.loads(instruction.get('parameters', '{}'))
189
- user_question = params.get('question', 'Show all data')
190
-
191
- # Fetch schema from Neo4j
192
- schema_result = call_mcp("query_graph", {
193
- "query": """
194
- MATCH (t:Table)-[:HAS_COLUMN]->(c:Column)
195
- RETURN t.name as table_name,
196
- collect({
197
- name: c.name,
198
- type: c.data_type,
199
- nullable: c.nullable
200
- }) as columns
201
- """
202
- })
203
-
204
- # Format schema for LLM
205
- schema_text = "PostgreSQL Schema:\n"
206
- for record in schema_result['data']:
207
- table = record['table_name']
208
- columns = record['columns']
209
- schema_text += f"\nTable: {table}\n"
210
- for col in columns:
211
- nullable = "NULL" if col['nullable'] else "NOT NULL"
212
- schema_text += f" - {col['name']}: {col['type']} {nullable}\n"
213
-
214
- # Create prompt
215
- prompt = f"""Given this PostgreSQL schema:
216
-
217
- {schema_text}
218
-
219
- Generate a SQL query for this question: {user_question}
220
-
221
- Return ONLY the SQL query, no explanations or markdown."""
222
-
223
- try:
224
- # Get SQL from LLM
225
- generated_sql = get_llm_response(prompt)
226
-
227
- # Clean up the SQL (remove markdown if present)
228
- generated_sql = generated_sql.strip()
229
- if generated_sql.startswith("```"):
230
- generated_sql = generated_sql.split("```")[1]
231
- if generated_sql.startswith("sql"):
232
- generated_sql = generated_sql[3:]
233
- generated_sql = generated_sql.strip()
234
-
235
- print(f"[{datetime.now()}] Generated SQL: {generated_sql}")
236
 
237
- # Execute the SQL
238
- query_result = call_mcp("query_postgres", {"query": generated_sql})
 
 
 
 
 
 
239
 
240
- if "error" in query_result:
241
- return {
242
- "status": "failed",
243
- "generated_sql": generated_sql,
244
- "error": query_result["error"]
245
- }
246
-
247
- # Store successful query as template
248
- call_mcp("write_graph", {
249
- "action": "create_node",
250
- "label": "QueryTemplate",
251
- "properties": {
252
- "id": f"generated-{int(time.time())}",
253
- "query": generated_sql,
254
- "question": user_question,
255
- "created_at": datetime.now().isoformat()
256
- }
257
- })
258
-
259
- return {
260
- "status": "success",
261
- "generated_sql": generated_sql,
262
- "row_count": query_result.get("row_count", 0),
263
- "data": query_result.get("data", [])[:10] # Limit to 10 rows for storage
264
- }
265
-
266
- except Exception as e:
267
- return {
268
- "status": "failed",
269
- "error": str(e)
270
- }
271
-
272
- def check_workflow_stop(workflow_id):
273
- """Check if workflow has been marked to stop"""
274
- result = call_mcp("query_graph", {
275
- "query": "MATCH (w:Workflow {id: $id}) RETURN w.status as status",
276
- "parameters": {"id": workflow_id}
277
- })
278
-
279
- if result['data'] and result['data'][0]['status'] == 'stopped':
280
- return True
281
- return False
282
-
283
- def pause_with_interrupt(duration, instruction_id, workflow_id=None):
284
- """Pause for duration seconds with interrupt checking every 10 seconds"""
285
- print(f"[{datetime.now()}] Pausing for {duration} seconds for human review")
286
- print(f"[{datetime.now()}] You can edit instruction in Neo4j Browser:")
287
- print(f" MATCH (i:Instruction {{id: '{instruction_id}'}}) SET i.parameters = '{{\"key\": \"value\"}}'")
288
-
289
- # Log pause start
290
- call_mcp("write_graph", {
291
- "action": "create_node",
292
- "label": "Log",
293
- "properties": {
294
- "type": "pause_started",
295
- "instruction_id": instruction_id,
296
- "duration": duration,
297
- "timestamp": datetime.now().isoformat()
298
- }
299
- })
300
-
301
- elapsed = 0
302
- while elapsed < duration:
303
- # Check every 10 seconds
304
- sleep_time = min(10, duration - elapsed)
305
- time.sleep(sleep_time)
306
- elapsed += sleep_time
307
-
308
- # Check for workflow stop
309
- if workflow_id and check_workflow_stop(workflow_id):
310
- print(f"[{datetime.now()}] Workflow stopped during pause")
311
- return False
312
-
313
- # Check for global interrupt
314
- if interrupted:
315
- print(f"[{datetime.now()}] Interrupted during pause")
316
- return False
317
-
318
- # Show progress
319
- remaining = duration - elapsed
320
- if remaining > 0 and elapsed % 30 == 0: # Update every 30 seconds
321
- print(f"[{datetime.now()}] Pause remaining: {remaining} seconds")
322
-
323
- # Log pause end
324
- call_mcp("write_graph", {
325
- "action": "create_node",
326
- "label": "Log",
327
- "properties": {
328
- "type": "pause_completed",
329
- "instruction_id": instruction_id,
330
- "timestamp": datetime.now().isoformat()
331
- }
332
- })
333
-
334
- print(f"[{datetime.now()}] Pause complete, continuing execution")
335
- return True
336
-
337
  def main():
338
- global interrupted
339
- print(f"[{datetime.now()}] Agent starting, polling every {POLL_INTERVAL}s")
340
-
341
- while not interrupted:
342
- try:
343
- result = call_mcp("get_next_instruction")
344
- instruction = result.get("instruction")
345
-
346
- if instruction:
347
- print(f"[{datetime.now()}] Found instruction: {instruction['id']}, type: {instruction['type']}")
348
-
349
- # Get workflow ID
350
- workflow_result = call_mcp("query_graph", {
351
- "query": """
352
- MATCH (w:Workflow)-[:HAS_INSTRUCTION]->(i:Instruction {id: $id})
353
- RETURN w.id as workflow_id
354
- """,
355
- "parameters": {"id": instruction['id']}
356
- })
357
- workflow_id = workflow_result['data'][0]['workflow_id'] if workflow_result['data'] else None
358
-
359
- # PAUSE BEFORE EXECUTION
360
- pause_duration = instruction.get('pause_duration', 300)
361
- if pause_duration > 0:
362
- if not pause_with_interrupt(pause_duration, instruction['id'], workflow_id):
363
- print(f"[{datetime.now()}] Execution cancelled during pause")
364
- continue
365
-
366
- # Re-fetch instruction to get any edits made during pause
367
- refetch_result = call_mcp("query_graph", {
368
- "query": "MATCH (i:Instruction {id: $id}) RETURN i",
369
- "parameters": {"id": instruction['id']}
370
- })
371
-
372
- if refetch_result['data']:
373
- instruction = refetch_result['data'][0]['i']
374
- print(f"[{datetime.now()}] Re-fetched instruction after pause, parameters: {instruction.get('parameters')}")
375
-
376
- # Update status to executing
377
- call_mcp("query_graph", {
378
- "query": "MATCH (i:Instruction {id: $id}) SET i.status = 'executing'",
379
- "parameters": {"id": instruction['id']}
380
- })
381
-
382
- # Execute based on type
383
- if instruction['type'] == 'discover_schema':
384
- exec_result = handle_discover_schema(instruction)
385
- elif instruction['type'] == 'generate_sql':
386
- exec_result = handle_generate_sql(instruction)
387
- else:
388
- exec_result = {"status": "success", "result": "Reviewed"}
389
-
390
- # Store execution result
391
- exec_node = call_mcp("write_graph", {
392
- "action": "create_node",
393
- "label": "Execution",
394
- "properties": {
395
- "id": f"exec-{instruction['id']}-{int(time.time())}",
396
- "started_at": datetime.now().isoformat(),
397
- "completed_at": datetime.now().isoformat(),
398
- "result": json.dumps(exec_result)
399
- }
400
- })
401
-
402
- # Link execution
403
- call_mcp("query_graph", {
404
- "query": """
405
- MATCH (i:Instruction {id: $iid}), (e:Execution {id: $eid})
406
- CREATE (i)-[:EXECUTED_AS]->(e)
407
- """,
408
- "parameters": {
409
- "iid": instruction['id'],
410
- "eid": exec_node['created']['id']
411
- }
412
- })
413
-
414
- # Update status
415
- final_status = 'complete' if exec_result.get('status') == 'success' else 'failed'
416
- call_mcp("query_graph", {
417
- "query": "MATCH (i:Instruction {id: $id}) SET i.status = $status",
418
- "parameters": {"id": instruction['id'], "status": final_status}
419
- })
420
-
421
- print(f"[{datetime.now()}] Completed instruction: {instruction['id']}")
422
-
423
- else:
424
- print(f"[{datetime.now()}] No pending instructions")
425
-
426
- time.sleep(POLL_INTERVAL)
427
-
428
- except Exception as e:
429
- print(f"[{datetime.now()}] Error: {e}")
430
- time.sleep(POLL_INTERVAL)
431
-
432
- print(f"[{datetime.now()}] Agent shutting down")
433
 
434
  if __name__ == "__main__":
435
  main()
 
1
  import os
 
 
 
 
2
  import sys
3
+ import logging
4
+ import json
5
+ from typing import Annotated, List, TypedDict
6
+ from fastapi import FastAPI
7
+ from pydantic import BaseModel
8
+ import uvicorn
9
+ from fastapi.responses import StreamingResponse
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
10
 
11
+ from langchain_core.messages import BaseMessage, ToolMessage, AIMessage
12
+ from langchain_openai import ChatOpenAI
13
+ from langgraph.graph import StateGraph, START, END
+ from langgraph.graph.message import add_messages
14
+ from langgraph.prebuilt import ToolNode
15

16
+ from agent.tools import MCPClient, SchemaSearchTool, JoinPathFinderTool, QueryExecutorTool
17

18
+ # --- Configuration & Logging ---
19
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
20
+ logger = logging.getLogger(__name__)
21

22
+ MCP_URL = os.getenv("MCP_URL", "http://mcp:8000/mcp")
23
+ API_KEY = os.getenv("MCP_API_KEY", "dev-key-123")
24
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
25

26
+ # --- Agent State Definition ---
27
+ class AgentState(TypedDict):
28
+ # add_messages appends each turn to the history instead of overwriting it,
+ # which the llm -> tools -> llm loop relies on
+ messages: Annotated[List[BaseMessage], add_messages]
29

30
+ # --- Agent Initialization ---
31
+ class GraphRAGAgent:
32
+ """The core agent for handling GraphRAG queries using LangGraph."""
33

34
+ def __init__(self):
35
+ if not OPENAI_API_KEY:
36
+ raise ValueError("OPENAI_API_KEY environment variable not set.")
37

38
+ # ChatOpenAI is required for bind_tools; the completions-only OpenAI class does not support tool calling
+ llm = ChatOpenAI(api_key=OPENAI_API_KEY, temperature=0, max_retries=1)
39
 
40
+ mcp_client = MCPClient(mcp_url=MCP_URL, api_key=API_KEY)
41
+ tools = [
42
+ SchemaSearchTool(mcp_client=mcp_client),
43
+ JoinPathFinderTool(mcp_client=mcp_client),
44
+ QueryExecutorTool(mcp_client=mcp_client),
45
  ]
46
 
47
+ self.llm_with_tools = llm.bind_tools(tools)
48
+ self.tool_node = ToolNode(tools)
49
 
50
+ # Define the agent graph
51
+ graph = StateGraph(AgentState)
52
+ graph.add_node("llm", self.call_llm)
53
+ graph.add_node("tools", self.tool_node)
54
+
55
+ graph.add_edge(START, "llm")
56
+ graph.add_conditional_edges("llm", self.should_call_tools)
57
+ graph.add_edge("tools", "llm")
58
 
59
+ self.graph = graph.compile()
60
+
61
+ def should_call_tools(self, state: AgentState) -> str:
62
+ """Determines whether to call tools or end the execution."""
63
+ last_message = state["messages"][-1]
64
+ if not last_message.tool_calls:
65
+ return END
66
+ return "tools"
67
+
68
+ def call_llm(self, state: AgentState) -> dict:
69
+ """Calls the LLM with the current state to decide the next action."""
70
+ response = self.llm_with_tools.invoke(state["messages"])
71
+ return {"messages": [response]}
72
+
73
+ async def stream_query(self, question: str):
74
+ """Processes a question and streams the intermediate steps."""
75
+ inputs = {"messages": [("user", question)]}
76
+ async for event in self.graph.astream(inputs, stream_mode="values"):
77
+ last_message = event["messages"][-1]
78
+ if isinstance(last_message, AIMessage) and last_message.tool_calls:
79
+ # Agent is thinking and calling a tool
80
+ tool_call = last_message.tool_calls[0]
81
+ yield json.dumps({
82
+ "type": "thought",
83
+ "content": f"🤖 Calling tool `{tool_call['name']}` with args: {tool_call['args']}"
84
+ }) + "\n"  # one JSON object per line (ndjson)
85
+ elif isinstance(last_message, ToolMessage):
86
+ # A tool has returned its result
87
+ yield json.dumps({
88
+ "type": "observation",
89
+ "content": f"🛠️ Tool `{last_message.name}` returned:\n\n```\n{last_message.content}\n```"
90
+ }) + "\n"
91
+ elif isinstance(last_message, AIMessage):
92
+ # This is the final answer
93
+ yield json.dumps({"type": "final_answer", "content": last_message.content}) + "\n"
94
+
95
+ # --- FastAPI Application ---
96
+ app = FastAPI(title="GraphRAG Agent Server")
97
+ agent = None
98
+
99
+ class QueryRequest(BaseModel):
100
+ question: str
101
+
102
+ @app.on_event("startup")
103
+ def startup_event():
104
+ """Initialize the agent on server startup."""
105
+ global agent
106
+ try:
107
+ agent = GraphRAGAgent()
108
+ logger.info("GraphRAGAgent initialized successfully.")
109
+ except ValueError as e:
110
+ logger.error(f"Agent initialization failed: {e}")
111
+
112
+ @app.post("/query")
113
+ async def execute_query(request: QueryRequest) -> StreamingResponse:
114
+ """Endpoint to receive questions and stream the agent's response."""
115
+ if not agent:
116
+ async def error_stream():
117
+ yield json.dumps({"error": "Agent is not initialized. Check server logs."}) + "\n"
118
+ return StreamingResponse(error_stream())
119
+
120
+ return StreamingResponse(agent.stream_query(request.question), media_type="application/x-ndjson")
121
+
122
+ @app.get("/health")
123
+ def health_check():
124
+ """Health check endpoint."""
125
+ return {"status": "ok", "agent_initialized": agent is not None}
126
+
127
+ # --- Main Execution ---
128
  def main():
129
+ """Main entry point to run the FastAPI server."""
130
+ logger.info("Starting agent server...")
131
+ uvicorn.run(app, host="0.0.0.0", port=8001)
132
 
133
  if __name__ == "__main__":
134
  main()
agent/requirements.txt CHANGED
@@ -1,4 +1,8 @@
1
- requests==2.31.0
2
- python-dotenv==1.0.0
3
- openai==0.28.1
4
- anthropic==0.7.0
 
 
 
 
 
1
+ requests
2
+ python-dotenv
3
+ langchain
4
+ langchain-openai
5
+ pydantic
6
+ fastapi
7
+ uvicorn[standard]
8
+ langgraph
agent/tools.py ADDED
@@ -0,0 +1,150 @@
1
+ import os
2
+ import requests
3
+ import json
4
+ from typing import Dict, Any, List, Optional
5
+ from langchain.tools import BaseTool
6
+ from pydantic import Field
7
+ import logging
8
+
9
+ logger = logging.getLogger(__name__)
10
+
11
+ class MCPClient:
12
+ """Client for making authenticated REST API calls to the MCP server."""
13
+
14
+ def __init__(self, mcp_url: str, api_key: str):
15
+ self.mcp_url = mcp_url
16
+ self.headers = {
17
+ "Authorization": f"Bearer {api_key}",
18
+ "Content-Type": "application/json"
19
+ }
20
+
21
+ def post(self, endpoint: str, data: Dict[str, Any]) -> Dict[str, Any]:
22
+ """Send a POST request to a given MCP endpoint."""
23
+ try:
24
+ url = f"{self.mcp_url}/{endpoint}"
25
+ response = requests.post(url, headers=self.headers, data=json.dumps(data))
26
+ response.raise_for_status()
27
+ return response.json()
28
+ except requests.exceptions.HTTPError as http_err:
29
+ logger.error(f"HTTP error occurred: {http_err} - {response.text}")
30
+ return {"status": "error", "message": f"HTTP error: {response.status_code} {response.reason}"}
31
+ except requests.exceptions.RequestException as req_err:
32
+ logger.error(f"Request error occurred: {req_err}")
33
+ return {"status": "error", "message": f"Request failed: {req_err}"}
34
+ except json.JSONDecodeError:
35
+ logger.error("Failed to decode JSON response.")
36
+ return {"status": "error", "message": "Invalid JSON response from server."}
37
+
38
+
39
+ class SchemaSearchTool(BaseTool):
40
+ """LangChain tool for searching database schemas."""
41
+
42
+ name: str = "schema_search"
43
+ description: str = """
44
+ Search for relevant database schemas based on a natural language query.
45
+ Use this when you need to find which tables/columns are relevant to a user's question.
46
+ Input should be a descriptive query like 'patient information' or 'drug trials'.
47
+ """
48
+ mcp_client: MCPClient
49
+
50
+ def _run(self, query: str) -> str:
51
+ """Execute schema search."""
52
+ response = self.mcp_client.post("discovery/get_relevant_schemas", {"query": query})
53
+
54
+ if response.get("status") == "success":
55
+ schemas = response.get("schemas", [])
56
+ if schemas:
57
+ schema_text = "Found relevant schemas:\n"
58
+ for schema in schemas:
59
+ schema_text += f"- {schema.get('database', 'Unknown')}.{schema.get('table', 'Unknown')}.{schema.get('name', 'Unknown')} ({schema.get('type', ['Unknown'])[0]})\n"
60
+ return schema_text
61
+ else:
62
+ return "No relevant schemas found."
63
+ else:
64
+ return f"Error searching schemas: {response.get('message', 'Unknown error')}"
65
+
66
+ async def _arun(self, query: str) -> str:
67
+ raise NotImplementedError("SchemaSearchTool does not support async")
68
+
69
+
70
+ class JoinPathFinderTool(BaseTool):
71
+ """LangChain tool for finding join paths between tables."""
72
+
73
+ name: str = "find_join_path"
74
+ description: str = """
75
+ Find how to join two tables together using foreign key relationships.
76
+ Use this when you need to query across multiple tables.
77
+ Input should be two table names separated by a comma, like 'patients,studies'.
78
+ """
79
+ mcp_client: MCPClient
80
+
81
+ def _run(self, table_names: str) -> str:
82
+ """Find join path."""
83
+ try:
84
+ tables = [t.strip() for t in table_names.split(',')]
85
+ if len(tables) != 2:
86
+ return "Please provide exactly two table names separated by a comma."
87
+
88
+ response = self.mcp_client.post(
89
+ "graph/find_join_path",
90
+ {"table1": tables[0], "table2": tables[1]}
91
+ )
92
+
93
+ if response.get("status") == "success":
94
+ path = response.get("path", "No path found")
95
+ return f"Join path: {path}"
96
+ else:
97
+ return f"Error finding join path: {response.get('message', 'Unknown error')}"
98
+ except Exception as e:
99
+ return f"Failed to find join path: {str(e)}"
100
+
101
+ async def _arun(self, table_names: str) -> str:
102
+ raise NotImplementedError("JoinPathFinderTool does not support async")
103
+
104
+
105
+ class QueryExecutorTool(BaseTool):
106
+ """LangChain tool for executing SQL queries."""
107
+
108
+ name: str = "execute_query"
109
+ description: str = """
110
+ Execute a SQL query against the databases and return results.
111
+ Use this after you have a valid SQL query.
112
+ Input should be a valid SQL query string.
113
+ """
114
+ mcp_client: MCPClient
115
+
116
+ def _run(self, sql: str) -> str:
117
+ """Execute query."""
118
+ try:
119
+ response = self.mcp_client.post(
120
+ "intelligence/execute_query",
121
+ {"sql": sql}
122
+ )
123
+
124
+ if response.get("status") == "success":
125
+ results = response.get("results", [])
126
+
127
+ if results:
128
+ # Format results as a readable table
129
+ result_text = f"Query returned {len(results)} rows:\\n"
130
+ headers = list(results[0].keys())
131
+ result_text += " | ".join(headers) + "\\n"
132
+ result_text += "-" * (len(" | ".join(headers))) + "\\n"
133
+
134
+ for row in results[:10]: # Limit display to first 10 rows
135
+ values = [str(row.get(h, "")) for h in headers]
136
+ result_text += " | ".join(values) + "\\n"
137
+
138
+ if len(results) > 10:
139
+ result_text += f"... and {len(results) - 10} more rows\\n"
140
+
141
+ return result_text
142
+ else:
143
+ return "Query executed successfully but returned no results."
144
+ else:
145
+ return f"Error executing query: {response.get('message', 'Unknown error')}"
146
+ except Exception as e:
147
+ return f"Failed to execute query: {str(e)}"
148
+
149
+ async def _arun(self, sql: str) -> str:
150
+ raise NotImplementedError("QueryExecutorTool does not support async")
docker-compose.yml CHANGED
@@ -19,25 +19,6 @@ services:
19
  networks:
20
  - agent-network
21
 
22
- postgres:
23
- image: postgres:15
24
- environment:
25
- - POSTGRES_DB=testdb
26
- - POSTGRES_USER=postgres
27
- - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
28
- ports:
29
- - "5432:5432"
30
- volumes:
31
- - ./postgres/data:/var/lib/postgresql/data
32
- - ./postgres/init.sql:/docker-entrypoint-initdb.d/init.sql
33
- healthcheck:
34
- test: ["CMD-SHELL", "pg_isready -U postgres"]
35
- interval: 10s
36
- timeout: 5s
37
- retries: 5
38
- networks:
39
- - agent-network
40
-
41
  mcp:
42
  build: ./mcp
43
  ports:
@@ -45,17 +26,15 @@ services:
45
  environment:
46
  - NEO4J_BOLT_URL=${NEO4J_BOLT_URL}
47
  - NEO4J_AUTH=${NEO4J_AUTH}
48
- - POSTGRES_CONNECTION=${POSTGRES_CONNECTION}
49
  - MCP_API_KEYS=${MCP_API_KEYS}
50
  - MCP_PORT=${MCP_PORT}
51
  depends_on:
52
  neo4j:
53
  condition: service_healthy
54
- postgres:
55
- condition: service_healthy
56
  volumes:
57
  - ./mcp:/app
58
  - ./ops/scripts:/app/ops/scripts
 
59
  command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
60
  healthcheck:
61
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
@@ -71,33 +50,17 @@ services:
71
  - MCP_URL=http://mcp:8000/mcp
72
  - MCP_API_KEY=dev-key-123
73
  - AGENT_POLL_INTERVAL=${AGENT_POLL_INTERVAL}
74
- - PAUSE_DURATION=${PAUSE_DURATION}
75
- - LLM_API_KEY=${LLM_API_KEY}
76
- - LLM_MODEL=${LLM_MODEL}
77
- - POSTGRES_CONNECTION=${POSTGRES_CONNECTION}
78
  depends_on:
79
  mcp:
80
  condition: service_healthy
81
  volumes:
82
  - ./agent:/app
 
83
  command: python -u main.py
84
- restart: unless-stopped
85
- networks:
86
- - agent-network
87
-
88
- frontend:
89
- build: ./frontend
90
  ports:
91
- - "3000:3000"
92
- environment:
93
- - NEXT_PUBLIC_MCP_URL=http://localhost:8000
94
- depends_on:
95
- - mcp
96
- volumes:
97
- - ./frontend:/app
98
- - /app/node_modules
99
- - /app/.next
100
- command: npm run dev
101
  networks:
102
  - agent-network
103
 
@@ -106,6 +69,7 @@ services:
106
  ports:
107
  - "8501:8501"
108
  environment:
 
109
  - MCP_URL=http://mcp:8000/mcp
110
  - MCP_API_KEY=dev-key-123
111
  depends_on:
 
19
  networks:
20
  - agent-network
21
 
22
  mcp:
23
  build: ./mcp
24
  ports:
 
26
  environment:
27
  - NEO4J_BOLT_URL=${NEO4J_BOLT_URL}
28
  - NEO4J_AUTH=${NEO4J_AUTH}
 
29
  - MCP_API_KEYS=${MCP_API_KEYS}
30
  - MCP_PORT=${MCP_PORT}
31
  depends_on:
32
  neo4j:
33
  condition: service_healthy
 
 
34
  volumes:
35
  - ./mcp:/app
36
  - ./ops/scripts:/app/ops/scripts
37
+ - ./data:/app/data
38
  command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
39
  healthcheck:
40
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
 
50
  - MCP_URL=http://mcp:8000/mcp
51
  - MCP_API_KEY=dev-key-123
52
  - AGENT_POLL_INTERVAL=${AGENT_POLL_INTERVAL}
53
+ - OPENAI_API_KEY=${OPENAI_API_KEY}
 
 
 
54
  depends_on:
55
  mcp:
56
  condition: service_healthy
57
  volumes:
58
  - ./agent:/app
59
+ - ./data:/app/data
60
  command: python -u main.py
61
+ restart: on-failure
 
 
 
 
 
62
  ports:
63
+ - "8001:8001"
 
 
 
 
 
 
 
 
 
64
  networks:
65
  - agent-network
66
 
 
69
  ports:
70
  - "8501:8501"
71
  environment:
72
+ - AGENT_URL=http://agent:8001/query
73
  - MCP_URL=http://mcp:8000/mcp
74
  - MCP_API_KEY=dev-key-123
75
  depends_on:
mcp/__init__.py ADDED
File without changes
mcp/core/__init__.py ADDED
File without changes
mcp/core/config.py ADDED
@@ -0,0 +1,24 @@
1
+ import os
2
+
3
+ # --- Neo4j Configuration ---
4
+ NEO4J_URI = os.getenv("NEO4J_BOLT_URL", "bolt://neo4j:7687")
5
+ NEO4J_USER = os.getenv("NEO4J_USER", "neo4j")
6
+ NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "password")
7
+
8
+ # --- SQLite Configuration ---
9
+ SQLITE_DATA_DIR = os.getenv("SQLITE_DATA_DIR", "/app/data")
10
+
11
+ def get_sqlite_connection_string(db_name: str) -> str:
12
+ """
13
+ Generates the SQLAlchemy connection string for a given SQLite database file.
14
+ Assumes the database file is located in the SQLITE_DATA_DIR.
15
+ Example: get_sqlite_connection_string("clinical_trials.db")
16
+ -> "sqlite:////app/data/clinical_trials.db"
17
+ """
18
+ db_path = os.path.join(SQLITE_DATA_DIR, db_name)
19
+ return f"sqlite:///{db_path}"
20
+
21
+ # --- Application Settings ---
22
+ # You can add other application-wide settings here
23
+ # For example, API keys, logging levels, etc.
24
+ # These would typically be loaded from environment variables as well.
mcp/core/database.py ADDED
@@ -0,0 +1,26 @@
1
+ from sqlalchemy import create_engine
2
+ from sqlalchemy.engine import Engine
3
+ import logging
4
+
5
+ logging.basicConfig(level=logging.INFO)
6
+ logger = logging.getLogger(__name__)
7
+
8
+ def get_db_engine(connection_string: str) -> Engine | None:
9
+ """
10
+ Creates a SQLAlchemy engine for a given database connection string.
11
+
12
+ Args:
13
+ connection_string: The database connection string.
14
+
15
+ Returns:
16
+ A SQLAlchemy Engine instance, or None if connection fails.
17
+ """
18
+ try:
19
+ engine = create_engine(connection_string)
20
+ # Test the connection
21
+ with engine.connect() as connection:
22
+ logger.info(f"Successfully connected to {engine.url.database}")
23
+ return engine
24
+ except Exception as e:
25
+ logger.error(f"Failed to connect to database: {e}")
26
+ return None
mcp/core/discovery.py ADDED
@@ -0,0 +1,98 @@
1
+ from sqlalchemy import inspect, text
2
+ from sqlalchemy.engine import Engine
3
+ from typing import Dict, Any, List
4
+ import logging
5
+ import json
6
+ from concurrent.futures import TimeoutError, ThreadPoolExecutor
7
+
8
+ logger = logging.getLogger(__name__)
9
+
10
+ def get_table_schema(inspector, table_name: str) -> Dict[str, Any]:
11
+ """Extracts schema for a single table."""
12
+ columns = inspector.get_columns(table_name)
13
+ primary_keys = inspector.get_pk_constraint(table_name)['constrained_columns']
14
+ foreign_keys = inspector.get_foreign_keys(table_name)
15
+
16
+ table_schema = {
17
+ "name": table_name,
18
+ "columns": [],
19
+ "primary_keys": primary_keys,
20
+ "foreign_keys": foreign_keys
21
+ }
22
+
23
+ for col in columns:
24
+ table_schema["columns"].append({
25
+ "name": col['name'],
26
+ "type": str(col['type']),
27
+ "nullable": col['nullable'],
28
+ "default": col.get('default'),
29
+ })
30
+ return table_schema
31
+
32
+ def get_sample_data(engine: Engine, table_name: str, sample_size: int = 5) -> Dict[str, Any]:
33
+ """Fetches sample data and distinct values for each column."""
34
+ sample_data = {}
35
+ with engine.connect() as connection:
36
+ # Get row count
37
+ try:
38
+ result = connection.execute(text(f'SELECT COUNT(*) FROM "{table_name}"'))
39
+ sample_data['row_count'] = result.scalar_one()
40
+ except Exception as e:
41
+ logger.warning(f"Could not get row count for table {table_name}: {e}")
42
+ sample_data['row_count'] = -1 # Indicate error or unknown
43
+
44
+ # Get sample rows
45
+ try:
46
+ result = connection.execute(text(f'SELECT * FROM "{table_name}" LIMIT {sample_size}'))
47
+ rows = [dict(row._mapping) for row in result.fetchall()]
48
+ # Attempt to JSON serialize to handle complex types gracefully
49
+ sample_data['sample_rows'] = json.loads(json.dumps(rows, default=str))
50
+ except Exception as e:
51
+ logger.warning(f"Could not get sample rows for table {table_name}: {e}")
52
+ sample_data['sample_rows'] = []
53
+
54
+ return sample_data
55
+
56
+
57
+ def discover_schema(engine: Engine, timeout: int = 30) -> Dict[str, Any] | None:
58
+ """
59
+ Discovers the full schema of a database using SQLAlchemy's inspection API.
60
+ Includes table schemas and sample data.
61
+ """
62
+ try:
63
+ with ThreadPoolExecutor() as executor:
64
+ future = executor.submit(_discover_schema_task, engine)
65
+ return future.result(timeout=timeout)
66
+ except TimeoutError:
67
+ logger.error(f"Schema discovery for {engine.url.database} timed out after {timeout} seconds.")
68
+ return None
69
+ except Exception as e:
70
+ logger.error(f"An unexpected error occurred during schema discovery for {engine.url.database}: {e}")
71
+ return None
72
+
73
+ def _discover_schema_task(engine: Engine) -> Dict[str, Any]:
74
+ """The actual schema discovery logic to be run with a timeout."""
75
+ inspector = inspect(engine)
76
+ db_schema = {
77
+ "database_name": engine.url.database,
78
+ "dialect": engine.dialect.name,
79
+ "tables": []
80
+ }
81
+
82
+ table_names = inspector.get_table_names()
83
+
84
+ for table_name in table_names:
85
+ try:
86
+ logger.info(f"Discovering schema for table: {table_name}")
87
+ table_schema = get_table_schema(inspector, table_name)
88
+
89
+ logger.info(f"Collecting sample data for table: {table_name}")
90
+ sample_info = get_sample_data(engine, table_name)
91
+ table_schema.update(sample_info)
92
+
93
+ db_schema["tables"].append(table_schema)
94
+ except Exception as e:
95
+ logger.error(f"Could not inspect table '{table_name}': {e}")
96
+ continue
97
+
98
+ return db_schema
mcp/core/graph.py ADDED
@@ -0,0 +1,151 @@
1
+ from neo4j import GraphDatabase
2
+ import logging
3
+ import json
4
+ from typing import List, Dict, Any
5
+ from . import config
6
+
7
+ logger = logging.getLogger(__name__)
8
+
9
+ class GraphStore:
10
+ def __init__(self):
11
+ self._driver = GraphDatabase.driver(config.NEO4J_URI, auth=(config.NEO4J_USER, config.NEO4J_PASSWORD))
12
+ self.ensure_constraints()
13
+
14
+ def close(self):
15
+ self._driver.close()
16
+
17
+ def ensure_constraints(self):
18
+ """Ensure uniqueness constraints are set up in Neo4j."""
19
+ with self._driver.session() as session:
20
+ session.run("CREATE CONSTRAINT IF NOT EXISTS FOR (d:Database) REQUIRE d.name IS UNIQUE")
21
+ session.run("CREATE CONSTRAINT IF NOT EXISTS FOR (t:Table) REQUIRE t.unique_name IS UNIQUE")
22
+ session.run("CREATE CONSTRAINT IF NOT EXISTS FOR (c:Column) REQUIRE c.unique_name IS UNIQUE")
23
+ logger.info("Neo4j constraints ensured.")
24
+
25
+ def import_schema(self, schema_data: dict):
26
+ """
27
+ Imports a discovered database schema into the Neo4j graph.
28
+ """
29
+ db_name = schema_data['database_name']
30
+
31
+ with self._driver.session() as session:
32
+ # Create Database node
33
+ session.run("MERGE (d:Database {name: $db_name})", db_name=db_name)
34
+
35
+ for table in schema_data['tables']:
36
+ table_unique_name = f"{db_name}.{table['name']}"
37
+ table_properties = {
38
+ "name": table['name'],
39
+ "unique_name": table_unique_name,
40
+ "row_count": table.get('row_count', -1),
41
+ "sample_rows": json.dumps(table.get('sample_rows', []))
42
+ }
43
+
44
+ # Create Table node and HAS_TABLE relationship
45
+ session.run(
46
+ """
47
+ MATCH (d:Database {name: $db_name})
48
+ MERGE (t:Table {unique_name: $unique_name})
49
+ ON CREATE SET t += $props
50
+ ON MATCH SET t += $props
51
+ MERGE (d)-[:HAS_TABLE]->(t)
52
+ """,
53
+ db_name=db_name,
54
+ unique_name=table_unique_name,
55
+ props=table_properties
56
+ )
57
+
58
+ for column in table['columns']:
59
+ column_unique_name = f"{table_unique_name}.{column['name']}"
60
+ column_properties = {
61
+ "name": column['name'],
62
+ "unique_name": column_unique_name,
63
+ "type": column['type'],
64
+ "nullable": column['nullable'],
65
+ "default": str(column.get('default')) # Ensure default is string
66
+ }
67
+
68
+ # Create Column node and HAS_COLUMN relationship
69
+ session.run(
70
+ """
71
+ MATCH (t:Table {unique_name: $table_unique_name})
72
+ MERGE (c:Column {unique_name: $column_unique_name})
73
+ ON CREATE SET c += $props
74
+ ON MATCH SET c += $props
75
+ MERGE (t)-[:HAS_COLUMN]->(c)
76
+ """,
77
+ table_unique_name=table_unique_name,
78
+ column_unique_name=column_unique_name,
79
+ props=column_properties
80
+ )
81
+
82
+ # After all tables and columns are created, create foreign key relationships
83
+ for table in schema_data['tables']:
84
+ table_unique_name = f"{db_name}.{table['name']}"
85
+ if table.get('foreign_keys'):
86
+ for fk in table['foreign_keys']:
87
+ constrained_columns = fk['constrained_columns']
88
+ referred_table = fk['referred_table']
89
+ referred_columns = fk['referred_columns']
90
+
91
+ referred_table_unique_name = f"{db_name}.{referred_table}"
92
+
93
+ for i, col_name in enumerate(constrained_columns):
94
+ from_col_unique_name = f"{table_unique_name}.{col_name}"
95
+ to_col_unique_name = f"{referred_table_unique_name}.{referred_columns[i]}"
96
+
97
+ session.run(
98
+ """
99
+ MATCH (from_col:Column {unique_name: $from_col})
100
+ MATCH (to_col:Column {unique_name: $to_col})
101
+ MERGE (from_col)-[:REFERENCES]->(to_col)
102
+ """,
103
+ from_col=from_col_unique_name,
104
+ to_col=to_col_unique_name
105
+ )
106
+ logger.info(f"Successfully imported schema for database: {db_name}")
107
+
108
+ def find_shortest_path(self, start_node_name: str, end_node_name: str) -> List[Dict[str, Any]]:
109
+ """
110
+ Finds the shortest path between two nodes (Tables or Columns) in the graph.
111
+ This is a generic pathfinder.
112
+ """
113
+ query = """
114
+ MATCH (start {unique_name: $start_name}), (end {unique_name: $end_name})
115
+ CALL apoc.path.shortestPath(start, end, 'REFERENCES|HAS_COLUMN|HAS_TABLE', {maxLevel: 10}) YIELD path
116
+ RETURN path
117
+ """
118
+ with self._driver.session() as session:
119
+ result = session.run(query, start_name=start_node_name, end_name=end_node_name)
120
+ # The result is complex, we need to parse it into a user-friendly format.
121
+ # For now, returning the raw path objects.
122
+ return [record["path"] for record in result]
123
+
124
+ def keyword_search(self, keyword: str) -> List[Dict[str, Any]]:
125
+ """
126
+ Searches for tables and columns matching a keyword.
127
+ Returns a list of matching nodes with their database and table context.
128
+ """
129
+ query = """
130
+ MATCH (n)
131
+ WHERE (n:Table OR n:Column) AND n.name CONTAINS $keyword
132
+ OPTIONAL MATCH (d:Database)-[:HAS_TABLE]->(t:Table)-[:HAS_COLUMN]->(n) WHERE n:Column
133
+ OPTIONAL MATCH (d2:Database)-[:HAS_TABLE]->(n) WHERE n:Table
134
+ WITH COALESCE(d, d2) AS db, COALESCE(t, n) AS tbl, n AS item
135
+ RETURN db.name AS database, tbl.name AS table, item.name AS name, labels(item) AS type
136
+ LIMIT 25
137
+ """
138
+ with self._driver.session() as session:
139
+ result = session.run(query, keyword=keyword)
140
+ return [record.data() for record in result]
141
+
142
+ def get_table_row_count(self, table_unique_name: str) -> int:
143
+ """Retrieves the stored row count for a given table."""
144
+ query = """
145
+ MATCH (t:Table {unique_name: $unique_name})
146
+ RETURN t.row_count AS row_count
147
+ """
148
+ with self._driver.session() as session:
149
+ result = session.run(query, unique_name=table_unique_name)
150
+ record = result.single()
151
+ return record['row_count'] if record else -1
mcp/core/intelligence.py ADDED
@@ -0,0 +1,161 @@
1
+ import sqlparse
2
+ import logging
3
+ from typing import List, Dict, Any
4
+
5
+ from .graph import GraphStore
6
+ from .database import get_db_engine
7
+ from . import config
8
+ from sqlalchemy import text
9
+
10
+ logger = logging.getLogger(__name__)
11
+
12
+ # Constants for query cost estimation
13
+ ROW_EXECUTION_THRESHOLD = 100 # Execute directly only when the estimate is below this many rows
14
+ JOIN_CARDINALITY_ESTIMATE = 1000 # A simplistic estimate for joins
15
+
16
+ class QueryIntelligence:
17
+ """
18
+ Provides intelligence for handling SQL queries. It estimates query cost
19
+ and decides on an execution strategy.
20
+ """
21
+ def __init__(self, graph_store: GraphStore):
22
+ self.graph_store = graph_store
23
+ self.db_engines = {}
24
+
25
+ def _get_engine_for_db(self, db_name: str):
26
+ """Helper to get or create an engine for a specific database."""
27
+ if db_name not in self.db_engines:
28
+ # Assuming db_name includes the .db extension
29
+ connection_string = config.get_sqlite_connection_string(db_name)
30
+ self.db_engines[db_name] = get_db_engine(connection_string)
31
+ return self.db_engines.get(db_name)
32
+
33
+ async def get_relevant_schemas(self, query: str) -> List[Dict[str, Any]]:
34
+ """Finds schemas relevant to a natural language query."""
35
+ # This is a simplistic keyword search. A real implementation would use
36
+ # embedding-based search or an LLM to extract entities.
37
+ keywords = query.split()
38
+ all_results = []
39
+ for keyword in keywords:
40
+ if len(keyword) > 2: # Avoid very short keywords
41
+ results = self.graph_store.keyword_search(keyword)
42
+ all_results.extend(results)
43
+ # Deduplicate results; the `type` field holds a list of labels, so the
+ # dicts are not hashable and a set-of-tuples dedupe would raise TypeError
44
+ seen = set()
+ unique = []
+ for d in all_results:
+ key = str(sorted(d.items()))
+ if key not in seen:
+ seen.add(key)
+ unique.append(d)
+ return unique
45
+
46
+ async def find_join_path(self, table1_name: str, table2_name: str) -> str:
47
+ """Finds a join path between two tables using the graph."""
48
+ # This is a simplification: it requires table names to be unique, or the
49
+ # user to provide fully qualified names (db.table).
50
+ t1_nodes = self.graph_store.keyword_search(table1_name)
51
+ t2_nodes = self.graph_store.keyword_search(table2_name)
52
+
53
+ if not t1_nodes or not t2_nodes:
54
+ return "Could not find one or both tables."
55
+
56
+ # Assume the first result is correct for simplicity
57
+ t1_unique_name = f"{t1_nodes[0]['database']}.{t1_nodes[0]['table']}"
58
+ t2_unique_name = f"{t2_nodes[0]['database']}.{t2_nodes[0]['table']}"
59
+
60
+ path_result = self.graph_store.find_shortest_path(t1_unique_name, t2_unique_name)
61
+
62
+ if not path_result:
63
+ return f"No path found between {table1_name} and {table2_name}."
64
+
65
+ # Format the path for display
66
+ # This is a complex task. The raw path from Neo4j needs careful parsing.
67
+ # This is a placeholder for that logic.
68
+         return f"Path found (details require parsing): {path_result}"
+
+     async def execute_query(self, sql: str, limit: int) -> List[Dict[str, Any]]:
+         """
+         Executes a SQL query against the appropriate database if the estimated
+         cost is below the threshold.
+         """
+         cost_estimate = self.estimate_query_cost(sql)
+
+         if cost_estimate['decision'] != 'execute':
+             raise PermissionError(f"Query execution denied. Estimated cost is too high ({cost_estimate['estimated_rows']} rows).")
+
+         # This is a major simplification. Determining which database to run the query
+         # against is a hard problem (especially for federated queries).
+         # We assume the first table found belongs to the correct database.
+         parsed_sql = self._parse_sql(sql)
+         if not parsed_sql['tables']:
+             raise ValueError("No tables found in SQL query.")
+
+         first_table = parsed_sql['tables'][0]
+         search_results = self.graph_store.keyword_search(first_table)
+         if not search_results:
+             raise ValueError(f"Table '{first_table}' not found in any known database.")
+
+         db_name = search_results[0]['database']
+         engine = self._get_engine_for_db(db_name)
+
+         if not engine:
+             raise ConnectionError(f"Could not connect to database: {db_name}")
+
+         with engine.connect() as connection:
+             # Append limit to the query
+             safe_sql = f"{sql.strip().rstrip(';')} LIMIT {int(limit)}"
+             result = connection.execute(text(safe_sql))
+             return [dict(row._mapping) for row in result.fetchall()]
+
+     def _parse_sql(self, sql: str) -> Dict[str, Any]:
+         """Parses the SQL to identify tables and columns."""
+         parsed = sqlparse.parse(sql)[0]
+         # This is a simplistic parser. A real implementation would need
+         # a much more robust SQL parsing library to handle complex queries, CTEs, etc.
+         tables = set()
+         for token in parsed.tokens:
+             if isinstance(token, sqlparse.sql.Identifier):
+                 tables.add(token.get_real_name())
+             elif token.is_group:
+                 # Look for identifiers within subgroups (e.g., in FROM or JOIN clauses)
+                 for sub_token in token.tokens:
+                     if isinstance(sub_token, sqlparse.sql.Identifier):
+                         tables.add(sub_token.get_real_name())
+
+         return {"tables": list(tables)}
+
+     def estimate_query_cost(self, sql: str) -> Dict[str, Any]:
+         """
+         Estimates the cost of a query based on row counts from the graph.
+         """
+         try:
+             parsed_sql = self._parse_sql(sql)
+             tables_in_query = parsed_sql['tables']
+
+             if not tables_in_query:
+                 return {"estimated_rows": 0, "decision": "execute", "message": "No tables found in query."}
+
+             # For simplicity, we'll take the max row count of any table in the query.
+             # A real system would analyze JOINs and WHERE clauses.
+             max_rows = 0
+             for table_name in tables_in_query:
+                 # Need to find the unique name. This assumes table names are unique across DBs for now.
+                 # A real implementation needs context of which DB is being queried.
+                 search_result = self.graph_store.keyword_search(table_name)
+                 if search_result:
+                     table_unique_name = f"{search_result[0]['database']}.{search_result[0]['table']}"
+                     row_count = self.graph_store.get_table_row_count(table_unique_name)
+                     if row_count > max_rows:
+                         max_rows = row_count
+
+             estimated_rows = max_rows
+             # Crude adjustment for joins
+             if len(tables_in_query) > 1:
+                 # A better estimate would involve graph traversal and statistical models
+                 estimated_rows *= JOIN_CARDINALITY_ESTIMATE * (len(tables_in_query) - 1)
+
+             decision = "execute" if estimated_rows < ROW_EXECUTION_THRESHOLD else "return_sql"
+
+             return {
+                 "estimated_rows": estimated_rows,
+                 "decision": decision,
+                 "tables_found": tables_in_query
+             }
+
+         except Exception as e:
+             logger.error(f"Error estimating query cost: {e}")
+             return {"estimated_rows": -1, "decision": "error", "message": str(e)}
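The cost heuristic in `estimate_query_cost` above can be sketched in isolation: take the largest per-table row count, then inflate it by a join-cardinality factor for each extra table. The constant values below are assumptions for illustration; the real `JOIN_CARDINALITY_ESTIMATE` and `ROW_EXECUTION_THRESHOLD` live in the project's config.

```python
# Assumed constants; the real values live in the project's config module.
JOIN_CARDINALITY_ESTIMATE = 10
ROW_EXECUTION_THRESHOLD = 10_000

def estimate(row_counts):
    """row_counts: row count of each table referenced by the query."""
    if not row_counts:
        return 0, "execute"
    estimated = max(row_counts)
    if len(row_counts) > 1:
        # Crude join adjustment, mirroring the handler above
        estimated *= JOIN_CARDINALITY_ESTIMATE * (len(row_counts) - 1)
    decision = "execute" if estimated < ROW_EXECUTION_THRESHOLD else "return_sql"
    return estimated, decision

print(estimate([500, 200]))      # (5000, 'execute')
print(estimate([500, 200, 50]))  # (10000, 'return_sql')
```

Note the multiplicative blow-up: a third table pushes the same 500-row base over the threshold, which is exactly why the handler falls back to returning SQL instead of executing it.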
mcp/requirements.txt CHANGED
@@ -3,4 +3,5 @@ uvicorn==0.24.0
  neo4j==5.14.0
  pydantic==2.4.0
  requests==2.31.0
- psycopg2-binary==2.9.9
+ SQLAlchemy==2.0.29
+ sqlparse==0.5.0
ops/scripts/generate_sample_databases.py ADDED
@@ -0,0 +1,160 @@
+ import sqlite3
+ import pandas as pd
+ from datetime import datetime, timedelta
+ import random
+ import os
+
+ # Ensure the data directory exists
+ DATA_DIR = 'data'
+ os.makedirs(DATA_DIR, exist_ok=True)
+
+ def create_clinical_trials_db():
+     """Creates the clinical_trials.db database."""
+     conn = sqlite3.connect(os.path.join(DATA_DIR, 'clinical_trials.db'))
+
+     # Studies table
+     studies_data = {
+         'study_id': ['ONCO-2023-001', 'CARDIO-2023-047', 'NEURO-2024-012', 'DIAB-2023-089', 'RARE-2024-003'],
+         'study_name': ['Phase III Immunotherapy Trial', 'Beta Blocker Efficacy Study', 'Alzheimer Prevention Trial', 'Insulin Resistance Study', 'Rare Disease Natural History'],
+         'phase': ['Phase 3', 'Phase 2', 'Phase 2', 'Phase 3', 'Observational'],
+         'status': ['RECRUITING', 'ACTIVE', 'PLANNING', 'COMPLETED', 'RECRUITING'],
+         'sponsor': ['OncoPharm Inc', 'CardioHealth', 'NeuroGen', 'DiabetesCare', 'NIH'],
+         'target_enrollment': [500, 200, 150, 800, 50],
+         'current_enrollment': [237, 178, 0, 800, 12],
+         'start_date': ['2023-03-15', '2023-06-01', '2024-01-15', '2023-01-10', '2024-02-01']
+     }
+     pd.DataFrame(studies_data).to_sql('studies', conn, index=False, if_exists='replace')
+
+     # Patients table
+     patients_data = {
+         'patient_id': [f'PT{str(i).zfill(6)}' for i in range(1, 101)],
+         'study_id': [random.choice(studies_data['study_id']) for _ in range(100)],
+         'enrollment_date': [(datetime.now() - timedelta(days=random.randint(1, 365))).strftime('%Y-%m-%d') for _ in range(100)],
+         'age': [random.randint(18, 85) for _ in range(100)],
+         'gender': [random.choice(['M', 'F']) for _ in range(100)],
+         'status': [random.choice(['ENROLLED', 'COMPLETED', 'WITHDRAWN', 'SCREENING']) for _ in range(100)]
+     }
+     pd.DataFrame(patients_data).to_sql('patients', conn, index=False, if_exists='replace')
+
+     # Adverse Events table
+     adverse_events_data = {
+         'event_id': list(range(1, 51)),
+         'patient_id': [random.choice(patients_data['patient_id'][:50]) for _ in range(50)],
+         'event_date': [(datetime.now() - timedelta(days=random.randint(1, 180))).strftime('%Y-%m-%d') for _ in range(50)],
+         'event_type': [random.choice(['NAUSEA', 'HEADACHE', 'FATIGUE', 'RASH', 'FEVER']) for _ in range(50)],
+         'severity': [random.choice(['MILD', 'MODERATE', 'SEVERE']) for _ in range(50)],
+         'related_to_treatment': [random.choice(['YES', 'NO', 'UNKNOWN']) for _ in range(50)]
+     }
+     pd.DataFrame(adverse_events_data).to_sql('adverse_events', conn, index=False, if_exists='replace')
+
+     # Add foreign keys (SQLite doesn't enforce, but documents relationships)
+     conn.execute("CREATE INDEX idx_patients_study ON patients(study_id)")
+     conn.execute("CREATE INDEX idx_events_patient ON adverse_events(patient_id)")
+     conn.commit()
+     conn.close()
+     print("✅ Clinical Trials database created successfully!")
+
+ def create_laboratory_db():
+     """Creates the laboratory.db database."""
+     conn = sqlite3.connect(os.path.join(DATA_DIR, 'laboratory.db'))
+
+     # Lab Tests table
+     lab_tests_data = {
+         'test_id': [f'LAB{str(i).zfill(8)}' for i in range(1, 201)],
+         'patient_id': [f'PT{str(random.randint(1, 100)).zfill(6)}' for _ in range(200)],
+         'test_date': [(datetime.now() - timedelta(days=random.randint(1, 365))).strftime('%Y-%m-%d') for _ in range(200)],
+         'test_type': [random.choice(['CBC', 'METABOLIC_PANEL', 'LIVER_FUNCTION', 'LIPID_PANEL', 'HBA1C']) for _ in range(200)],
+         'ordered_by': [f'DR{str(random.randint(1, 20)).zfill(4)}' for _ in range(200)],
+         'priority': [random.choice(['ROUTINE', 'URGENT', 'STAT']) for _ in range(200)]
+     }
+     pd.DataFrame(lab_tests_data).to_sql('lab_tests', conn, index=False, if_exists='replace')
+
+     # Test Results table
+     results_data = {
+         'result_id': list(range(1, 601)),
+         'test_id': [random.choice(lab_tests_data['test_id']) for _ in range(600)],
+         'analyte': [random.choice(['GLUCOSE', 'WBC', 'RBC', 'PLATELETS', 'CREATININE', 'ALT', 'AST', 'CHOLESTEROL']) for _ in range(600)],
+         'value': [round(random.uniform(1, 200), 2) for _ in range(600)],
+         'unit': [random.choice(['mg/dL', 'K/uL', 'M/uL', 'g/dL', 'mmol/L']) for _ in range(600)],
+         'reference_low': [round(random.uniform(1, 50), 2) for _ in range(600)],
+         'reference_high': [round(random.uniform(100, 200), 2) for _ in range(600)],
+         'flag': [random.choice(['NORMAL', 'HIGH', 'LOW', 'CRITICAL']) for _ in range(600)]
+     }
+     pd.DataFrame(results_data).to_sql('test_results', conn, index=False, if_exists='replace')
+
+     # Biomarkers table (for research)
+     biomarkers_data = {
+         'biomarker_id': list(range(1, 31)),
+         'patient_id': [f'PT{str(random.randint(1, 100)).zfill(6)}' for _ in range(30)],
+         'biomarker_name': [random.choice(['PD-L1', 'BRCA1', 'EGFR', 'KRAS', 'HER2']) for _ in range(30)],
+         'expression_level': [random.choice(['HIGH', 'MEDIUM', 'LOW', 'NEGATIVE']) for _ in range(30)],
+         'test_method': [random.choice(['IHC', 'PCR', 'NGS', 'FLOW_CYTOMETRY']) for _ in range(30)],
+         'collection_date': [(datetime.now() - timedelta(days=random.randint(1, 365))).strftime('%Y-%m-%d') for _ in range(30)]
+     }
+     pd.DataFrame(biomarkers_data).to_sql('biomarkers', conn, index=False, if_exists='replace')
+
+     conn.execute("CREATE INDEX idx_results_test ON test_results(test_id)")
+     conn.execute("CREATE INDEX idx_tests_patient ON lab_tests(patient_id)")
+     conn.commit()
+     conn.close()
+     print("✅ Laboratory database created successfully!")
+
+ def create_drug_discovery_db():
+     """Creates the drug_discovery.db database."""
+     conn = sqlite3.connect(os.path.join(DATA_DIR, 'drug_discovery.db'))
+
+     # Compounds table
+     compounds_data = {
+         'compound_id': [f'CMP-{str(i).zfill(6)}' for i in range(1, 51)],
+         'compound_name': [f'Compound-{chr(65+i//10)}{i%10}' for i in range(50)],
+         'molecular_weight': [round(random.uniform(200, 800), 2) for _ in range(50)],
+         'formula': [f'C{random.randint(10,30)}H{random.randint(10,40)}N{random.randint(0,5)}O{random.randint(1,10)}' for _ in range(50)],
+         'development_stage': [random.choice(['DISCOVERY', 'LEAD_OPT', 'PRECLINICAL', 'CLINICAL', 'DISCONTINUED']) for _ in range(50)],
+         'target_class': [random.choice(['KINASE', 'GPCR', 'ION_CHANNEL', 'PROTEASE', 'ANTIBODY']) for _ in range(50)]
+     }
+     pd.DataFrame(compounds_data).to_sql('compounds', conn, index=False, if_exists='replace')
+
+     # Assay Results table
+     assays_data = {
+         'assay_id': list(range(1, 201)),
+         'compound_id': [random.choice(compounds_data['compound_id']) for _ in range(200)],
+         'assay_type': [random.choice(['BINDING', 'CELL_VIABILITY', 'ENZYME_INHIBITION', 'TOXICITY']) for _ in range(200)],
+         'ic50_nm': [round(random.uniform(0.1, 10000), 2) for _ in range(200)],
+         'efficacy_percent': [round(random.uniform(0, 100), 1) for _ in range(200)],
+         'assay_date': [(datetime.now() - timedelta(days=random.randint(1, 365))).strftime('%Y-%m-%d') for _ in range(200)],
+         'scientist': [f'SCI-{random.randint(1, 10)}' for _ in range(200)]
+     }
+     pd.DataFrame(assays_data).to_sql('assay_results', conn, index=False, if_exists='replace')
+
+     # Drug Targets table
+     targets_data = {
+         'target_id': [f'TGT-{str(i).zfill(4)}' for i in range(1, 21)],
+         'target_name': [f'Protein-{i}' for i in range(1, 21)],
+         'gene_symbol': [f'GENE{i}' for i in range(1, 21)],
+         'pathway': [random.choice(['MAPK', 'PI3K/AKT', 'WNT', 'NOTCH', 'HEDGEHOG']) for _ in range(20)],
+         'disease_area': [random.choice(['ONCOLOGY', 'CARDIOLOGY', 'NEUROLOGY', 'IMMUNOLOGY']) for _ in range(20)]
+     }
+     pd.DataFrame(targets_data).to_sql('drug_targets', conn, index=False, if_exists='replace')
+
+     # Compound-Target Associations
+     associations_data = {
+         'association_id': list(range(1, 76)),
+         'compound_id': [random.choice(compounds_data['compound_id']) for _ in range(75)],
+         'target_id': [random.choice(targets_data['target_id']) for _ in range(75)],
+         'affinity_nm': [round(random.uniform(0.1, 1000), 2) for _ in range(75)],
+         'selectivity_fold': [round(random.uniform(1, 100), 1) for _ in range(75)]
+     }
+     pd.DataFrame(associations_data).to_sql('compound_targets', conn, index=False, if_exists='replace')
+
+     conn.execute("CREATE INDEX idx_assays_compound ON assay_results(compound_id)")
+     conn.execute("CREATE INDEX idx_associations_compound ON compound_targets(compound_id)")
+     conn.execute("CREATE INDEX idx_associations_target ON compound_targets(target_id)")
+     conn.commit()
+     conn.close()
+     print("✅ Drug Discovery database created successfully!")
+
+ if __name__ == "__main__":
+     create_clinical_trials_db()
+     create_laboratory_db()
+     create_drug_discovery_db()
+     print("\n🎉 All three Life Sciences databases created successfully in the 'data' directory!")
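The seeding pattern used throughout this script (build a dict of columns, write it with `to_sql`, then add indexes) can be demonstrated in miniature with only the standard library, against an in-memory database so nothing is written to disk. Table and column names here are borrowed from the script purely for illustration.

```python
import sqlite3

# Minimal sketch of the seeding pattern above, using stdlib sqlite3
# instead of pandas and an in-memory DB instead of data/*.db files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studies (study_id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO studies VALUES (?, ?)",
    [("ONCO-2023-001", "RECRUITING"), ("DIAB-2023-089", "COMPLETED")],
)
# Indexes are added after the bulk load, as in the script
conn.execute("CREATE INDEX idx_studies_status ON studies(status)")
rows = conn.execute(
    "SELECT study_id FROM studies WHERE status = 'RECRUITING'"
).fetchall()
print(rows)  # [('ONCO-2023-001',)]
conn.close()
```

Loading first and indexing afterwards is the usual order for bulk seeds; it avoids paying index-maintenance cost on every insert.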
ops/scripts/ingest.py ADDED
@@ -0,0 +1,71 @@
+ import os
+ import sys
+ import logging
+ from sqlalchemy import create_engine
+
+ # Add project root to path to allow imports from mcp
+ project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+ sys.path.append(project_root)
+
+ from core.discovery import discover_schema
+ from core.graph import GraphStore
+ from core.config import SQLITE_DATA_DIR
+
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+ logger = logging.getLogger(__name__)
+
+ def ingest_sqlite_database(db_file: str, graph_store: GraphStore):
+     """Discovers schema from a SQLite DB and ingests it into Neo4j."""
+     db_path = os.path.join(SQLITE_DATA_DIR, db_file)
+     logger.info(f"Processing database: {db_path}")
+
+     if not os.path.exists(db_path):
+         logger.error(f"Database file not found: {db_path}")
+         return
+
+     try:
+         engine = create_engine(f"sqlite:///{db_path}")
+         schema_data = discover_schema(engine)
+
+         if schema_data:
+             logger.info(f"Discovered schema for {db_file}, ingesting into Neo4j...")
+             graph_store.import_schema(schema_data)
+             logger.info(f"Successfully ingested schema for {db_file}")
+         else:
+             logger.warning(f"Could not discover schema for {db_file}. Skipping.")
+
+     except Exception as e:
+         logger.error(f"An error occurred while processing {db_file}: {e}")
+
+ def main():
+     """
+     Main function to run the ingestion process for all SQLite databases
+     found in the data directory.
+     """
+     logger.info("Starting schema ingestion process...")
+
+     if not os.path.exists(SQLITE_DATA_DIR) or not os.path.isdir(SQLITE_DATA_DIR):
+         logger.error(f"Data directory not found: {SQLITE_DATA_DIR}")
+         return
+
+     db_files = [f for f in os.listdir(SQLITE_DATA_DIR) if f.endswith(".db")]
+
+     if not db_files:
+         logger.warning(f"No SQLite database files (.db) found in {SQLITE_DATA_DIR}.")
+         return
+
+     try:
+         graph_store = GraphStore()
+         logger.info("Successfully connected to Neo4j.")
+     except Exception as e:
+         logger.error(f"Failed to connect to Neo4j. Aborting ingestion. Error: {e}")
+         return
+
+     for db_file in db_files:
+         ingest_sqlite_database(db_file, graph_store)
+
+     graph_store.close()
+     logger.info("Schema ingestion process completed.")
+
+ if __name__ == "__main__":
+     main()
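The ingest script leans on `discover_schema(engine)` to read tables and columns out of each SQLite file. The sketch below shows the underlying idea with only the standard library: list tables from `sqlite_master` and columns via `PRAGMA table_info`. It is an illustrative stand-in, not the project's actual `discover_schema` (which presumably uses SQLAlchemy's inspector).

```python
import sqlite3

# Build a {table: [(column, declared_type), ...]} map from a SQLite DB.
# In-memory DB with one table so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id TEXT PRIMARY KEY, age INTEGER)")

schema = {}
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
for (table,) in tables:
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    schema[table] = [(c[1], c[2]) for c in cols]

print(schema)  # {'patients': [('patient_id', 'TEXT'), ('age', 'INTEGER')]}
conn.close()
```

A map of this shape is roughly what the Neo4j side needs to create `Table` and `Column` nodes with `HAS_COLUMN` relationships.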
postgres/init.sql DELETED
@@ -1,28 +0,0 @@
- -- Create sample tables for testing
- CREATE TABLE customers (
-     id SERIAL PRIMARY KEY,
-     email VARCHAR(255) UNIQUE NOT NULL,
-     first_name VARCHAR(100),
-     last_name VARCHAR(100),
-     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
- );
-
- CREATE TABLE orders (
-     id SERIAL PRIMARY KEY,
-     customer_id INTEGER REFERENCES customers(id),
-     order_date DATE NOT NULL,
-     total_amount DECIMAL(10,2),
-     status VARCHAR(50)
- );
-
- -- Insert sample data
- INSERT INTO customers (email, first_name, last_name) VALUES
- ('john.doe@email.com', 'John', 'Doe'),
- ('jane.smith@email.com', 'Jane', 'Smith'),
- ('bob.johnson@email.com', 'Bob', 'Johnson');
-
- INSERT INTO orders (customer_id, order_date, total_amount, status) VALUES
- (1, '2024-01-15', 99.99, 'completed'),
- (1, '2024-02-01', 149.99, 'completed'),
- (2, '2024-01-20', 79.99, 'pending'),
- (3, '2024-02-10', 199.99, 'completed');
streamlit/app.py CHANGED
@@ -7,478 +7,156 @@ All database access MUST go through MCP server - no direct connections allowed.

  import streamlit as st
  import requests
- import time
  import json
  import pandas as pd
- from datetime import datetime, timedelta
- import os
- from typing import Dict, Any, Optional, Tuple

- # Configuration
  MCP_URL = os.getenv("MCP_URL", "http://mcp:8000/mcp")
  MCP_API_KEY = os.getenv("MCP_API_KEY", "dev-key-123")

- # Page configuration
  st.set_page_config(
-     page_title="MCP Monitor & Query Tester",
-     page_icon="🤖",
-     layout="wide",
-     initial_sidebar_state="expanded"
  )

- # Initialize session state
- if 'workflow_id' not in st.session_state:
-     st.session_state.workflow_id = None
- if 'debug_log' not in st.session_state:
-     st.session_state.debug_log = []
- if 'last_refresh' not in st.session_state:
-     st.session_state.last_refresh = None

- def call_mcp(tool: str, params: Optional[Dict[str, Any]] = None) -> Tuple[Dict[str, Any], int]:
-     """
-     Call MCP server - the ONLY way to access databases.
-     Returns (response_data, response_time_ms)
-     """
-     start_time = time.time()
-
      try:
          response = requests.post(
-             MCP_URL,
-             headers={
-                 "X-API-Key": MCP_API_KEY,
-                 "Content-Type": "application/json"
-             },
-             json={"tool": tool, "params": params or {}},
-             timeout=10
          )
-
-         response_time = int((time.time() - start_time) * 1000)
-
-         # Log the request/response for debugging
-         debug_entry = {
-             "timestamp": datetime.now().isoformat(),
-             "tool": tool,
-             "params": params,
-             "status_code": response.status_code,
-             "response_time_ms": response_time,
-             "success": response.status_code == 200
-         }
-         st.session_state.debug_log.append(debug_entry)
-
-         # Keep only last 5 entries
-         if len(st.session_state.debug_log) > 5:
-             st.session_state.debug_log = st.session_state.debug_log[-5:]
-
-         if response.status_code == 200:
-             return response.json(), response_time
-         else:
-             return {"error": f"HTTP {response.status_code}: {response.text}"}, response_time

-     except requests.exceptions.RequestException as e:
-         response_time = int((time.time() - start_time) * 1000)
-         error_msg = f"MCP Server Error: {str(e)}"
-
-         # Log the error
-         debug_entry = {
-             "timestamp": datetime.now().isoformat(),
-             "tool": tool,
-             "params": params,
-             "status_code": 0,
-             "response_time_ms": response_time,
-             "success": False,
-             "error": error_msg
-         }
-         st.session_state.debug_log.append(debug_entry)
-
-         return {"error": error_msg}, response_time

- def test_neo4j_connection() -> Tuple[bool, int, str]:
-     """Test Neo4j connection through MCP server"""
-     result, response_time = call_mcp("get_schema")
-     if "error" in result:
-         return False, response_time, result["error"]
-     return True, response_time, "Connected"

- def test_postgres_connection() -> Tuple[bool, int, str]:
-     """Test PostgreSQL connection through MCP server"""
-     result, response_time = call_mcp("query_postgres", {"query": "SELECT 1 as test"})
-     if "error" in result:
-         return False, response_time, result["error"]
-     return True, response_time, "Connected"

- def test_mcp_server() -> Tuple[bool, int, str]:
-     """Test MCP server health"""
      try:
-         start_time = time.time()
-         response = requests.get(f"{MCP_URL.replace('/mcp', '/health')}", timeout=5)
-         response_time = int((time.time() - start_time) * 1000)

-         if response.status_code == 200:
-             return True, response_time, "Healthy"
-         else:
-             return False, response_time, f"HTTP {response.status_code}"
-     except Exception as e:
-         return False, 0, str(e)

- def get_performance_stats() -> Dict[str, Any]:
-     """Get performance statistics through MCP"""
-     result, _ = call_mcp("query_graph", {
-         "query": "MATCH (l:Log) WHERE l.timestamp > datetime() - duration('PT1H') RETURN count(l) as count"
-     })
-
-     if "error" in result:
-         return {"error": result["error"]}
-
-     return result.get("data", [{}])[0] if result.get("data") else {}

- def create_workflow(question: str) -> Optional[str]:
-     """Create a new workflow for the given question"""
-     workflow_id = f"streamlit-{int(time.time())}"
-
-     # Create workflow node
-     workflow_result, _ = call_mcp("write_graph", {
-         "action": "create_node",
-         "label": "Workflow",
-         "properties": {
-             "id": workflow_id,
-             "name": f"Streamlit Query: {question[:50]}...",
-             "description": f"Query from Streamlit: {question}",
-             "status": "active",
-             "created_at": datetime.now().isoformat(),
-             "source": "streamlit"
-         }
-     })
-
-     if "error" in workflow_result:
-         st.error(f"Failed to create workflow: {workflow_result['error']}")
-         return None
-
-     # Create instruction sequence
-     instructions = [
-         {
-             "id": f"{workflow_id}-inst-1",
-             "type": "discover_schema",
-             "sequence": 1,
-             "description": "Discover database schema",
-             "status": "pending",
-             "pause_duration": 5,  # Short pause for testing
-             "parameters": "{}"
-         },
-         {
-             "id": f"{workflow_id}-inst-2",
-             "type": "generate_sql",
-             "sequence": 2,
-             "description": f"Generate SQL for: {question}",
-             "status": "pending",
-             "pause_duration": 5,
-             "parameters": json.dumps({"question": question})
-         },
-         {
-             "id": f"{workflow_id}-inst-3",
-             "type": "review_results",
-             "sequence": 3,
-             "description": "Review and format results",
-             "status": "pending",
-             "pause_duration": 0,
-             "parameters": "{}"
-         }
-     ]
-
-     # Create instruction nodes
-     for inst in instructions:
-         inst_result, _ = call_mcp("write_graph", {
-             "action": "create_node",
-             "label": "Instruction",
-             "properties": inst
-         })

-         if "error" in inst_result:
-             st.error(f"Failed to create instruction: {inst_result['error']}")
-             return None

-         # Link instruction to workflow
-         link_result, _ = call_mcp("query_graph", {
-             "query": """
-                 MATCH (w:Workflow {id: $workflow_id}), (i:Instruction {id: $inst_id})
-                 CREATE (w)-[:HAS_INSTRUCTION]->(i)
-             """,
-             "parameters": {"workflow_id": workflow_id, "inst_id": inst["id"]}
-         })
-
-     # Create instruction chain
-     for i in range(len(instructions) - 1):
-         chain_result, _ = call_mcp("query_graph", {
-             "query": """
-                 MATCH (i1:Instruction {id: $id1}), (i2:Instruction {id: $id2})
-                 CREATE (i1)-[:NEXT_INSTRUCTION]->(i2)
-             """,
-             "parameters": {"id1": instructions[i]["id"], "id2": instructions[i + 1]["id"]}
-         })
-
-     return workflow_id

- def get_workflow_status(workflow_id: str) -> Dict[str, Any]:
-     """Get workflow execution status"""
-     result, _ = call_mcp("query_graph", {
-         "query": """
-             MATCH (w:Workflow {id: $id})-[:HAS_INSTRUCTION]->(i:Instruction)
-             RETURN w.status as workflow_status,
-                    collect(i.status) as instruction_statuses,
-                    collect(i.type) as instruction_types,
-                    collect(i.sequence) as sequences
-         """,
-         "parameters": {"id": workflow_id}
-     })
-
-     if "error" in result or not result.get("data"):
-         return {"error": "Workflow not found"}
-
-     return result["data"][0]

- def get_workflow_results(workflow_id: str) -> Dict[str, Any]:
-     """Get workflow execution results"""
-     result, _ = call_mcp("query_graph", {
-         "query": """
-             MATCH (w:Workflow {id: $id})-[:HAS_INSTRUCTION]->(i:Instruction)-[:EXECUTED_AS]->(e:Execution)
-             RETURN i.sequence as sequence,
-                    i.type as type,
-                    i.description as description,
-                    e.result as result,
-                    e.started_at as started_at,
-                    e.completed_at as completed_at
-             ORDER BY i.sequence
-         """,
-         "parameters": {"id": workflow_id}
-     })
-
-     if "error" in result:
-         return {"error": result["error"]}
-
-     return {"executions": result.get("data", [])}

- def get_schema_context() -> str:
-     """Get database schema context for display"""
-     result, _ = call_mcp("query_graph", {
-         "query": """
-             MATCH (t:Table)-[:HAS_COLUMN]->(c:Column)
-             RETURN t.name as table_name,
-                    collect({name: c.name, type: c.data_type, nullable: c.nullable}) as columns
-             ORDER BY t.name
-         """
-     })
-
-     if "error" in result:
-         return f"Error fetching schema: {result['error']}"
-
-     schema_text = "Database Schema:\n"
-     for record in result.get("data", []):
-         table_name = record["table_name"]
-         columns = record["columns"]
-         schema_text += f"\nTable: {table_name}\n"
-         for col in columns:
-             nullable = "NULL" if col["nullable"] else "NOT NULL"
-             schema_text += f"  - {col['name']}: {col['type']} {nullable}\n"
-
-     return schema_text

- def main():
-     st.title("🤖 MCP Monitor & Query Tester")
-     st.caption("Monitor agentic system health and test queries through MCP server")
-
-     # Sidebar
-     with st.sidebar:
-         st.header("🔧 Configuration")
-         st.code(f"MCP URL: {MCP_URL}")
-         st.code(f"API Key: {MCP_API_KEY[:10]}...")
-
-         if st.button("🔄 Refresh All", type="primary"):
-             st.rerun()
-
-         st.header("📊 Quick Stats")
-         stats = get_performance_stats()
-         if "error" not in stats:
-             st.metric("Logs (1h)", stats.get("count", 0))
-         else:
-             st.error(f"Stats error: {stats['error']}")
-
-     # Main tabs
-     tab1, tab2 = st.tabs(["🔌 Connection Status", "🤖 Query Tester"])
-
-     with tab1:
-         st.header("Connection Status Monitor")
-         st.caption("All database access goes through MCP server - no direct connections allowed")
-
-         # Connection status in columns
-         col1, col2, col3 = st.columns(3)
-
-         with col1:
-             st.subheader("Neo4j (via MCP)")
-             neo4j_ok, neo4j_time, neo4j_msg = test_neo4j_connection()
-             st.metric(
-                 label="Status",
-                 value="Online" if neo4j_ok else "Offline",
-                 delta=f"{neo4j_time}ms"
-             )
-             if neo4j_ok:
-                 st.success(neo4j_msg)
-             else:
-                 st.error(neo4j_msg)
-
-         with col2:
-             st.subheader("PostgreSQL (via MCP)")
-             postgres_ok, postgres_time, postgres_msg = test_postgres_connection()
-             st.metric(
-                 label="Status",
-                 value="Online" if postgres_ok else "Offline",
-                 delta=f"{postgres_time}ms"
-             )
-             if postgres_ok:
-                 st.success(postgres_msg)
-             else:
-                 st.error(postgres_msg)
-
-         with col3:
-             st.subheader("MCP Server")
-             mcp_ok, mcp_time, mcp_msg = test_mcp_server()
-             st.metric(
-                 label="Status",
-                 value="Online" if mcp_ok else "Offline",
-                 delta=f"{mcp_time}ms"
-             )
-             if mcp_ok:
-                 st.success(mcp_msg)
-             else:
-                 st.error(mcp_msg)
-
-         # Performance stats
-         st.subheader("Performance Statistics")
-         stats = get_performance_stats()
-         if "error" not in stats:
-             st.info(f"Operations in last hour: {stats.get('count', 0)}")
-         else:
-             st.error(f"Cannot fetch stats: {stats['error']}")
-
-         # Auto-refresh info
-         st.session_state.last_refresh = datetime.now()
-         st.caption(f"Last checked: {st.session_state.last_refresh.strftime('%H:%M:%S')}")
-
-         # Auto-refresh every 5 seconds
-         time.sleep(5)
-         st.rerun()
-
-     with tab2:
-         st.header("Query Tester")
-         st.caption("Test natural language queries through the agentic engine")
-
-         # Query input
-         question = st.text_area(
-             "Enter your question:",
-             height=100,
-             placeholder="e.g., 'How many customers do we have?' or 'Show me all orders from last month'"
-         )
-
-         col1, col2 = st.columns([1, 1])
-
-         with col1:
-             if st.button("🚀 Execute Query", type="primary", disabled=not question.strip()):
-                 if question.strip():
-                     with st.spinner("Creating workflow..."):
-                         workflow_id = create_workflow(question.strip())
-                         if workflow_id:
-                             st.session_state.workflow_id = workflow_id
-                             st.success(f"Workflow created: {workflow_id}")
-                         else:
-                             st.error("Failed to create workflow")
-
-         with col2:
-             if st.button("🗑️ Clear Results"):
-                 st.session_state.workflow_id = None
-                 st.rerun()
-
-         # Workflow execution monitoring
-         if st.session_state.workflow_id:
-             st.subheader("Execution Progress")
-
-             # Get workflow status
-             status = get_workflow_status(st.session_state.workflow_id)

-             if "error" in status:
-                 st.error(f"Status error: {status['error']}")
-             else:
-                 workflow_status = status.get("workflow_status", "unknown")
-                 instruction_statuses = status.get("instruction_statuses", [])
-                 instruction_types = status.get("instruction_types", [])
-
-                 # Progress bar
-                 completed = sum(1 for s in instruction_statuses if s == "complete")
-                 total = len(instruction_statuses)
-                 progress = completed / total if total > 0 else 0

-                 st.progress(progress)
-                 st.caption(f"Progress: {completed}/{total} instructions completed")

-                 # Status display
-                 status_cols = st.columns(len(instruction_types))
-                 for i, (inst_type, inst_status) in enumerate(zip(instruction_types, instruction_statuses)):
-                     with status_cols[i]:
-                         if inst_status == "complete":
-                             st.success(f" {inst_type}")
-                         elif inst_status == "executing":
-                             st.warning(f"🔄 {inst_type}")
-                         elif inst_status == "failed":
-                             st.error(f"❌ {inst_type}")
-                         else:
-                             st.info(f"⏳ {inst_type}")
-
-                 # Get and display results
-                 if completed > 0:
-                     results = get_workflow_results(st.session_state.workflow_id)
-
-                     if "error" not in results:
-                         st.subheader("Execution Results")
-
-                         for execution in results.get("executions", []):
-                             with st.expander(f"Step {execution['sequence']}: {execution['type']}"):
-                                 st.write(f"**Description:** {execution['description']}")
-
-                                 if execution['started_at'] and execution['completed_at']:
-                                     start = datetime.fromisoformat(execution['started_at'].replace('Z', '+00:00'))
-                                     end = datetime.fromisoformat(execution['completed_at'].replace('Z', '+00:00'))
-                                     duration = (end - start).total_seconds()
-                                     st.write(f"**Duration:** {duration:.2f} seconds")
-
-                                 if execution['result']:
-                                     try:
-                                         result_data = json.loads(execution['result']) if isinstance(execution['result'], str) else execution['result']
-
-                                         if execution['type'] == 'generate_sql' and 'generated_sql' in result_data:
-                                             st.write("**Generated SQL:**")
-                                             st.code(result_data['generated_sql'], language='sql')
-
-                                         if 'data' in result_data and result_data['data']:
-                                             st.write("**Query Results:**")
-                                             df = pd.DataFrame(result_data['data'])
-                                             st.dataframe(df)
-
-                                         if 'error' in result_data:
-                                             st.error(f"Error: {result_data['error']}")
-
-                                     except Exception as e:
-                                         st.write("**Raw Result:**")
-                                         st.code(str(execution['result']))
-                     else:
-                         st.error(f"Results error: {results['error']}")

-     # Debug information
-     with st.expander("🔧 Debug Information"):
-         st.write("**Last 5 MCP Requests:**")
-         for entry in st.session_state.debug_log:
-             status_icon = "✅" if entry["success"] else "❌"
-             st.write(f"{status_icon} {entry['timestamp']} - {entry['tool']} ({entry['response_time_ms']}ms)")
-             if not entry["success"] and "error" in entry:
-                 st.error(f"Error: {entry['error']}")
-
-     st.write("**Important:** All database operations go through MCP server. Direct database access is not permitted.")

  if __name__ == "__main__":
      main()
 
 
 import streamlit as st
 import requests
+import os
 import json
 import pandas as pd
+from typing import Dict, Any
 
+# --- Configuration ---
+AGENT_URL = os.getenv("AGENT_URL", "http://agent:8001/query")
+NEO4J_URL = os.getenv("NEO4J_URL", "http://neo4j:7474")
 MCP_URL = os.getenv("MCP_URL", "http://mcp:8000/mcp")
 MCP_API_KEY = os.getenv("MCP_API_KEY", "dev-key-123")
 
 st.set_page_config(
+    page_title="GraphRAG Chat",
+    page_icon="💬",
+    layout="wide"
 )
 
+# --- Session State ---
+if 'messages' not in st.session_state:
+    st.session_state.messages = []
+if 'schema_info' not in st.session_state:
+    st.session_state.schema_info = ""
+
+# --- Helper Functions ---
+def stream_agent_response(question: str):
+    """Streams the agent's response, yielding JSON objects."""
+    try:
+        with requests.post(AGENT_URL, json={"question": question}, stream=True, timeout=300) as r:
+            r.raise_for_status()
+            for chunk in r.iter_content(chunk_size=None):
+                if chunk:
+                    try:
+                        yield json.loads(chunk)
+                    except json.JSONDecodeError:
+                        # Handle potential parsing errors if chunks are not perfect JSON
+                        st.warning(f"Could not decode JSON chunk: {chunk}")
+                        continue
+    except requests.exceptions.RequestException as e:
+        yield {"error": f"Failed to connect to agent: {e}"}
 
 
+def fetch_schema_info() -> str:
+    """Fetches the database schema from the MCP server for display."""
     try:
         response = requests.post(
+            f"{MCP_URL}/discovery/get_relevant_schemas",
+            headers={"Authorization": f"Bearer {MCP_API_KEY}", "Content-Type": "application/json"},
+            json={"query": ""}
         )
+        response.raise_for_status()
+        data = response.json()
+
+        if data.get("status") == "success":
+            schemas = data.get("schemas", [])
+            if not schemas: return "No schema information found."
 
+            # Group columns by table
+            tables = {}
+            for s in schemas:
+                table_key = f"{s.get('database', '')}.{s.get('table', '')}"
+                if table_key not in tables:
+                    tables[table_key] = []
+                tables[table_key].append(f"{s.get('name', '')} ({s.get('type', [''])[0]})")
+
+            schema_text = ""
+            for table, columns in tables.items():
+                schema_text += f"**{table}**:\n"
+                for col in columns:
+                    schema_text += f"- {col}\n"
+            return schema_text
+        else:
+            return f"Error from MCP: {data.get('message', 'Unknown error')}"
 
+    except requests.exceptions.RequestException as e:
+        return f"Could not fetch schema: {e}"
 
+@st.cache_data(ttl=600)
+def get_cached_schema():
+    """Cache the schema info to avoid repeated calls."""
+    return fetch_schema_info()
 
+def check_service_health(service_name: str, url: str) -> bool:
+    """Checks if a service is reachable."""
     try:
+        response = requests.get(url, timeout=5)
+        return response.status_code in [200, 401]
+    except requests.exceptions.RequestException:
+        return False
+
+# --- UI Components ---
+def display_sidebar():
+    with st.sidebar:
+        st.title("🗄️ Database Schema")
 
+        if st.button("🔄 Refresh Schema"):
+            st.cache_data.clear()
 
+        st.session_state.schema_info = get_cached_schema()
+        st.markdown(st.session_state.schema_info)
 
+        st.markdown("---")
+        st.title("🔌 Service Status")
+
+        neo4j_status = "✅ Online" if check_service_health("Neo4j", NEO4J_URL) else "❌ Offline"
+        mcp_status = "✅ Online" if check_service_health("MCP", MCP_URL.replace("/mcp", "/health")) else "❌ Offline"
 
+        st.markdown(f"**Neo4j:** {neo4j_status}")
+        st.markdown(f"**MCP Server:** {mcp_status}")
 
+        st.markdown("---")
+        if st.button("🗑️ Clear Chat History"):
+            st.session_state.messages = []
+            st.rerun()
 
+def main():
+    display_sidebar()
+    st.title("💬 GraphRAG Conversational Agent")
+    st.markdown("Ask questions about the life sciences dataset. The agent's thought process will be shown below.")
 
+    # Display chat history
+    for message in st.session_state.messages:
+        with st.chat_message(message["role"]):
+            st.markdown(message["content"])
 
+    if prompt := st.chat_input("Ask your question here..."):
+        st.session_state.messages.append({"role": "user", "content": prompt})
+        with st.chat_message("user"):
+            st.markdown(prompt)
 
+        with st.chat_message("assistant"):
+            full_response = ""
+            response_box = st.empty()
 
+            for chunk in stream_agent_response(prompt):
+                if "error" in chunk:
+                    full_response = chunk["error"]
+                    response_box.error(full_response)
+                    break
 
+                content = chunk.get("content", "")
 
+                if chunk.get("type") == "thought":
+                    full_response += f"🤔 *{content}*\n\n"
+                elif chunk.get("type") == "observation":
+                    full_response += f"{content}\n\n"
+                elif chunk.get("type") == "final_answer":
+                    full_response += f"**Final Answer:**\n{content}"
+
+                response_box.markdown(full_response)
 
+            st.session_state.messages.append({"role": "assistant", "content": full_response})
 
 if __name__ == "__main__":
     main()