
Implementation Plan: Intelligent Graph-Based SQL Federation Middleware (Revised)

This document outlines the revised strategy to implement the target features by integrating valuable assets from the semantic-query-router codebase into our existing architecture.

Overall Strategy

The core task is to evolve the current single-step agent into a multi-step, GraphRAG-powered orchestrator using LangChain. We will enhance the MCP server with advanced core logic, replace PostgreSQL with a rich life sciences SQLite dataset, and transform the Streamlit monitor into a full conversational chat UI. The frontend/ Next.js application will be deprecated.


Phase 1: Integrate New Dataset & Core Logic (Due by Friday, Oct 3rd)

Goal: Replace the existing data foundation with the life sciences dataset and upgrade the MCP server with advanced, reusable logic from the semantic-query-router project.

  • Task 1.1: Adopt Life Sciences Dataset

    • Integrate the generate_sample_databases.py script into our ops/scripts/ directory.
    • Create a new make seed-db command in the Makefile to generate the clinical_trials.db, laboratory.db, and drug_discovery.db SQLite files.
    • Update docker-compose.yml to remove the PostgreSQL service and mount the new data/ directory for the SQLite databases.
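Once `make seed-db` has run, a quick sanity check can confirm that all three SQLite files exist and contain tables. The sketch below uses only the standard library; the database filenames come from the plan, but the `verify_seed` helper and its interface are illustrative, not part of the existing codebase.

```python
import sqlite3
from pathlib import Path

# Database files produced by `make seed-db` (names from Task 1.1).
EXPECTED_DBS = ["clinical_trials.db", "laboratory.db", "drug_discovery.db"]

def list_tables(db_path: Path) -> list[str]:
    """Return the user tables in a SQLite file via sqlite_master."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchall()
    return [name for (name,) in rows]

def verify_seed(data_dir: Path) -> dict[str, list[str]]:
    """Map each expected database to its tables, raising if one is empty."""
    report = {}
    for db_name in EXPECTED_DBS:
        tables = list_tables(data_dir / db_name)
        if not tables:
            raise RuntimeError(f"{db_name} was seeded but contains no tables")
        report[db_name] = tables
    return report
```

Such a check could be wired into the `seed-db` Make target so a broken generation run fails loudly instead of surfacing later as empty query results.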
  • Task 1.2: Enhance MCP Server with Core Logic

    • Create a new mcp/core/ directory.
    • Migrate the advanced logic from semantic-query-router/src/core/ (discovery.py, graph.py, intelligence.py) into our mcp/core/ directory.
    • Refactor these modules to fit our project structure and standards.
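To make the migration concrete: the discovery logic boils down to introspecting each SQLite file's schema. The sketch below shows one way a `discovery.py`-style function might look after refactoring into `mcp/core/`; the function name and return shape are assumptions, and the real module's interface may differ.

```python
import sqlite3

def discover_schema(db_path: str) -> dict[str, list[dict]]:
    """Introspect a SQLite database: table name -> column metadata.

    A minimal sketch of the kind of discovery logic mcp/core/discovery.py
    would provide; the actual module's API may differ.
    """
    schema: dict[str, list[dict]] = {}
    with sqlite3.connect(db_path) as conn:
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
            cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
            schema[table] = [
                {"name": c[1], "type": c[2], "pk": bool(c[5])} for c in cols
            ]
    return schema
```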
  • Task 1.3: Create a Dedicated Ingestion Process

    • Create a new script, ops/scripts/ingest.py, that uses the new core logic to perform a one-time ingestion of the SQLite database schemas into Neo4j.
    • Create a make ingest command in the Makefile to run this script. This separates the schema ingestion process from the agent's runtime duties, making the system more modular.
    • Remove the schema discovery logic from agent/main.py.
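The heart of `ops/scripts/ingest.py` is translating a discovered schema into graph writes. The sketch below generates Cypher MERGE statements from a schema dictionary; the node labels (`:Database`, `:Table`, `:Column`) and relationship types are illustrative placeholders, since the real graph model lives in `mcp/core/graph.py`.

```python
def schema_to_cypher(db_name: str, schema: dict[str, list[dict]]) -> list[str]:
    """Turn a discovered schema into Cypher MERGE statements for Neo4j.

    Labels and relationship names here are hypothetical; the actual
    graph model is defined by the migrated mcp/core/graph.py.
    """
    stmts = [f"MERGE (d:Database {{name: '{db_name}'}})"]
    for table, columns in schema.items():
        stmts.append(
            f"MERGE (t:Table {{name: '{table}', db: '{db_name}'}}) "
            f"MERGE (d:Database {{name: '{db_name}'}})-[:HAS_TABLE]->(t)"
        )
        for col in columns:
            stmts.append(
                f"MERGE (c:Column {{name: '{col['name']}', table: '{table}'}}) "
                f"MERGE (t:Table {{name: '{table}', db: '{db_name}'}})"
                f"-[:HAS_COLUMN]->(c)"
            )
    return stmts
```

In the real script these statements would be executed through the Neo4j driver inside a single transaction; keeping generation separate from execution makes the ingestion logic easy to unit-test without a live database.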

Phase 2: Rebuild Agent with LangChain (Due by Tuesday, Oct 7th)

Goal: Re-architect the agent from a simple script into a robust LangChain-powered orchestrator that leverages the enhanced MCP server.

  • Task 2.1: Refactor Agent to use LangChain

    • Overhaul agent/main.py to implement the AgentExecutor pattern from langchain_integration.py.
    • Define a formal agent prompt that instructs the LLM on how to use the available tools to answer questions.
  • Task 2.2: Implement Custom LangChain Tools

    • Create a new agent/tools.py file.
    • Implement custom LangChain tools that make authenticated REST API calls to our enhanced MCP server.
    • The tools will include SchemaSearchTool, JoinPathFinderTool, and QueryExecutorTool; each acts as a thin client for the core logic integrated into the MCP server in Phase 1.
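The three tools share one authenticated REST client, which each LangChain tool can then wrap. The sketch below shows that shared client with an injectable transport so it can be tested offline; the endpoint paths, payload shapes, and bearer-token scheme are assumptions about the MCP server's API, not its documented contract.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MCPClient:
    """Shared REST client behind the LangChain tools (endpoints hypothetical)."""
    base_url: str
    token: str
    # transport(url, headers, payload) -> parsed JSON response; in production
    # this would wrap requests.post or httpx.post.
    transport: Callable[[str, dict, dict], dict]

    def _post(self, path: str, payload: dict) -> dict:
        headers = {
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json",
        }
        return self.transport(f"{self.base_url}{path}", headers, payload)

    def search_schema(self, question: str) -> dict:
        """Backs SchemaSearchTool: find tables/columns relevant to a question."""
        return self._post("/schema/search", {"query": question})

    def find_join_path(self, tables: list[str]) -> dict:
        """Backs JoinPathFinderTool: find a join path between tables."""
        return self._post("/graph/join-path", {"tables": tables})

    def execute_query(self, sql: str, database: str) -> dict:
        """Backs QueryExecutorTool: run SQL against one SQLite database."""
        return self._post("/query/execute", {"sql": sql, "database": database})
```

Injecting the transport keeps agent/tools.py unit-testable without a running MCP server, which helps during the Phase 2 refactor.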
  • Task 2.3: Update Agent's Main Loop

    • Modify the agent's main loop to delegate tasks to the LangChain AgentExecutor instead of handling instructions directly. The agent's primary role will now be to orchestrate the LangChain agent and log the results.
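Stripped of the LangChain dependency, the delegation pattern is small enough to sketch directly. Here `agent_executor` stands in for a LangChain AgentExecutor, modelled only by its `.invoke({"input": ...}) -> {"output": ...}` interface; the function name and result shape are illustrative.

```python
import logging

logger = logging.getLogger("agent")

def run_main_loop(instructions, agent_executor) -> list[dict]:
    """Delegate each instruction to the executor and log the results.

    `agent_executor` is anything exposing invoke({"input": ...}) ->
    {"output": ...}, mirroring LangChain's AgentExecutor interface.
    """
    results = []
    for instruction in instructions:
        logger.info("Delegating: %s", instruction)
        outcome = agent_executor.invoke({"input": instruction})
        logger.info("Result: %s", outcome["output"])
        results.append({"instruction": instruction, "answer": outcome["output"]})
    return results
```

Keeping the loop this thin means agent/main.py carries no query logic of its own: all reasoning happens inside the executor, backed by the Phase 1 MCP tools.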

Phase 3: Build the Chat UI & Finalize (Due by Thursday, Oct 9th)

Goal: Replace the basic Streamlit monitor with a full-featured conversational chat interface and complete the final integration for the demo.

  • Task 3.1: Implement Conversational Chat UI

    • Replace the entire contents of streamlit/app.py with the conversational UI logic from semantic-query-router/src/chat_app.py.
    • Adapt the UI to work with our project's MCP REST API (instead of WebSocket) for submitting questions and fetching results.
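The adaptation from WebSocket to REST mostly touches the message-handling path. The sketch below shows that path with plain data structures: `history` stands in for `st.session_state["messages"]`, and `ask_backend` wraps the REST call to the MCP API (its question-in, answer-dict-out interface is an assumption).

```python
def handle_user_message(history: list[dict], question: str, ask_backend) -> list[dict]:
    """Append a user turn, query the MCP REST API, append the assistant turn.

    `history` stands in for st.session_state["messages"]; `ask_backend`
    wraps the REST call (hypothetical interface: question -> answer dict
    with "summary" and "rows" keys).
    """
    history.append({"role": "user", "content": question})
    answer = ask_backend(question)
    history.append({
        "role": "assistant",
        "content": answer.get("summary", ""),
        "rows": answer.get("rows", []),
    })
    return history
```

In the Streamlit app each turn in `history` would then be rendered with `st.chat_message`, with the `rows` payload shown as a data table below the assistant's summary.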
  • Task 3.2: Integrate Demo-Specific Features

    • Ensure the new Streamlit UI includes the required demo features:
      • Display of execution phases (e.g., "Searching Schema," "Finding Join Path," "Executing Query").
      • A final results view that shows both the natural language summary from the agent and a clean data table (Pandas DataFrame) of the raw results.
      • A "Download CSV" button for the results table.
      • A sidebar that displays the connection status of the Neo4j and SQLite databases.
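For the "Download CSV" button, Streamlit's `st.download_button` accepts a string payload, so the result rows just need serializing. A standard-library sketch (with pandas available, `df.to_csv(index=False)` would do the same job):

```python
import csv
import io

def rows_to_csv(rows: list[dict]) -> str:
    """Serialize result rows (list of dicts) to CSV text for download.

    Column order follows the first row's keys; empty input yields an
    empty string so the download button can simply be hidden.
    """
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```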
  • Task 3.3: Final Integration and Testing

    • Perform end-to-end testing of the full workflow: from asking a question in the Streamlit app to the agent's orchestration and the final result display.
    • Clean up any unused files and finalize the README.md with updated instructions.