# Implementation Plan: Intelligent Graph-Based SQL Federation Middleware (Revised)
This document outlines the revised strategy to implement the target features by integrating valuable assets from the `semantic-query-router` codebase into our existing architecture.
### Overall Strategy
The core task is to evolve the current single-step agent into a multi-step, GraphRAG-powered orchestrator using LangChain. We will enhance the MCP server with advanced core logic, replace PostgreSQL with a rich life sciences SQLite dataset, and transform the Streamlit monitor into a fully conversational chat UI. The `frontend/` Next.js application will be deprecated.
---
### Phase 1: Integrate New Dataset & Core Logic (Due by Friday, Oct 3rd)
**Goal**: Replace the existing data foundation with the life sciences dataset and upgrade the MCP server with advanced, reusable logic from the `semantic-query-router` project.
- **Task 1.1: Adopt Life Sciences Dataset**
- Integrate the `generate_sample_databases.py` script into our `ops/scripts/` directory.
- Create a new `make seed-db` command in the `Makefile` to generate the `clinical_trials.db`, `laboratory.db`, and `drug_discovery.db` SQLite files.
- Update `docker-compose.yml` to remove the PostgreSQL service and mount the new `data/` directory for the SQLite databases.
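The seeding step above can be sketched as follows. This is a minimal illustration, not the actual `generate_sample_databases.py`: the table definition and sample rows here are hypothetical stand-ins for the richer life sciences schemas that script produces.

```python
import sqlite3
from pathlib import Path

# Hypothetical minimal schema for one of the three databases; the real
# generate_sample_databases.py defines richer life-sciences tables.
CLINICAL_TRIALS_DDL = """
CREATE TABLE IF NOT EXISTS trials (
    trial_id   TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    phase      TEXT,
    status     TEXT
);
"""

SAMPLE_ROWS = [
    ("NCT001", "Aspirin vs Placebo", "Phase III", "Completed"),
    ("NCT002", "Novel Kinase Inhibitor", "Phase I", "Recruiting"),
]

def seed_database(db_path: str) -> int:
    """Create the SQLite file, apply the DDL, insert sample rows, return row count."""
    Path(db_path).parent.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(db_path) as conn:
        conn.executescript(CLINICAL_TRIALS_DDL)
        conn.executemany(
            "INSERT OR REPLACE INTO trials VALUES (?, ?, ?, ?)", SAMPLE_ROWS
        )
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM trials").fetchone()[0]
```

The `make seed-db` target would simply invoke this script once per database file, writing into the `data/` directory that `docker-compose.yml` mounts.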
- **Task 1.2: Enhance MCP Server with Core Logic**
- Create a new `mcp/core/` directory.
- Migrate the advanced logic from `semantic-query-router/src/core/` (`discovery.py`, `graph.py`, `intelligence.py`) into our `mcp/core/` directory.
- Refactor these modules to fit our project structure and standards.
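As one illustration of what the migrated `discovery.py` logic involves: SQLite exposes its catalog through `sqlite_master` and `PRAGMA table_info`, so schema discovery reduces to a small introspection routine. This is a hedged sketch of the idea, not the actual `semantic-query-router` implementation.

```python
import sqlite3

def discover_schema(conn: sqlite3.Connection) -> dict:
    """Return {table: [(column, type), ...]} using SQLite's own catalog."""
    tables = [
        row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' "
            "AND name NOT LIKE 'sqlite_%'"
        )
    ]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    return {
        t: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }
```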
- **Task 1.3: Create a Dedicated Ingestion Process**
- Create a new script, `ops/scripts/ingest.py`, that uses the new core logic to perform a one-time ingestion of the SQLite database schemas into Neo4j.
- Create a `make ingest` command in the `Makefile` to run this script. This separates the schema ingestion process from the agent's runtime duties, making the system more modular.
- Remove the schema discovery logic from `agent/main.py`.
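The core of `ops/scripts/ingest.py` is translating a discovered schema into graph writes. A minimal sketch, assuming a `Table`/`Column` node model with `HAS_COLUMN` relationships (the actual model lives in the migrated `graph.py`); the generated Cypher would be executed against Neo4j via the official driver's `session.run`.

```python
def schema_to_cypher(schema: dict) -> list[str]:
    """Translate {table: [(column, type), ...]} into idempotent Cypher MERGEs.

    The node labels and relationship type here are assumptions about the
    graph model; adjust to match the migrated mcp/core/graph.py.
    """
    stmts = []
    for table, columns in schema.items():
        stmts.append(f"MERGE (t:Table {{name: '{table}'}})")
        for name, ctype in columns:
            stmts.append(
                f"MATCH (t:Table {{name: '{table}'}}) "
                f"MERGE (c:Column {{name: '{name}', table: '{table}', type: '{ctype}'}}) "
                f"MERGE (t)-[:HAS_COLUMN]->(c)"
            )
    return stmts
```

Because every statement is a `MERGE`, rerunning `make ingest` after a schema change is safe: existing nodes are matched rather than duplicated.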
---
### Phase 2: Rebuild Agent with LangChain (Due by Tuesday, Oct 7th)
**Goal**: Re-architect the agent from a simple script into a robust LangChain-powered orchestrator that leverages the enhanced MCP server.
- **Task 2.1: Refactor Agent to use LangChain**
- Overhaul `agent/main.py` to implement the `AgentExecutor` pattern from `langchain_integration.py`.
- Define a formal agent prompt that instructs the LLM on how to use the available tools to answer questions.
- **Task 2.2: Implement Custom LangChain Tools**
- Create a new `agent/tools.py` file.
- Implement custom LangChain tools that make authenticated REST API calls to our enhanced MCP server.
  - The tools will include `SchemaSearchTool`, `JoinPathFinderTool`, and `QueryExecutorTool`, which act as thin clients to the core logic we integrated into the MCP server in Phase 1.
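A sketch of what these client tools share: each one maps to an MCP REST endpoint and builds an authenticated request. The base URL and endpoint paths below are assumptions for illustration; they must be matched to the routes the enhanced MCP server actually exposes, and the LangChain `_run` method would perform the POST and return the JSON body.

```python
from dataclasses import dataclass

MCP_BASE_URL = "http://localhost:8000"  # assumed; align with the MCP server config

@dataclass
class MCPTool:
    """Base for tools that call the MCP server's REST API.

    The endpoint paths are hypothetical; adjust to the actual mcp/core routes.
    """
    name: str
    endpoint: str

    def build_request(self, query: str) -> tuple[str, dict]:
        # (url, json_payload) pair; the LangChain tool wrapper would POST
        # this with an Authorization header attached.
        return f"{MCP_BASE_URL}{self.endpoint}", {"query": query}

schema_search = MCPTool("schema_search", "/schema/search")
join_path_finder = MCPTool("join_path_finder", "/schema/join-path")
query_executor = MCPTool("query_executor", "/query/execute")
```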
- **Task 2.3: Update Agent's Main Loop**
- Modify the agent's main loop to delegate tasks to the LangChain `AgentExecutor` instead of handling instructions directly. The agent's primary role will now be to orchestrate the LangChain agent and log the results.
---
### Phase 3: Build the Chat UI & Finalize (Due by Thursday, Oct 9th)
**Goal**: Replace the basic Streamlit monitor with a full-featured conversational chat interface and complete the final integration for the demo.
- **Task 3.1: Implement Conversational Chat UI**
- Replace the entire contents of `streamlit/app.py` with the conversational UI logic from `semantic-query-router/src/chat_app.py`.
- Adapt the UI to work with our project's MCP REST API (instead of WebSocket) for submitting questions and fetching results.
- **Task 3.2: Integrate Demo-Specific Features**
- Ensure the new Streamlit UI includes the required demo features:
- Display of execution phases (e.g., "Searching Schema," "Finding Join Path," "Executing Query").
- A final results view that shows both the natural language summary from the agent and a clean data table (Pandas DataFrame) of the raw results.
- A "Download CSV" button for the results table.
- A sidebar that displays the connection status of the Neo4j and SQLite databases.
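The "Download CSV" feature reduces to serializing the result rows and handing the bytes to Streamlit's `st.download_button`. A small stdlib-only helper, sketched under the assumption that the agent returns results as a list of dicts:

```python
import csv
import io

def rows_to_csv_bytes(rows: list[dict]) -> bytes:
    """Serialize result rows to CSV bytes for st.download_button's `data` arg."""
    if not rows:
        return b""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")
```

In `streamlit/app.py` this would be wired up as `st.download_button("Download CSV", data=rows_to_csv_bytes(rows), file_name="results.csv", mime="text/csv")`.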
- **Task 3.3: Final Integration and Testing**
- Perform end-to-end testing of the full workflow: from asking a question in the Streamlit app to the agent's orchestration and the final result display.
- Clean up any unused files and finalize the `README.md` with updated instructions.