# GraphRAG Agentic System
## Overview
This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.
## Key Features
- 🤖 **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution.
- 🕸️ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.
- 🔬 **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results.
- 💬 **Conversational UI**: A Streamlit-based chat interface for interacting with the agent.
- 🔌 **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server.
## Architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Streamlit Chat │──────│ Agent β”‚ β”‚ MCP Server β”‚
β”‚ (UI) β”‚ β”‚ (LangChain) β”‚ β”‚ (FastAPI) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ β”‚ β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Neo4j β”‚ β”‚ clinical_ β”‚ β”‚ laboratory β”‚
β”‚ (Schema KG) β”‚ β”‚ trials.db β”‚ β”‚ .db β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ drug_ β”‚
β”‚ discovery.dbβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
### Components
- **Streamlit**: Provides a conversational chat interface for users to ask questions.
- **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
- **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
- **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases.
- **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources.
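
To make the schema knowledge graph more concrete, here is a minimal sketch of how table, column, and foreign-key metadata could be read from a SQLite database before being loaded into a graph store. It is an illustration only, not the project's ingestion code: the `extract_schema` helper and the `trials`/`patients` tables are hypothetical, and the sketch stops short of writing anything to Neo4j.

```python
import sqlite3


def extract_schema(conn: sqlite3.Connection) -> dict:
    """Collect tables, columns, and foreign keys from one SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {"columns": {}, "foreign_keys": []}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema["columns"][table] = [col[1] for col in cols]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
            # Stored as (source table, source column, target table, target column)
            schema["foreign_keys"].append((table, fk[3], fk[2], fk[4]))
    return schema


# Hypothetical two-table database standing in for one of the real .db files.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        trial_id INTEGER REFERENCES trials(trial_id)
    );
    """
)
schema = extract_schema(conn)
print(schema["columns"])
print(schema["foreign_keys"])
```

In a graph store, each table and column would typically become a node and each foreign-key tuple an edge, which is what makes join-path lookups possible later.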
## Quick Start
### Prerequisites
- Docker & Docker Compose
- LLM API key (e.g., for OpenAI)
### Setup
1. **Clone and configure**:
```bash
git clone <repository-url>
cd <repository-name>
touch .env
```
2. **Add your LLM API key** to the `.env` file.
```
LLM_API_KEY="sk-your-llm-api-key-here"
```
3. **Start the system**:
```bash
make up
```
4. **Seed the databases and ingest schema**:
```bash
make seed-db
make ingest
```
5. **Open the interface**:
- Streamlit UI: http://localhost:8501
- Neo4j Browser: http://localhost:7474 (neo4j/password)
## Usage
Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:
- "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
- "Find all drugs with 'Aspirin' in their name."
- "Show me lab results for patient '123'."
The agent will then:
1. Use the `SchemaSearchTool` to find relevant tables.
2. Use the `JoinPathFinderTool` to determine how to join them.
3. Construct a SQL query.
4. Execute the query using the `QueryExecutorTool`.
5. Return the final answer to the UI.
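
The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the actual tool implementations: the schema graph is a hard-coded dictionary rather than a Neo4j query, all tables live in one in-memory SQLite database rather than separate files, and the table names, columns, and helper functions (`find_join_path`, `build_sql`) are hypothetical.

```python
import sqlite3
from collections import deque

# Hypothetical schema graph: table -> {joinable table: join condition}.
SCHEMA_GRAPH = {
    "trials": {"patients": "trials.trial_id = patients.trial_id"},
    "patients": {
        "trials": "trials.trial_id = patients.trial_id",
        "lab_results": "patients.patient_id = lab_results.patient_id",
    },
    "lab_results": {"patients": "patients.patient_id = lab_results.patient_id"},
}


def find_join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of joinable tables."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph[path[-1]]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # No join path exists between the two tables.


def build_sql(graph, path, columns, where):
    """Turn a join path into a SELECT statement with one JOIN per hop."""
    sql = f"SELECT {', '.join(columns)} FROM {path[0]}"
    for left, right in zip(path, path[1:]):
        sql += f" JOIN {right} ON {graph[left][right]}"
    return f"{sql} WHERE {where}"


# Hypothetical sample data so the generated query can be executed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, trial_id INTEGER);
    CREATE TABLE lab_results (patient_id INTEGER, test TEXT, result REAL);
    INSERT INTO trials VALUES (1, 'Trial A');
    INSERT INTO patients VALUES (123, 1);
    INSERT INTO lab_results VALUES (123, 'glucose', 5.4);
    """
)

path = find_join_path(SCHEMA_GRAPH, "trials", "lab_results")
sql = build_sql(
    SCHEMA_GRAPH,
    path,
    ["trials.name", "lab_results.test", "lab_results.result"],
    "patients.patient_id = ?",
)
rows = conn.execute(sql, (123,)).fetchall()
print(path)
print(sql)
print(rows)
```

Passing the patient ID as a bound parameter (`?`) rather than interpolating it into the SQL string keeps user-supplied values out of the query text.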
## Development
### Running the Agent Manually
To test the agent's logic directly, you can run it from your terminal against the running MCP and Neo4j services.
1. **Set up the environment**:
Make sure the MCP and Neo4j services are running (`make up`).
Create a Python virtual environment and install dependencies:
```bash
python -m venv venv
source venv/bin/activate
pip install -r agent/requirements.txt
```
2. **Set your API key**:
```bash
export LLM_API_KEY="sk-your-llm-api-key-here"
```
3. **Run the agent**:
```bash
python agent/main.py
```
The agent will run with the hardcoded example question and print the execution trace and final answer to your console.
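
When running outside Docker, a missing `LLM_API_KEY` is an easy mistake to make. The following sketch shows one way a script could read the variable and fail early with a clear message. The `load_api_key` helper is hypothetical and is not taken from `agent/main.py`.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the LLM API key, or raise if it is missing or blank."""
    key = env.get("LLM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("LLM_API_KEY is not set; export it before running the agent")
    return key


# Demonstration with explicit mappings instead of the real environment.
print(load_api_key({"LLM_API_KEY": " sk-example "}))
try:
    load_api_key({})
except RuntimeError as exc:
    print(f"error: {exc}")
```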
### File Structure
```
├── agent/              # The LangChain agent and its tools
├── streamlit/          # The Streamlit conversational UI
├── mcp/                # FastAPI server with core logic
├── neo4j/              # Neo4j configuration and data
├── data/               # SQLite databases
├── ops/                # Operational scripts (seeding, ingestion, etc.)
├── docker-compose.yml
├── Makefile
└── README.md
```