---
title: Agent MCP SQL
emoji: 🧠
sdk: streamlit
app_file: space_app.py
python_version: 3.11
pinned: false
---

# GraphRAG Agentic System

## Overview
This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.

## Key Features

🤖 **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution.  
🕸️ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.  
🔬 **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results.  
💬 **Conversational UI**: A Streamlit-based chat interface for interacting with the agent.  
🔌 **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server.

## Architecture

```
┌─────────────────┐      ┌───────────────┐      ┌─────────────────┐
│ Streamlit Chat  │──────│  Agent        │      │   MCP Server    │
│      (UI)       │      │ (LangChain)   │      │    (FastAPI)    │
└─────────────────┘      └───────────────┘      └─────────────────┘
                                                       │
                               ┌───────────────────────┼───────────────────────┐
                               │                       │                       │
                         ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
                         │   Neo4j     │         │ clinical_   │         │ laboratory  │
                         │ (Schema KG) │         │ trials.db   │         │ .db         │
                         └─────────────┘         └─────────────┘         └─────────────┘
                                                       │
                                                 ┌─────────────┐
                                                 │ drug_       │
                                                 │ discovery.db│
                                                 └─────────────┘

```

### Components

- **Streamlit**: Provides a conversational chat interface for users to ask questions.
- **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
- **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
- **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases.
- **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources.
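
To make the role of the schema knowledge graph more concrete, the sketch below models tables as nodes and joinable key pairs as edges, then finds a join path with breadth-first search. It is a minimal in-memory illustration, not the project's Neo4j model: the table names, column names, and the `find_join_path` helper are all hypothetical.

```python
from collections import deque

# Hypothetical schema graph: each edge links two tables through a shared key.
# Table and column names are illustrative, not taken from the real databases.
SCHEMA_EDGES = [
    ("trials", "patients", "trials.trial_id = patients.trial_id"),
    ("patients", "lab_results", "patients.patient_id = lab_results.patient_id"),
    ("trials", "drugs", "trials.drug_id = drugs.drug_id"),
]


def build_adjacency(edges):
    """Turn the edge list into an undirected adjacency map."""
    adjacency = {}
    for left, right, condition in edges:
        adjacency.setdefault(left, []).append((right, condition))
        adjacency.setdefault(right, []).append((left, condition))
    return adjacency


def find_join_path(edges, start, goal):
    """Return the join conditions on a shortest path from start to goal, or None."""
    adjacency = build_adjacency(edges)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        table, conditions = queue.popleft()
        if table == goal:
            return conditions
        for neighbor, condition in adjacency.get(table, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, conditions + [condition]))
    return None


path = find_join_path(SCHEMA_EDGES, "drugs", "lab_results")
print(path)
```

Breadth-first search returns a path with the fewest joins, which is usually a reasonable default when several routes connect two tables.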

## Quick Start

### Prerequisites
- Docker & Docker Compose
- LLM API key (e.g., for OpenAI)

### Setup
1. **Clone and configure**:
   ```bash
   git clone <repository-url>
   cd <repository-name>
   touch .env
   ```

2. **Add your LLM API key** to the `.env` file.
   ```
   LLM_API_KEY="sk-your-llm-api-key-here"
   ```

3. **Start the system**:
   ```bash
   make up
   ```

4. **Seed the databases and ingest schema**:
   ```bash
   make seed-db
   make ingest
   ```

5. **Open the interface**:
   - Streamlit UI: http://localhost:8501
   - Neo4j Browser: http://localhost:7474 (username `neo4j`, password `password`)
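
Step 2 stores the key in `.env`. As a rough illustration of how a service could read it at startup and fail early when it is missing, consider the sketch below. The `require_env` helper is hypothetical; only the `LLM_API_KEY` variable name comes from the steps above, and the real services may load it differently (for example, through Docker Compose).

```python
import os


def require_env(name, environ=None):
    """Return a required environment variable or raise a descriptive error."""
    # Fall back to the process environment when no mapping is supplied.
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return value


# Example usage with an explicit mapping, so the sketch runs without a real key.
api_key = require_env("LLM_API_KEY", {"LLM_API_KEY": "sk-your-llm-api-key-here"})
print(f"Loaded a key of length {len(api_key)}")
```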

## Usage
Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:
- "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
- "Find all drugs with 'Aspirin' in their name."
- "Show me lab results for patient '123'."

The agent will then:
1. Use the `SchemaSearchTool` to find relevant tables.
2. Use the `JoinPathFinderTool` to determine how to join them.
3. Construct a SQL query.
4. Execute the query using the `QueryExecutorTool`.
5. Return the final answer to the UI.
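
The five steps above can be sketched end to end with plain functions and an in-memory SQLite database. This is only an illustration of the flow under stated assumptions: the functions are hypothetical stand-ins for `SchemaSearchTool`, `JoinPathFinderTool`, and `QueryExecutorTool`, the tables and rows are invented, and the join lookup is hard-coded rather than resolved through a graph.

```python
import sqlite3

# Hypothetical in-memory stand-in for one of the SQLite data sources.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, name TEXT, indication TEXT);
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, trial_id INTEGER);
    INSERT INTO trials VALUES (1, 'Trial A', 'Cancer'), (2, 'Trial B', 'Asthma');
    INSERT INTO patients VALUES (10, 1), (11, 1), (12, 2);
    """
)


def search_schema(conn, keyword):
    """Step 1: list tables that have a column whose name contains the keyword."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    matches = []
    for table in tables:
        columns = [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]
        if any(keyword in column for column in columns):
            matches.append(table)
    return matches


def find_join(tables):
    """Step 2: look up a join condition (hard-coded here instead of a graph lookup)."""
    known_joins = {("patients", "trials"): "patients.trial_id = trials.trial_id"}
    return known_joins[tuple(sorted(tables))]


def build_query(tables, join_condition):
    """Step 3: assemble a parameterized SQL statement."""
    left, right = sorted(tables)
    return (
        f"SELECT trials.name, COUNT(patients.patient_id) "
        f"FROM {left} JOIN {right} ON {join_condition} "
        f"WHERE trials.indication = ? GROUP BY trials.name"
    )


def execute_query(conn, sql, params):
    """Step 4: run the query and return all rows."""
    return conn.execute(sql, params).fetchall()


tables = search_schema(conn, "trial_id")      # both tables carry a trial_id column
sql = build_query(tables, find_join(tables))
rows = execute_query(conn, sql, ("Cancer",))
print(rows)                                   # Step 5: hand the rows back to the caller
```

Passing the user-supplied value as a bound parameter (`?`) rather than formatting it into the SQL string keeps the generated query safe from injection, which matters when an LLM constructs the statement.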

### Deploying a Hugging Face Space (Streamlit front-end only)

This repo includes a self-contained Streamlit app for Hugging Face Spaces: `space_app.py`.
It connects to your externally reachable Agent and MCP services.

1) Expose your services (public host or tunnel)
   - Agent FastAPI endpoint: `https://<your-host>/query`
   - MCP FastAPI base: `https://<your-host>/mcp`

2) In a new HF Space (Streamlit), add these files:
   - `space_app.py` (entrypoint)
   - `requirements.txt` with:
     ```
     streamlit==1.28.0
     requests==2.31.0
     pandas==2.1.0
     ```

3) In Space Settings → Variables and secrets, set:
   - `AGENT_URL` (e.g., `https://your-agent-host/query`)
   - `MCP_URL` (e.g., `https://your-mcp-host/mcp`)
   - `MCP_API_KEY` (the MCP auth key)
   - (Optional) `AGENT_HEALTH_URL`, `NEO4J_URL`

4) Configure the Space to run `space_app.py` as the Streamlit app file.

Once the Space starts, it will display the same chat UI and stream responses from your hosted Agent.
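
As a rough sketch of how a front-end such as `space_app.py` might consume the variables from step 3, the snippet below reads them into a small settings object and fails early if a required one is missing. The `SpaceSettings` class and `load_settings` function are hypothetical and are not taken from the actual app; only the variable names come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceSettings:
    """Connection settings read from the Space's variables and secrets."""
    agent_url: str
    mcp_url: str
    mcp_api_key: str
    agent_health_url: Optional[str] = None
    neo4j_url: Optional[str] = None


def load_settings(environ: Mapping[str, str] = os.environ) -> SpaceSettings:
    """Read the Space variables, failing early if a required one is missing."""
    required = ("AGENT_URL", "MCP_URL", "MCP_API_KEY")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing Space variables: " + ", ".join(missing))
    return SpaceSettings(
        agent_url=environ["AGENT_URL"],
        mcp_url=environ["MCP_URL"].rstrip("/"),  # normalize a trailing slash
        mcp_api_key=environ["MCP_API_KEY"],
        agent_health_url=environ.get("AGENT_HEALTH_URL"),
        neo4j_url=environ.get("NEO4J_URL"),
    )


# Example with an explicit mapping; on a Space this would default to os.environ.
settings = load_settings({
    "AGENT_URL": "https://your-agent-host/query",
    "MCP_URL": "https://your-mcp-host/mcp/",
    "MCP_API_KEY": "example-key",
})
print(settings.mcp_url)
```

Validating all required variables in one place means a misconfigured Space reports every missing name at once instead of failing on the first request.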

## Development

### Running the Agent Manually
To test the agent's logic directly without the full Docker stack, you can run it from your terminal.

1.  **Set up the environment**:
    Make sure the MCP and Neo4j services are running (`make up`).
    Create a Python virtual environment and install dependencies:
    ```bash
    python -m venv venv
    source venv/bin/activate
    pip install -r agent/requirements.txt
    ```

2.  **Set your API key**:
    ```bash
    export LLM_API_KEY="sk-your-llm-api-key-here"
    ```

3.  **Run the agent**:
    ```bash
    python agent/main.py
    ```
    The agent will run with the hardcoded example question and print the execution trace and final answer to your console.

### File Structure
```
├── agent/          # The LangChain agent and its tools
├── streamlit/      # The Streamlit conversational UI
├── mcp/            # FastAPI server with core logic
├── neo4j/          # Neo4j configuration and data
├── data/           # SQLite databases
├── ops/            # Operational scripts (seeding, ingestion, etc.)
├── docker-compose.yml
├── Makefile
└── README.md
```

