# GraphRAG Agentic System
## Overview
This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.
## Key Features
- **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution.
- **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.
- **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results.
- **Conversational UI**: A Streamlit-based chat interface for interacting with the agent.
- **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server.
## Architecture
```
┌─────────────────┐      ┌───────────────┐      ┌─────────────────┐
│  Streamlit Chat │─────▶│     Agent     │─────▶│   MCP Server    │
│      (UI)       │      │  (LangChain)  │      │    (FastAPI)    │
└─────────────────┘      └───────────────┘      └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
  ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
  │    Neo4j    │         │  clinical_  │         │ laboratory  │
  │ (Schema KG) │         │  trials.db  │         │    .db      │
  └─────────────┘         └─────────────┘         └─────────────┘
                                 │
                          ┌─────────────┐
                          │    drug_    │
                          │ discovery.db│
                          └─────────────┘
```
### Components
- **Streamlit**: Provides a conversational chat interface for users to ask questions.
- **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
- **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
- **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases.
- **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources.
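Federated execution across separate SQLite files can be illustrated with SQLite's `ATTACH DATABASE`, which lets a single SQL statement join tables that live in different database files. The sketch below is a minimal stand-in under that assumption (the project's actual mechanism lives in `mcp/`); the table names and schemas are hypothetical, not the real ones shipped in `data/`:

```python
import os
import sqlite3
import tempfile

# Build a small database file to play the role of laboratory.db.
# All names and schemas here are illustrative stand-ins.
lab_path = os.path.join(tempfile.mkdtemp(), "laboratory.db")
lab = sqlite3.connect(lab_path)
lab.execute("CREATE TABLE results (trial_id TEXT, value REAL)")
lab.execute("INSERT INTO results VALUES ('T1', 4.2)")
lab.commit()
lab.close()

# The primary connection stands in for clinical_trials.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (trial_id TEXT, name TEXT)")
conn.execute("INSERT INTO trials VALUES ('T1', 'Cancer Study')")

# ATTACH makes the second file queryable under the alias `lab`,
# so one statement can join tables from both databases.
conn.execute("ATTACH DATABASE ? AS lab", (lab_path,))
rows = conn.execute(
    "SELECT t.name, r.value "
    "FROM trials t JOIN lab.results r USING (trial_id)"
).fetchall()
print(rows)  # [('Cancer Study', 4.2)]
```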
## Quick Start
### Prerequisites
- Docker & Docker Compose
- LLM API key (e.g., for OpenAI)
### Setup
1. **Clone and configure**:
```bash
git clone <repository-url>
cd <repository-name>
touch .env
```
2. **Add your LLM API key** to the `.env` file.
```
LLM_API_KEY="sk-your-llm-api-key-here"
```
3. **Start the system**:
```bash
make up
```
4. **Seed the databases and ingest schema**:
```bash
make seed-db
make ingest
```
5. **Open the interface**:
- Streamlit UI: http://localhost:8501
- Neo4j Browser: http://localhost:7474 (neo4j/password)
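Conceptually, `make ingest` walks each SQLite schema and writes it into Neo4j as a knowledge graph. The sketch below shows how such an ingester might turn a SQLite schema into Cypher `MERGE` statements; the node labels, relationship name, and helper function are hypothetical (the real script lives in `ops/`):

```python
import sqlite3

def schema_to_cypher(conn, db_name):
    """Yield Cypher MERGE statements describing a SQLite schema as a
    Table/Column graph. Labels and relationship names are illustrative."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        yield f"MERGE (:Table {{name: '{table}', db: '{db_name}'}})"
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        for row in conn.execute(f"PRAGMA table_info({table})"):
            col, col_type = row[1], row[2]
            yield (f"MATCH (t:Table {{name: '{table}', db: '{db_name}'}}) "
                   f"MERGE (t)-[:HAS_COLUMN]->"
                   f"(:Column {{name: '{col}', type: '{col_type}'}})")

# Demo on an in-memory stand-in for clinical_trials.db:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (trial_id TEXT, name TEXT)")
statements = list(schema_to_cypher(conn, "clinical_trials"))
for s in statements:
    print(s)
```

In the real system these statements would be sent to Neo4j (e.g. via the official driver); here they are only printed so the sketch stays self-contained.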
## Usage
Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:
- "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
- "Find all drugs with 'Aspirin' in their name."
- "Show me lab results for patient '123'."
The agent will then:
1. Use the `SchemaSearchTool` to find relevant tables.
2. Use the `JoinPathFinderTool` to determine how to join them.
3. Construct a SQL query.
4. Execute the query using the `QueryExecutorTool`.
5. Return the final answer to the UI.
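The join-path step above can be pictured as a shortest-path search over the schema knowledge graph: tables are nodes, foreign-key links are edges, and a breadth-first search yields the chain of joins. This toy version uses illustrative table names, not the project's real schemas or the actual `JoinPathFinderTool` implementation:

```python
from collections import deque

# Toy schema graph: table -> tables reachable via a foreign-key link.
# Names are illustrative only.
SCHEMA_GRAPH = {
    "trials":      ["conditions", "sites"],
    "conditions":  ["trials"],
    "sites":       ["trials", "lab_results"],
    "lab_results": ["sites", "patients"],
    "patients":    ["lab_results"],
}

def find_join_path(start, goal):
    """BFS over the schema graph; returns the table sequence to join,
    or None when the tables are not connected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in SCHEMA_GRAPH.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_join_path("trials", "patients"))
# ['trials', 'sites', 'lab_results', 'patients']
```

A path like this tells the agent which intermediate tables its generated SQL must join through before the `QueryExecutorTool` runs the statement.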
## Development
### Running the Agent Manually
To test the agent's logic directly without the full Docker stack, you can run it from your terminal.
1. **Set up the environment**:
Make sure the MCP and Neo4j services are running (`make up`).
Create a Python virtual environment and install dependencies:
```bash
python -m venv venv
source venv/bin/activate
pip install -r agent/requirements.txt
```
2. **Set your API key**:
```bash
export LLM_API_KEY="sk-your-llm-api-key-here"
```
3. **Run the agent**:
```bash
python agent/main.py
```
The agent will run with the hardcoded example question and print the execution trace and final answer to your console.
### File Structure
```
├── agent/               # The LangChain agent and its tools
├── streamlit/           # The Streamlit conversational UI
├── mcp/                 # FastAPI server with core logic
├── neo4j/               # Neo4j configuration and data
├── data/                # SQLite databases
├── ops/                 # Operational scripts (seeding, ingestion, etc.)
├── docker-compose.yml
├── Makefile
└── README.md
``` |