
GraphRAG Agentic System

Overview

This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.

Key Features

πŸ€– LangChain Agent: Orchestrates tools for schema discovery, pathfinding, and query execution.
πŸ•ΈοΈ GraphRAG Enabled: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.
πŸ”¬ Life Sciences Dataset: Comes with a rich dataset across clinical trials, drug discovery, and lab results.
πŸ’¬ Conversational UI: A Streamlit-based chat interface for interacting with the agent.
πŸ”Œ RESTful MCP Server: All core logic is exposed through a FastAPI server.

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Streamlit Chat  │──────│  Agent        │──────│   MCP Server    β”‚
β”‚      (UI)       β”‚      β”‚ (LangChain)   β”‚      β”‚    (FastAPI)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                       β”‚
                               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                               β”‚                       β”‚                       β”‚
                         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                         β”‚   Neo4j     β”‚         β”‚ clinical_   β”‚         β”‚ laboratory  β”‚
                         β”‚ (Schema KG) β”‚         β”‚ trials.db   β”‚         β”‚ .db         β”‚
                         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                       β”‚
                                                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                 β”‚ drug_       β”‚
                                                 β”‚ discovery.dbβ”‚
                                                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components

  • Streamlit: Provides a conversational chat interface for users to ask questions.
  • Agent: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
  • MCP Server: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
  • Neo4j: Stores a knowledge graph of the schemas of all connected SQLite databases.
  • SQLite Databases: A set of life sciences databases (clinical_trials.db, drug_discovery.db, laboratory.db) that serve as the federated data sources.

Quick Start

Prerequisites

  • Docker & Docker Compose
  • LLM API key (e.g., for OpenAI)

Setup

  1. Clone and configure:

    git clone <repository-url>
    cd <repository-name>
    touch .env
    
  2. Add your LLM API key to the .env file.

    LLM_API_KEY="sk-your-llm-api-key-here"
    
  3. Start the system:

    make up
    
  4. Seed the databases and ingest schema:

    make seed-db
    make ingest
    
  5. Open the interface: visit the Streamlit UI in your browser (Streamlit serves on port 8501 by default; check docker-compose.yml for the mapped port).

Usage

Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:

  • "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
  • "Find all drugs with 'Aspirin' in their name."
  • "Show me lab results for patient '123'."

The agent will then:

  1. Use the SchemaSearchTool to find relevant tables.
  2. Use the JoinPathFinderTool to determine how to join them.
  3. Construct a SQL query.
  4. Execute the query using the QueryExecutorTool.
  5. Return the final answer to the UI.
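Steps 3 and 4 (constructing and executing a federated SQL query) can be sketched with SQLite's own ATTACH mechanism, which is one way to join across separate database files. The snippet below uses throwaway in-memory databases with hypothetical table and column names, purely to show the shape of a cross-database join.

```python
import sqlite3

# Minimal federated-query sketch: two "databases" joined in one query.
# In-memory DBs and illustrative table/column names, not the real schemas.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS lab")  # stands in for laboratory.db

con.execute("CREATE TABLE trials (trial_id TEXT, condition TEXT)")
con.execute("CREATE TABLE lab.results (trial_id TEXT, value REAL)")
con.execute("INSERT INTO trials VALUES ('T1', 'Cancer')")
con.execute("INSERT INTO lab.results VALUES ('T1', 4.2)")

rows = con.execute(
    "SELECT t.trial_id, t.condition, r.value "
    "FROM trials t JOIN lab.results r ON t.trial_id = r.trial_id"
).fetchall()
print(rows)
# β†’ [('T1', 'Cancer', 4.2)]
```

The join path discovered from the schema graph tells the agent which ON clauses to emit; the QueryExecutorTool then runs the assembled SQL against the attached sources.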

Development

Running the Agent Manually

To test the agent's logic directly without the full Docker stack, you can run it from your terminal.

  1. Set up the environment: Make sure the MCP and Neo4j services are running (make up). Create a Python virtual environment and install dependencies:

    python -m venv venv
    source venv/bin/activate
    pip install -r agent/requirements.txt
    
  2. Set your API key:

    export LLM_API_KEY="sk-your-llm-api-key-here"
    
  3. Run the agent:

    python agent/main.py
    

    The agent will run with the hardcoded example question and print the execution trace and final answer to your console.

File Structure

β”œβ”€β”€ agent/          # The LangChain agent and its tools
β”œβ”€β”€ streamlit/      # The Streamlit conversational UI
β”œβ”€β”€ mcp/            # FastAPI server with core logic
β”œβ”€β”€ neo4j/          # Neo4j configuration and data
β”œβ”€β”€ data/           # SQLite databases
β”œβ”€β”€ ops/            # Operational scripts (seeding, ingestion, etc.)
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ Makefile
└── README.md