title: Code Knowledge Graph Explorer - 🤗 Transformers Library
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
tags:
- building-mcp-track-enterprise
short_description: MCP server for big code - explore Transformers
Code Knowledge Graph MCP Server
Helping LLM-based agents navigate and understand large codebases
What is this project?
This project provides a Model Context Protocol (MCP) server that transforms code repositories into navigable knowledge graphs. It enables Large Language Model (LLM) based agents to efficiently explore, understand, and reason about complex codebases, a critical capability for modern software engineering education and practice.
Use Case: EPITA Coding Courses
This project was developed with educational applications in mind, specifically to support EPITA coding courses:
Enhanced Code Discovery for Agents
LLM-based coding agents can use this tool to better discover and navigate large repositories. Instead of blindly searching through files, agents can:
- Query the knowledge graph to understand the overall architecture
- Follow relationships between modules, classes, and functions
- Identify entry points and critical code paths
- Understand how different parts of the codebase interact
Detecting Areas for Code Improvement
For EPITA courses, this tool helps agents identify areas where student code can be improved:
- Dead Code Detection: Find unused functions, classes, or variables
- Circular Dependencies: Detect problematic import cycles between modules
- Code Coupling Analysis: Identify tightly coupled components that should be refactored
- Missing Documentation: Find undocumented public APIs and complex functions
- Complexity Hotspots: Locate chunks with many outgoing calls (high coupling)
- Orphan Code: Detect code that is declared but never called
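Several of the checks listed above reduce to simple graph queries once call and import relationships are available. The following is a minimal sketch on a hand-written toy graph, not the project's implementation: the `EDGES` dictionary, the node names, and the three helper functions are all hypothetical and only illustrate orphan detection, cycle detection, and out-degree hotspots.

```python
from collections import defaultdict

# Hypothetical toy graph: each key calls or imports the names in its list.
# None of these names come from the project; they only illustrate the checks.
EDGES = {
    "main": ["parse", "report"],
    "parse": ["tokenize"],
    "tokenize": [],
    "report": ["format_row"],
    "format_row": [],
    "legacy_export": [],   # declared but never called
    "mod_a": ["mod_b"],    # mod_a -> mod_b -> mod_a forms a cycle
    "mod_b": ["mod_a"],
}


def find_orphans(edges, entry_points):
    """Return nodes with no incoming edge that are not known entry points."""
    called = {target for targets in edges.values() for target in targets}
    return sorted(n for n in edges if n not in called and n not in entry_points)


def find_cycles(edges):
    """Return one cycle per back edge found during a depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (0)
    cycles = []

    def visit(node, stack):
        color[node] = GREY
        stack.append(node)
        for nxt in edges.get(node, []):
            if color[nxt] == GREY:
                # Back edge: nxt is already on the current path.
                cycles.append(stack[stack.index(nxt):] + [nxt])
            elif color[nxt] == WHITE:
                visit(nxt, stack)
        stack.pop()
        color[node] = BLACK

    for node in edges:
        if color[node] == WHITE:
            visit(node, [])
    return cycles


def hotspots(edges, threshold):
    """Return nodes whose number of outgoing edges is at least `threshold`."""
    return sorted(n for n, targets in edges.items() if len(targets) >= threshold)


orphans = find_orphans(EDGES, entry_points={"main"})
cycles = find_cycles(EDGES)
busy = hotspots(EDGES, threshold=2)
```

In this toy graph, `orphans` contains only `legacy_export`, `cycles` contains the `mod_a`/`mod_b` loop, and `busy` contains `main`. In practice an agent would obtain the edges from the MCP tools rather than from a literal dictionary.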
EPITA Course Integration
- Project Reviews: Quickly understand student project architectures before grading
- Automated Feedback: Integrate with LLM tutors to provide targeted improvement suggestions
- Code Quality Assessment: Consistent evaluation criteria across student submissions
- Learning Tool: Help students navigate and understand unfamiliar codebases (e.g., open-source projects)
- Research: Study code organization patterns across student projects
The MCP interface makes it easy to integrate with any LLM-based tutoring or code review system used in EPITA courses.
The Problem We Solve
At EPITA (École pour l'informatique et les techniques avancées), students work on increasingly complex software projects throughout their curriculum. Understanding large codebases (their own, their teammates', or open-source libraries) is a fundamental skill for any computer science engineer.
However, LLM-based coding assistants face significant challenges when working with large repositories:
- Context window limitations: LLMs cannot process entire codebases at once
- Lack of structural awareness: Without understanding how code is organized, LLMs struggle to locate relevant files
- Missing relationships: Function calls, class inheritance, and module dependencies are not immediately visible
- Inefficient search: Simple keyword search fails to capture semantic meaning
Our Solution: Knowledge Graphs + MCP
This project addresses these challenges by:
- Parsing repositories into a structured knowledge graph (files → chunks → entities)
- Extracting relationships between code elements (calls, contains, declares, imports)
- Indexing content with hybrid search (semantic embeddings + keyword matching)
- Exposing tools via MCP that allow LLM agents to navigate the codebase intelligently
┌─────────────────────────────────────────────────────────────────┐
│                         CODE REPOSITORY                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │  File A  │  │  File B  │  │  File C  │  │  File D  │   ...   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
└───────┼─────────────┼─────────────┼─────────────┼───────────────┘
        ▼             ▼             ▼             ▼
┌─────────────────────────────────────────────────────────────────┐
│                  KNOWLEDGE GRAPH CONSTRUCTION                   │
│  • AST Parsing (Python, C/C++, Java, JavaScript, Rust, HTML)    │
│  • Entity Extraction (classes, functions, variables, methods)   │
│  • Relationship Detection (calls, inheritance, imports)         │
│  • Code Chunking & Embedding (semantic vectors)                 │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       MCP SERVER (Gradio)                       │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │search_nodes │ │go_to_def    │ │find_usages  │ │get_neighbors│ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │get_file_    │ │get_related  │ │find_path    │ │print_tree   │ │
│ │structure    │ │_chunks      │ │             │ │             │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         LLM-BASED AGENT                         │
│  • Can search for relevant code using natural language          │
│  • Navigate from function calls to their definitions            │
│  • Understand the structure of files and directories            │
│  • Trace dependencies and relationships across the codebase     │
└─────────────────────────────────────────────────────────────────┘
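The hybrid search step (semantic embeddings + keyword matching) can be illustrated with a small, self-contained sketch. This is an assumption-laden toy, not the project's index: the `CHUNKS` data, the hand-written 3-dimensional vectors, the `alpha` weight, and all function names are hypothetical, and a real system would obtain vectors from an embedding model and store them in a vector database.

```python
import math
import re

# Hypothetical chunks with hand-written 3-d "embeddings" (illustration only).
CHUNKS = {
    "load_model": {
        "text": "def load_model(path): load weights from disk",
        "vec": [0.9, 0.1, 0.0],
    },
    "save_model": {
        "text": "def save_model(path): write weights to disk",
        "vec": [0.8, 0.2, 0.1],
    },
    "tokenize": {
        "text": "def tokenize(text): split text into tokens",
        "vec": [0.0, 0.9, 0.3],
    },
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear as whole words in the text."""
    terms = query.lower().split()
    words = set(re.findall(r"\w+", text.lower()))
    return sum(term in words for term in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, chunks, alpha=0.5):
    """Rank chunk names by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = []
    for name, chunk in chunks.items():
        semantic = cosine(query_vec, chunk["vec"])
        lexical = keyword_score(query, chunk["text"])
        scored.append((alpha * semantic + (1 - alpha) * lexical, name))
    return [name for _, name in sorted(scored, reverse=True)]


# The query vector is also hand-written; it stands in for an embedded query.
ranking = hybrid_search("load weights", [1.0, 0.0, 0.0], CHUNKS)
```

Here `load_model` ranks first because it matches both query terms and is closest in the toy vector space, while `tokenize` ranks last on both signals. Blending the two scores lets exact identifier matches and looser semantic matches compensate for each other.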
MCP Tools Available
The MCP server exposes the following tools for LLM agents:
| Tool | Description |
|---|---|
| search_nodes | Semantic + keyword search for code chunks |
| get_node_info | Detailed information about any node (file, chunk, entity) |
| get_node_edges | Incoming and outgoing relationships of a node |
| go_to_definition | Find where a function/class/variable is declared |
| find_usages | Find all places where an entity is called/used |
| get_neighbors | Get all directly connected nodes |
| get_file_structure | Overview of a file's chunks and entities |
| get_related_chunks | Find chunks related by a specific relationship type |
| list_all_entities | List all tracked entities in the codebase |
| get_graph_stats | Statistics about the knowledge graph |
| find_path | Find shortest path between two nodes |
| get_subgraph | Extract a subgraph around a node |
| print_tree | Display repository structure as a tree |
| diff_chunks | Compare content between two code chunks |
| search_by_type_and_name | Search entities by type (class, function, etc.) and name |
| get_chunk_context | Get a chunk with its surrounding context |
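To make a tool such as find_path more concrete, the sketch below shows one common way to compute a shortest path in an unweighted graph: breadth-first search. It is only an illustration of the idea. The `GRAPH` dictionary, the node id format, and the `shortest_path` function are hypothetical and do not reflect how the server stores nodes or implements the tool.

```python
from collections import deque

# Hypothetical node ids and undirected edges, for illustration only.
GRAPH = {
    "file:app.py": ["chunk:app.py#0", "chunk:app.py#1"],
    "chunk:app.py#0": ["file:app.py", "entity:main"],
    "chunk:app.py#1": ["file:app.py", "entity:helper"],
    "entity:main": ["chunk:app.py#0", "entity:helper"],
    "entity:helper": ["chunk:app.py#1", "entity:main"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # maps each visited node to the node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


path = shortest_path(GRAPH, "file:app.py", "entity:helper")
```

Breadth-first search visits nodes in order of distance from the start, so the first time the goal is dequeued the recovered path has the fewest possible edges.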
Supported Languages
The knowledge graph builder uses AST-based entity extraction for accurate parsing:
| Language | Parser | Entity Types |
|---|---|---|
| Python | ast module | classes, functions, methods, variables, imports |
| C | libclang | functions, structs, typedefs, variables |
| C++ | libclang | classes, namespaces, methods, templates |
| Java | javalang | classes, interfaces, methods, fields |
| JavaScript/TypeScript | esprima | classes, functions, variables, imports |
| Rust | tree-sitter | structs, enums, traits, functions, modules |
| HTML | BeautifulSoup | DOM elements, inline JS extraction |
The system also detects API endpoints for web frameworks (FastAPI, Flask, Spring Boot, Actix-web, etc.).
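For the Python row of the table, AST-based extraction can be sketched with the standard library alone. The snippet below is a simplified illustration, not the project's extractor: `SOURCE`, `extract_entities`, and the shape of the returned dictionary are hypothetical, and only direct calls to plain names are recorded.

```python
import ast

# Hypothetical source snippet used only to demonstrate the extraction.
SOURCE = '''
import os
from pathlib import Path

class Loader:
    def load(self, path):
        return read_file(path)

def read_file(path):
    return Path(path).read_text()
'''

FUNCTION_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def extract_entities(source):
    """Collect classes, methods, top-level functions, imports, and called names."""
    tree = ast.parse(source)
    found = {"classes": [], "methods": [], "functions": [], "imports": [], "calls": set()}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
            # Functions defined directly in a class body are treated as methods.
            for item in node.body:
                if isinstance(item, FUNCTION_TYPES):
                    found["methods"].append(f"{node.name}.{item.name}")
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            found["imports"].append(node.module or ".")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            found["calls"].add(node.func.id)
    # Top-level functions are the definitions directly in the module body.
    found["functions"] = [n.name for n in tree.body if isinstance(n, FUNCTION_TYPES)]
    found["calls"] = sorted(found["calls"])
    return found


entities = extract_entities(SOURCE)
```

Pairing each entry in `calls` with the enclosing function would yield the "calls" edges of a graph. That step is omitted here to keep the sketch short.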
Getting Started
Prerequisites
- Docker & Docker Compose
- Python 3.10+ (for local development)
- CUDA-capable GPU (optional, for faster embeddings)
Quick Start with Docker
# Start the MCP server with a sample knowledge graph
docker-compose up
Building a Knowledge Graph from Your Repository
from RepoKnowledgeGraphLib.RepoKnowledgeGraph import RepoKnowledgeGraph

# From a local path
kg = RepoKnowledgeGraph.from_path(
    "/path/to/your/repo",
    skip_dirs=["node_modules", ".git", "__pycache__"],
    extract_entities=True,
    index_nodes=True,
)

# Save for later use
kg.save_graph_to_file("my_knowledge_graph.json")
Running the MCP Server with Gradio
python gradio_mcp.py --graph-file my_knowledge_graph.json --host 0.0.0.0 --port 7860
Interactive Explorer (Gradio UI)
The project includes a Gradio-based web interface for exploring knowledge graphs interactively:
- Search: Use natural language or keywords to find relevant code
- Navigate: Click through nodes to explore relationships
- Analyze: Get statistics about code structure and dependencies
- Visualize: View the repository tree and entity relationships
Data Sources
The application supports loading knowledge graphs from multiple sources:
1. HuggingFace Hub Dataset (Recommended for Sharing)
Load directly from a HuggingFace dataset created by the library (see Publishing to HuggingFace Hub below):
python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
2. Local JSON File
Use a local JSON file (e.g., multihop_knowledge_graph_with_embeddings.json):
python gradio_mcp.py --host 0.0.0.0 --port 7860 --graph-file data/multihop_knowledge_graph_with_embeddings.json
3. Direct from Git Repository
Clone and analyze a repository on-the-fly:
python gradio_mcp.py --host 0.0.0.0 --port 7860 --repo-url "https://github.com/user/repo.git"
Publishing to HuggingFace Hub
You can save an existing knowledge graph to HuggingFace Hub for sharing:
from RepoKnowledgeGraphLib.RepoKnowledgeGraph import RepoKnowledgeGraph
# Load from local file
kg = RepoKnowledgeGraph.load("path/to/graph.json")
# Push to HuggingFace Hub (without embeddings to reduce size)
kg.to_hf_dataset("username/my-knowledge-graph", save_embeddings=False, private=False)
# Or with embeddings (larger dataset)
kg.to_hf_dataset("username/my-knowledge-graph-with-embeddings", save_embeddings=True)
Architecture Overview
root/
├── Dockerfile                      # Docker configuration
├── requirements.txt                # Python dependencies
├── RepoKnowledgeGraphLib/          # Knowledge graph implementation
│   ├── RepoKnowledgeGraph.py       # Main graph class
│   ├── KnowledgeGraphMCPServer.py  # MCP server implementation
│   ├── EntityExtractor.py          # AST-based entity extraction
│   ├── CodeParser.py               # Code chunking
│   ├── CodeIndex.py                # Hybrid search (LanceDB/Weaviate)
│   ├── ModelService.py             # Embedding generation
│   └── Node.py                     # Graph node types
└── gradio_mcp_space.py             # Main Gradio web interface
Team
Team Name: CEPIA Ionis Team
Team Members:
- Laila ELKOUSSY - @lailaelkoussy - Research Engineer, Data Scientist
- Julien PEREZ - @jnm38 - Research Director
License
This project is developed as part of research at EPITA / Ionis Group.
Related Resources
- Model Context Protocol (MCP) - The protocol standard
- Gradio - Python web interface framework with MCP support
- LanceDB - Vector database for code indexing
- Salesforce SFR-Embedding-Code - Code embedding model
VS Code Integration
To use this MCP server with GitHub Copilot in VS Code, you need to configure an mcp.json file.
Configuration File Location
Create or edit the file at .vscode/mcp.json in your workspace root:
your-workspace/
├── .vscode/
│   └── mcp.json    ← Place the configuration here
├── src/
└── ...
Configuration Content
Add the following content to .vscode/mcp.json:
{
  "servers": {
    "transformers-code-graph": {
      "url": "https://lailaelkoussy-transformers-library-knowledge-graph.hf.space/gradio_api/mcp/",
      "type": "http"
    }
  },
  "inputs": []
}
What This Does
- servers: Defines the MCP servers available to VS Code
- transformers-code-graph: A custom name for this server connection
- url: The endpoint of the hosted MCP server (here pointing to the HuggingFace Space)
- type: Set to "http" for remote HTTP-based MCP servers
Using with Your Own Server
If you're running your own MCP server locally, update the URL accordingly:
{
  "servers": {
    "my-code-graph": {
      "url": "http://localhost:7860/gradio_api/mcp/",
      "type": "http"
    }
  },
  "inputs": []
}
Once configured, GitHub Copilot in VS Code will have access to all the knowledge graph tools (search_nodes, go_to_definition, find_usages, etc.) to help navigate and understand your codebase.
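If you set up several workspaces, the configuration file can also be generated by a short script. The sketch below is one possible approach and not part of the project: the function name `write_mcp_config` is hypothetical, and it assumes any existing mcp.json is plain JSON without comments.

```python
import json
from pathlib import Path


def write_mcp_config(workspace, name, url):
    """Write .vscode/mcp.json under `workspace`, keeping servers already listed.

    Assumes an existing file, if any, is plain JSON (no comments).
    """
    config_path = Path(workspace) / ".vscode" / "mcp.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)

    # Start from the same skeleton as the configuration shown above.
    config = {"servers": {}, "inputs": []}
    if config_path.exists():
        config.update(json.loads(config_path.read_text(encoding="utf-8")))

    # Add or overwrite the entry for this server name.
    config.setdefault("servers", {})[name] = {"url": url, "type": "http"}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config_path
```

For example, `write_mcp_config(".", "my-code-graph", "http://localhost:7860/gradio_api/mcp/")` would create or update `.vscode/mcp.json` in the current directory with the local server entry described above.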