lailaelkoussy committed on
Commit 42ad0cb · verified · 1 Parent(s): 61e29b9

update readme.md

  1. README.md +296 -30
README.md CHANGED
@@ -6,43 +6,210 @@ colorTo: purple
  sdk: docker
  app_port: 7860
  pinned: false
- tags:
- - building-mcp-track-enterprise
  ---

- # Knowledge Graph MCP Explorer

- This is a Gradio-based interactive tool for exploring code repository knowledge graphs. It provides a web interface to search, navigate, and analyze code relationships using the Model Context Protocol (MCP).

- ## Features

- - **Search Nodes**: Search for code entities, functions, classes, and more using semantic search
- - **Graph Navigation**: Explore relationships between code elements
- - **Entity Tracking**: View declared and called entities within code chunks
- - **Path Finding**: Find paths between different nodes in the knowledge graph
- - **Subgraph Extraction**: Extract and visualize subgraphs around specific nodes
- - **File Structure**: View the hierarchical structure of the repository

- ## Usage

- The application loads a pre-built knowledge graph from the HuggingFace Transformers repository. You can:

- 1. **Search**: Use the search tab to find relevant code snippets and entities
- 2. **Explore**: Navigate through the graph using node IDs
- 3. **Analyze**: Get statistics about the code structure and relationships

- ## Technical Details

- - Built with Gradio for the web interface
- - Uses LanceDB for efficient code indexing and search
- - Supports hybrid search (keyword + semantic embeddings)
- - Pre-computed embeddings using Salesforce/SFR-Embedding-Code-400M_R model

- ## Data Sources

- The application supports loading knowledge graphs from:

- ### 1. HuggingFace Hub Dataset (Recommended)

  Load directly from a HuggingFace dataset:

@@ -58,9 +225,17 @@ Use a local JSON file (e.g., `multihop_knowledge_graph_with_embeddings.json`):
  python gradio_mcp.py --host 0.0.0.0 --port 7860 --graph-file data/multihop_knowledge_graph_with_embeddings.json
  ```

- ### Creating and Publishing a Dataset

- You can save an existing knowledge graph to HuggingFace Hub:

  ```python
  from RepoKnowledgeGraphLib import RepoKnowledgeGraph
@@ -75,7 +250,7 @@ kg.to_hf_dataset("username/my-knowledge-graph", save_embeddings=False, private=F
  kg.to_hf_dataset("username/my-knowledge-graph-with-embeddings", save_embeddings=True)
  ```

- ## Docker Configuration

  The default Dockerfile uses a local JSON file. To use HuggingFace datasets instead, modify the CMD line in `Dockerfile`:

@@ -87,7 +262,7 @@ CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--
  CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--graph-file", "/app/data/multihop_knowledge_graph_with_embeddings.json"]
  ```

- ## Local Development

  To run locally:

@@ -103,7 +278,7 @@ pip install -r requirements.txt
  python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
  ```

- ## Deployment to HuggingFace Spaces

  ### Option 1: Using HuggingFace Dataset (Recommended)

@@ -124,6 +299,27 @@ python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-n
  git push
  ```



@@ -134,4 +330,74 @@ python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-n

  **Team Members:**
  - **Laila ELKOUSSY** - [@lailaelkoussy](https://huggingface.co/lailaelkoussy) - Research Engineer, Data Scientist
- - **Julien PEREZ** - [@jnm38](https://huggingface.co/jnm38) -

@@ -6,43 +6,210 @@
  sdk: docker
  app_port: 7860
  pinned: false
+ tags:
+ - building-mcp-track-enterprise
+ short_description: MCP server for big code - explore Transformers
  ---

+ # 🎓 Code Knowledge Graph MCP Server

+ > **Helping LLM-based agents navigate and understand large codebases**

+ ## 📚 What is this project?

+ This project provides a [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server that transforms code repositories into navigable **knowledge graphs**. It enables Large Language Model (LLM) based agents to efficiently explore, understand, and reason about complex codebases, a critical capability for modern software engineering education and practice.

+ ## 🔬 Use Case: EPITA Coding Courses

+ This project was developed with **educational applications** in mind, specifically to support **EPITA coding courses**:

+ ### 🔍 Enhanced Code Discovery for Agents

+ LLM-based coding agents can use this tool to **better discover and navigate large repositories**. Instead of blindly searching through files, agents can:

+ - Query the knowledge graph to understand the overall architecture
+ - Follow relationships between modules, classes, and functions
+ - Identify entry points and critical code paths
+ - Understand how different parts of the codebase interact

+ ### 📈 Detecting Areas for Code Improvement

+ For EPITA courses, this tool helps agents **identify areas where student code can be improved**:

+ - **Dead Code Detection**: Find unused functions, classes, or variables
+ - **Circular Dependencies**: Detect problematic import cycles between modules
+ - **Code Coupling Analysis**: Identify tightly coupled components that should be refactored
+ - **Missing Documentation**: Find undocumented public APIs and complex functions
+ - **Complexity Hotspots**: Locate chunks with many outgoing calls (high coupling)
+ - **Orphan Code**: Detect code that is declared but never called
+
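Editor's note: several of the checks in the list above reduce to simple graph queries. The following is a minimal sketch, assuming a toy edge list rather than the project's real graph schema (the node names, the `calls`/`imports` labels, and both helper functions are illustrative only). Orphan entities are those with no incoming `calls` edge, and an import cycle can be found with a depth-first search.

```python
from collections import defaultdict

# Hypothetical toy edges (source, relation, target); not the project's real schema.
EDGES = [
    ("main", "calls", "load_data"),
    ("main", "calls", "train"),
    ("train", "calls", "load_data"),
    ("mod_a", "imports", "mod_b"),
    ("mod_b", "imports", "mod_c"),
    ("mod_c", "imports", "mod_a"),
]
DECLARED = {"main", "load_data", "train", "unused_helper"}
ENTRY_POINTS = {"main"}


def find_orphans(declared, edges, entry_points):
    """Return declared entities that nothing calls (entry points excluded)."""
    called = {dst for _, rel, dst in edges if rel == "calls"}
    return sorted(declared - called - entry_points)


def find_import_cycle(edges):
    """Return one import cycle as a list of modules, or None if there is none."""
    graph = defaultdict(list)
    for src, rel, dst in edges:
        if rel == "imports":
            graph[src].append(dst)

    visiting, done = [], set()

    def dfs(node):
        if node in visiting:
            # Back edge: the cycle is the tail of the current DFS stack.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


print(find_orphans(DECLARED, EDGES, ENTRY_POINTS))
print(find_import_cycle(EDGES))
```

In a real graph the same queries would presumably run over edges retrieved through tools such as `get_node_edges`, but that wiring is not shown in this README.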
+ ### 🎓 EPITA Course Integration
+
+ - **Project Reviews**: Quickly understand student project architectures before grading
+ - **Automated Feedback**: Integrate with LLM tutors to provide targeted improvement suggestions
+ - **Code Quality Assessment**: Consistent evaluation criteria across student submissions
+ - **Learning Tool**: Help students navigate and understand unfamiliar codebases (e.g., open-source projects)
+ - **Research**: Study code organization patterns across student projects
+
+ The MCP interface makes it easy to integrate with any LLM-based tutoring or code review system used in EPITA courses.
+
+ ---
+
+ ### 🎯 The Problem We Solve
+
+ At **EPITA** (École pour l'informatique et les techniques avancées), students work on increasingly complex software projects throughout their curriculum. Understanding large codebases, whether their own, their teammates', or open-source libraries, is a fundamental skill for any computer science engineer.
+
+ However, LLM-based coding assistants face significant challenges when working with large repositories:
+
+ - **Context window limitations**: LLMs cannot process entire codebases at once
+ - **Lack of structural awareness**: Without understanding how code is organized, LLMs struggle to locate relevant files
+ - **Missing relationships**: Function calls, class inheritance, and module dependencies are not immediately visible
+ - **Inefficient search**: Simple keyword search fails to capture semantic meaning
+
+ ### 💡 Our Solution: Knowledge Graphs + MCP
+
+ This project addresses these challenges by:
+
+ 1. **Parsing repositories** into a structured knowledge graph (files → chunks → entities)
+ 2. **Extracting relationships** between code elements (calls, contains, declares, imports)
+ 3. **Indexing content** with hybrid search (semantic embeddings + keyword matching)
+ 4. **Exposing tools via MCP** that allow LLM agents to navigate the codebase intelligently
+
+ ```
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                         CODE REPOSITORY                         │
+ │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
+ │  │  File A  │  │  File B  │  │  File C  │  │  File D  │  ...    │
+ │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
+ └───────┼─────────────┼─────────────┼─────────────┼───────────────┘
+         ▼             ▼             ▼             ▼
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                  KNOWLEDGE GRAPH CONSTRUCTION                   │
+ │  • AST Parsing (Python, C/C++, Java, JavaScript, Rust, HTML)    │
+ │  • Entity Extraction (classes, functions, variables, methods)   │
+ │  • Relationship Detection (calls, inheritance, imports)         │
+ │  • Code Chunking & Embedding (semantic vectors)                 │
+ └───────────────────────────────┬─────────────────────────────────┘
+                                 ▼
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                      MCP SERVER (FastMCP)                       │
+ │ ┌─────────────┐ ┌────────────┐ ┌──────────────┐ ┌─────────────┐ │
+ │ │search_nodes │ │go_to_def   │ │find_usages   │ │get_neighbors│ │
+ │ └─────────────┘ └────────────┘ └──────────────┘ └─────────────┘ │
+ │ ┌─────────────┐ ┌────────────┐ ┌──────────────┐ ┌─────────────┐ │
+ │ │get_file_    │ │get_related │ │find_path     │ │print_tree   │ │
+ │ │structure    │ │_chunks     │ │              │ │             │ │
+ │ └─────────────┘ └────────────┘ └──────────────┘ └─────────────┘ │
+ └───────────────────────────────┬─────────────────────────────────┘
+                                 ▼
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                         LLM-BASED AGENT                         │
+ │  • Can search for relevant code using natural language          │
+ │  • Navigate from function calls to their definitions            │
+ │  • Understand the structure of files and directories            │
+ │  • Trace dependencies and relationships across the codebase     │
+ └─────────────────────────────────────────────────────────────────┘
+ ```
+
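Editor's note: the hybrid search step in the pipeline above combines a semantic signal with a keyword signal. The sketch below shows one common way to do that, a weighted sum of cosine similarity and keyword overlap. It is an illustration only: the weight `alpha`, the toy 3-dimensional vectors, the chunk IDs, and all function names are assumptions, not the project's LanceDB-backed implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that appear in the text (case-insensitive)."""
    words = query.lower().split()
    tokens = set(text.lower().split())
    return sum(w in tokens for w in words) / len(words) if words else 0.0


def hybrid_rank(query, query_vec, chunks, alpha=0.5):
    """Rank chunks by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = []
    for chunk in chunks:
        score = (alpha * cosine(query_vec, chunk["vec"])
                 + (1 - alpha) * keyword_score(query, chunk["text"]))
        scored.append((round(score, 3), chunk["id"]))
    return sorted(scored, reverse=True)


# Toy chunks with made-up embeddings (hypothetical data).
CHUNKS = [
    {"id": "tokenizer.py#0", "text": "def encode tokens", "vec": [1.0, 0.0, 0.0]},
    {"id": "trainer.py#2", "text": "def train model loop", "vec": [0.0, 1.0, 0.0]},
]

print(hybrid_rank("train model", [0.0, 1.0, 0.0], CHUNKS))
```

A weighted sum is only one fusion strategy; rank-based fusion is another option, and the README does not say which one the project uses.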
+ ## 🛠️ MCP Tools Available
+
+ The MCP server exposes the following tools for LLM agents:
+
+ | Tool                      | Description                                               |
+ | ------------------------- | --------------------------------------------------------- |
+ | `search_nodes`            | Semantic + keyword search for code chunks                 |
+ | `get_node_info`           | Detailed information about any node (file, chunk, entity) |
+ | `get_node_edges`          | Incoming and outgoing relationships of a node             |
+ | `go_to_definition`        | Find where a function/class/variable is declared          |
+ | `find_usages`             | Find all places where an entity is called/used            |
+ | `get_neighbors`           | Get all directly connected nodes                          |
+ | `get_file_structure`      | Overview of a file's chunks and entities                  |
+ | `get_related_chunks`      | Find chunks related by a specific relationship type       |
+ | `list_all_entities`       | List all tracked entities in the codebase                 |
+ | `get_graph_stats`         | Statistics about the knowledge graph                      |
+ | `find_path`               | Find shortest path between two nodes                      |
+ | `get_subgraph`            | Extract a subgraph around a node                          |
+ | `print_tree`              | Display repository structure as a tree                    |
+ | `diff_chunks`             | Compare content between two code chunks                   |
+ | `search_by_type_and_name` | Search entities by type (class, function, etc.) and name  |
+ | `get_chunk_context`       | Get a chunk with its surrounding context                  |
+
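Editor's note: several of the tools in this table are plain graph traversals. For instance, a shortest-path query like `find_path` can be answered with a breadth-first search. The following is a sketch over a hypothetical adjacency dict with invented node IDs; the server's actual implementation and node naming are not shown in this README.

```python
from collections import deque

# Hypothetical undirected adjacency between node IDs (file, chunk, entity).
GRAPH = {
    "file:trainer.py": ["chunk:trainer.py#0", "chunk:trainer.py#1"],
    "chunk:trainer.py#0": ["file:trainer.py", "entity:Trainer"],
    "chunk:trainer.py#1": ["file:trainer.py", "entity:Trainer.train"],
    "entity:Trainer": ["chunk:trainer.py#0"],
    "entity:Trainer.train": ["chunk:trainer.py#1"],
}


def find_path(graph, source, target):
    """Breadth-first search; returns the shortest node path or None."""
    if source not in graph or target not in graph:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


print(find_path(GRAPH, "entity:Trainer", "entity:Trainer.train"))
```

Breadth-first search returns a path with the fewest edges, which matches the "shortest path" wording in the table as long as edges are unweighted.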
+ ## 🌐 Supported Languages
+
+ The knowledge graph builder uses **AST-based entity extraction** for accurate parsing:
+
+ | Language              | Parser          | Entity Types                                    |
+ | --------------------- | --------------- | ----------------------------------------------- |
+ | Python                | `ast` module    | classes, functions, methods, variables, imports |
+ | C                     | `libclang`      | functions, structs, typedefs, variables         |
+ | C++                   | `libclang`      | classes, namespaces, methods, templates         |
+ | Java                  | `javalang`      | classes, interfaces, methods, fields            |
+ | JavaScript/TypeScript | `esprima`       | classes, functions, variables, imports          |
+ | Rust                  | `tree-sitter`   | structs, enums, traits, functions, modules      |
+ | HTML                  | `BeautifulSoup` | DOM elements, inline JS extraction              |
+
+ The system also detects **API endpoints** for web frameworks (FastAPI, Flask, Spring Boot, Actix-web, etc.).
+
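Editor's note: for Python, the kinds of entities listed in the table can be collected with the standard-library `ast` module. This is a simplified sketch, not the project's `EntityExtractor`; the `extract_entities` helper, the sample source, and the output format are invented for illustration.

```python
import ast

SOURCE = '''
import os
from pathlib import Path

class Loader:
    def load(self, name):
        return Path(name).read_text()

def main():
    return Loader().load(os.getcwd())
'''


def extract_entities(source):
    """Collect declared classes, functions, imports and called names."""
    found = {"classes": [], "functions": [], "imports": [], "calls": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; use "." in that case.
            found["imports"].append(node.module or ".")
        elif isinstance(node, ast.Call):
            func = node.func
            # Plain names (f()) and attribute calls (obj.f()) are both recorded.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name:
                found["calls"].append(name)
    return {kind: sorted(set(names)) for kind, names in found.items()}


print(extract_entities(SOURCE))
```

Declared names and called names are the raw material for "declares" and "calls" relationships; resolving a called name to the right declaration across files is the harder part and is not attempted here.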
+ ## 🚀 Getting Started
+
+ ### Prerequisites
+
+ - Docker & Docker Compose
+ - Python 3.10+ (for local development)
+ - CUDA-capable GPU (optional, for faster embeddings)
+
+ ### Quick Start with Docker
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/lailanelkoussy/mcp-first-birthday.git
+ cd mcp-first-birthday
+
+ # Start the MCP server with a sample knowledge graph
+ docker-compose up
+ ```
+
+ ### Building a Knowledge Graph from Your Repository
+
+ ```python
+ from pedagogia_graph_code_repo.RepoKnowledgeGraphLib import RepoKnowledgeGraph
+
+ # From a local path
+ kg = RepoKnowledgeGraph.from_path(
+     "/path/to/your/repo",
+     skip_dirs=["node_modules", ".git", "__pycache__"],
+     extract_entities=True,
+     index_nodes=True
+ )
+
+ # Save for later use
+ kg.save_graph_to_file("my_knowledge_graph.json")
+ ```
+
+ ### Running the MCP Server
+
+ ```bash
+ # Using the Gradio interface (recommended for exploration)
+ python gradio_mcp.py --graph-file my_knowledge_graph.json --host 0.0.0.0 --port 7860
+
+ # Or directly as an MCP server
+ python pedagogia_graph_code_repo/run_mcp_server.py --graph-file my_knowledge_graph.json
+ ```
+
+ ## 📊 Interactive Explorer (Gradio UI)
+
+ The project includes a Gradio-based web interface for exploring knowledge graphs interactively:
+
+ - **Search**: Use natural language or keywords to find relevant code
+ - **Navigate**: Click through nodes to explore relationships
+ - **Analyze**: Get statistics about code structure and dependencies
+ - **Visualize**: View the repository tree and entity relationships
+
+ ## 📁 Data Sources
+
+ The application supports loading knowledge graphs from multiple sources:
+
+ ### 1. HuggingFace Hub Dataset (Recommended for Sharing)

  Load directly from a HuggingFace dataset:

@@ -58,9 +225,17 @@
  python gradio_mcp.py --host 0.0.0.0 --port 7860 --graph-file data/multihop_knowledge_graph_with_embeddings.json
  ```

+ ### 3. Direct from Git Repository
+
+ Clone and analyze a repository on-the-fly:
+
+ ```bash
+ python gradio_mcp.py --host 0.0.0.0 --port 7860 --repo-url "https://github.com/user/repo.git"
+ ```
+
+ ### Publishing to HuggingFace Hub

+ You can save an existing knowledge graph to HuggingFace Hub for sharing:

  ```python
  from RepoKnowledgeGraphLib import RepoKnowledgeGraph
@@ -75,7 +250,7 @@
  kg.to_hf_dataset("username/my-knowledge-graph-with-embeddings", save_embeddings=True)
  ```

+ ## 🐳 Docker Configuration

  The default Dockerfile uses a local JSON file. To use HuggingFace datasets instead, modify the CMD line in `Dockerfile`:

@@ -87,7 +262,7 @@
  CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--graph-file", "/app/data/multihop_knowledge_graph_with_embeddings.json"]
  ```

+ ## 💻 Local Development

  To run locally:

@@ -103,7 +278,7 @@
  python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
  ```

+ ## ☁️ Deployment to HuggingFace Spaces

  ### Option 1: Using HuggingFace Dataset (Recommended)

@@ -124,6 +299,27 @@
  git push
  ```

+ ## 🏗️ Architecture Overview
+
+ ```
+ mcp-first-birthday/
+ ├── gradio_mcp.py                       # Main Gradio web interface
+ ├── Dockerfile                          # Docker configuration
+ ├── requirements.txt                    # Python dependencies
+ ├── pedagogia_graph_code_repo/          # Core library
+ │   ├── RepoKnowledgeGraphLib/          # Knowledge graph implementation
+ │   │   ├── RepoKnowledgeGraph.py       # Main graph class
+ │   │   ├── KnowledgeGraphMCPServer.py  # MCP server implementation
+ │   │   ├── EntityExtractor.py          # AST-based entity extraction
+ │   │   ├── CodeParser.py               # Code chunking
+ │   │   ├── CodeIndex.py                # Hybrid search (LanceDB/Weaviate)
+ │   │   ├── ModelService.py             # Embedding generation
+ │   │   └── Node.py                     # Graph node types
+ │   ├── run_mcp_server.py               # Standalone MCP server
+ │   └── tests/                          # Test suite
+ └── docker-compose*.yml                 # Docker configurations
+ ```
+



@@ -134,4 +330,74 @@

  **Team Members:**
  - **Laila ELKOUSSY** - [@lailaelkoussy](https://huggingface.co/lailaelkoussy) - Research Engineer, Data Scientist
+ - **Julien PEREZ** - [@jnm38](https://huggingface.co/jnm38) - Research Director
+
+ ---
+
+ ## 📄 License
+
+ This project is developed as part of research at EPITA / Ionis Group.
+
+ ## 🔗 Related Resources
+
+ - [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) - The protocol standard
+ - [FastMCP](https://github.com/jlowin/fastmcp) - Python MCP framework used
+ - [LanceDB](https://lancedb.github.io/lancedb/) - Vector database for code indexing
+ - [Salesforce SFR-Embedding-Code](https://huggingface.co/Salesforce/SFR-Embedding-Code-400M_R) - Code embedding model
+
+ ## 🆚 VS Code Integration
+
+ To use this MCP server with **GitHub Copilot** in VS Code, you need to configure an `mcp.json` file.
+
+ ### Configuration File Location
+
+ Create or edit the file at `.vscode/mcp.json` in your workspace root:
+
+ ```
+ your-workspace/
+ ├── .vscode/
+ │   └── mcp.json          ← Place the configuration here
+ ├── src/
+ └── ...
+ ```
+
+ ### Configuration Content
+
+ Add the following content to `.vscode/mcp.json`:
+
+ ```jsonc
+ {
+   "servers": {
+     "transformers-code-graph": {
+       "url": "https://lailaelkoussy-transformers-library-knowledge-graph.hf.space/gradio_api/mcp/",
+       "type": "http"
+     }
+   },
+   "inputs": []
+ }
+ ```
+
+ ### What This Does
+
+ - **`servers`**: Defines the MCP servers available to VS Code
+ - **`transformers-code-graph`**: A custom name for this server connection
+ - **`url`**: The endpoint of the hosted MCP server (here pointing to the HuggingFace Space)
+ - **`type`**: Set to `"http"` for remote HTTP-based MCP servers
+
+ ### Using with Your Own Server
+
+ If you're running your own MCP server locally, update the URL accordingly:
+
+ ```jsonc
+ {
+   "servers": {
+     "my-code-graph": {
+       "url": "http://localhost:7860/gradio_api/mcp/",
+       "type": "http"
+     }
+   },
+   "inputs": []
+ }
+ ```
+
+ Once configured, GitHub Copilot in VS Code will have access to all the knowledge graph tools (search_nodes, go_to_definition, find_usages, etc.) to help navigate and understand your codebase.
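Editor's note: if you would rather generate `.vscode/mcp.json` from a script than edit it by hand, the sketch below writes the same structure with the standard library. The helper name `write_mcp_config` is hypothetical; the keys and the local URL mirror the configuration in the README text above, and the demo writes into a temporary directory so no real workspace is touched.

```python
import json
import tempfile
from pathlib import Path


def write_mcp_config(workspace, name, url):
    """Write .vscode/mcp.json with a single HTTP server entry."""
    config = {"servers": {name: {"url": url, "type": "http"}}, "inputs": []}
    path = Path(workspace) / ".vscode" / "mcp.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


# Demo in a temporary directory so no real workspace is modified.
with tempfile.TemporaryDirectory() as workspace:
    written = write_mcp_config(
        workspace, "my-code-graph", "http://localhost:7860/gradio_api/mcp/"
    )
    print(written.read_text(encoding="utf-8"))
```

Note that this overwrites any existing `mcp.json`; merging with an existing file would need to load and update the `servers` mapping first.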