lailaelkoussy committed on
Commit f62ca2a · verified · 1 Parent(s): 42ad0cb

Update README.md

Files changed (1)
  1. README.md +15 -74
README.md CHANGED
@@ -92,7 +92,7 @@ This project addresses these challenges by:
  └───────────────────────────────┬─────────────────────────────────┘
  ▼
  ┌─────────────────────────────────────────────────────────────────┐
- │ MCP SERVER (FastMCP) │
  │ ┌─────────────┐ ┌────────────┐ ┌──────────────┐ ┌────────────┐ │
  │ │search_nodes │ │go_to_def │ │find_usages │ │get_neighbors│ │
  │ └─────────────┘ └────────────┘ └──────────────┘ └────────────┘ │
@@ -161,10 +161,6 @@ The system also detects **API endpoints** for web frameworks (FastAPI, Flask, Sp
  ### Quick Start with Docker
 
  ```bash
- # Clone the repository
- git clone https://github.com/lailanelkoussy/mcp-first-birthday.git
- cd mcp-first-birthday
-
  # Start the MCP server with a sample knowledge graph
  docker-compose up
  ```
@@ -172,7 +168,7 @@ docker-compose up
  ### Building a Knowledge Graph from Your Repository
 
  ```python
- from pedagogia_graph_code_repo.RepoKnowledgeGraphLib import RepoKnowledgeGraph
 
  # From a local path
  kg = RepoKnowledgeGraph.from_path(
@@ -186,14 +182,11 @@ kg = RepoKnowledgeGraph.from_path(
  kg.save_graph_to_file("my_knowledge_graph.json")
  ```
 
- ### Running the MCP Server
 
  ```bash
- # Using the Gradio interface (recommended for exploration)
  python gradio_mcp.py --graph-file my_knowledge_graph.json --host 0.0.0.0 --port 7860
 
- # Or directly as an MCP server
- python pedagogia_graph_code_repo/run_mcp_server.py --graph-file my_knowledge_graph.json
  ```
 
  ## 📊 Interactive Explorer (Gradio UI)
@@ -211,7 +204,7 @@ The application supports loading knowledge graphs from multiple sources:
 
  ### 1. HuggingFace Hub Dataset (Recommended for Sharing)
 
- Load directly from a HuggingFace dataset:
 
  ```bash
  python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
@@ -250,74 +243,22 @@ kg.to_hf_dataset("username/my-knowledge-graph", save_embeddings=False, private=F
  kg.to_hf_dataset("username/my-knowledge-graph-with-embeddings", save_embeddings=True)
  ```
 
- ## 🐳 Docker Configuration
-
- The default Dockerfile uses a local JSON file. To use HuggingFace datasets instead, modify the CMD line in `Dockerfile`:
-
- ```dockerfile
- # Using HuggingFace dataset (recommended for smaller Docker image)
- CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--hf-dataset", "username/dataset-name"]
-
- # Using local file (requires large data file in image)
- CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--graph-file", "/app/data/multihop_knowledge_graph_with_embeddings.json"]
- ```
-
- ## 💻 Local Development
-
- To run locally:
-
- ```bash
- docker build -t gradio-mcp-space .
- docker run -p 7860:7860 gradio-mcp-space
- ```
-
- Or without Docker:
-
- ```bash
- pip install -r requirements.txt
- python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
- ```
-
- ## ☁️ Deployment to HuggingFace Spaces
-
- ### Option 1: Using HuggingFace Dataset (Recommended)
-
- 1. First, push your knowledge graph to a HuggingFace dataset
- 2. Update the Dockerfile CMD to use `--hf-dataset`
- 3. Push to the Space repository (no large files needed)
-
- ### Option 2: Using Local JSON File
-
- 1. Create a new Space on HuggingFace with Docker SDK
- 2. Enable Git LFS in your Space repository
- 3. Push this directory to the Space repository:
- ```bash
- git lfs install
- git lfs track "data/*.json"
- git add .
- git commit -m "Initial commit"
- git push
- ```
 
  ## 🏗️ Architecture Overview
 
  ```
- mcp-first-birthday/
- ├── gradio_mcp.py # Main Gradio web interface
  ├── Dockerfile # Docker configuration
  ├── requirements.txt # Python dependencies
- ├── pedagogia_graph_code_repo/ # Core library
- │ ├── RepoKnowledgeGraphLib/ # Knowledge graph implementation
- │ │ ├── RepoKnowledgeGraph.py # Main graph class
- │ │ ├── KnowledgeGraphMCPServer.py # MCP server implementation
- │ │ ├── EntityExtractor.py # AST-based entity extraction
- │ │ ├── CodeParser.py # Code chunking
- │ │ ├── CodeIndex.py # Hybrid search (LanceDB/Weaviate)
- │ │ ├── ModelService.py # Embedding generation
- │ │ └── Node.py # Graph node types
- │ ├── run_mcp_server.py # Standalone MCP server
- │ └── tests/ # Test suite
- └── docker-compose*.yml # Docker configurations
  ```
 
 
@@ -341,7 +282,7 @@ This project is developed as part of research at EPITA / Ionis Group.
  ## 🔗 Related Resources
 
  - [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) - The protocol standard
- - [FastMCP](https://github.com/jlowin/fastmcp) - Python MCP framework used
  - [LanceDB](https://lancedb.github.io/lancedb/) - Vector database for code indexing
  - [Salesforce SFR-Embedding-Code](https://huggingface.co/Salesforce/SFR-Embedding-Code-400M_R) - Code embedding model
 
 
  └───────────────────────────────┬─────────────────────────────────┘
  ▼
  ┌─────────────────────────────────────────────────────────────────┐
+ │ MCP SERVER (Gradio) │
  │ ┌─────────────┐ ┌────────────┐ ┌──────────────┐ ┌────────────┐ │
  │ │search_nodes │ │go_to_def │ │find_usages │ │get_neighbors│ │
  │ └─────────────┘ └────────────┘ └──────────────┘ └────────────┘ │
 
  ### Quick Start with Docker
 
  ```bash
  # Start the MCP server with a sample knowledge graph
  docker-compose up
  ```
 
  ### Building a Knowledge Graph from Your Repository
 
  ```python
+ from RepoKnowledgeGraphLib.RepoKnowledgeGraph import RepoKnowledgeGraph
 
  # From a local path
  kg = RepoKnowledgeGraph.from_path(
 
  kg.save_graph_to_file("my_knowledge_graph.json")
  ```
 
+ ### Running the MCP using Gradio
 
  ```bash
  python gradio_mcp.py --graph-file my_knowledge_graph.json --host 0.0.0.0 --port 7860
 
  ```
 
  ## 📊 Interactive Explorer (Gradio UI)
 
 
  ### 1. HuggingFace Hub Dataset (Recommended for Sharing)
 
+ Load directly from a HuggingFace dataset created by the library (cf. Publishing to Huggingface Hub):
 
  ```bash
  python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
 
  kg.to_hf_dataset("username/my-knowledge-graph-with-embeddings", save_embeddings=True)
  ```
 
 
  ## 🏗️ Architecture Overview
 
  ```
+ root/
  ├── Dockerfile # Docker configuration
  ├── requirements.txt # Python dependencies
+ ├── RepoKnowledgeGraphLib/ # Knowledge graph implementation
+ │ ├── RepoKnowledgeGraph.py # Main graph class
+ │ ├── KnowledgeGraphMCPServer.py # MCP server implementation
+ │ ├── EntityExtractor.py # AST-based entity extraction
+ │ ├── CodeParser.py # Code chunking
+ │ ├── CodeIndex.py # Hybrid search (LanceDB/Weaviate)
+ │ ├── ModelService.py # Embedding generation
+ │ └── Node.py # Graph node types
+ └── gradio_mcp_space.py # Main Gradio web interface
  ```
 
 
 
  ## 🔗 Related Resources
 
  - [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) - The protocol standard
+ - [Gradio](https://gradio.app/) - Python web interface framework with MCP support
  - [LanceDB](https://lancedb.github.io/lancedb/) - Vector database for code indexing
  - [Salesforce SFR-Embedding-Code](https://huggingface.co/Salesforce/SFR-Embedding-Code-400M_R) - Code embedding model
 
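
The hunk headers in this diff follow the standard unified-diff form `@@ -old_start,old_count +new_start,new_count @@`. The sketch below uses plain Python and only the standard library. The helper names `parse_hunk` and `check_offsets` are illustrative and are not part of the repository. It cross-checks the seven headers of this README.md change. Each hunk's new start line should equal its old start line shifted by the net line change of the hunks before it. The total net change should equal 15 - 74 = -59, matching the file summary. The sketch assumes that no hunk has a zero line count, because pure insertions and deletions use a shifted start line under the unified format.

```python
import re

# Unified diff hunk header: "@@ -old_start,old_count +new_start,new_count @@ <context>".
# A count of 1 may be omitted, so the ",count" groups are optional.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk(header: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count) for one hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


def check_offsets(headers: list[str]) -> int:
    """Check that each new_start equals old_start plus the net change of earlier hunks.

    Assumes no hunk has a zero count. Returns the total net line change
    (lines added minus lines removed) across all hunks.
    """
    offset = 0
    for header in headers:
        old_start, old_count, new_start, new_count = parse_hunk(header)
        if new_start != old_start + offset:
            raise ValueError(f"inconsistent hunk {header!r}: expected +{old_start + offset}")
        offset += new_count - old_count
    return offset


# Hunk headers copied from the README.md diff above (trailing context dropped).
HEADERS = [
    "@@ -92,7 +92,7 @@",
    "@@ -161,10 +161,6 @@",
    "@@ -172,7 +168,7 @@",
    "@@ -186,14 +182,11 @@",
    "@@ -211,7 +204,7 @@",
    "@@ -250,74 +243,22 @@",
    "@@ -341,7 +282,7 @@",
]

if __name__ == "__main__":
    print(check_offsets(HEADERS))  # -59: 15 lines added minus 74 lines removed
```

Every header passes the check. The largest shift comes from the `@@ -250,74 +243,22 @@` hunk, which removed the Docker configuration, local development, and Spaces deployment sections.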