Update README.md
README.md
@@ -92,7 +92,7 @@ This project addresses these challenges by:
 └────────────────────────────────┬────────────────────────────────┘
                                  ▼
 ┌─────────────────────────────────────────────────────────────────┐
-│ MCP SERVER (
 │ ┌─────────────┐ ┌────────────┐ ┌──────────────┐ ┌─────────────┐ │
 │ │search_nodes │ │go_to_def   │ │find_usages   │ │get_neighbors│ │
 │ └─────────────┘ └────────────┘ └──────────────┘ └─────────────┘ │
@@ -161,10 +161,6 @@ The system also detects **API endpoints** for web frameworks (FastAPI, Flask, Sp
 ### Quick Start with Docker

 ```bash
-# Clone the repository
-git clone https://github.com/lailanelkoussy/mcp-first-birthday.git
-cd mcp-first-birthday
-
 # Start the MCP server with a sample knowledge graph
 docker-compose up
 ```
@@ -172,7 +168,7 @@ docker-compose up
 ### Building a Knowledge Graph from Your Repository

 ```python
-from

 # From a local path
 kg = RepoKnowledgeGraph.from_path(
@@ -186,14 +182,11 @@ kg = RepoKnowledgeGraph.from_path(
 kg.save_graph_to_file("my_knowledge_graph.json")
 ```

-### Running the MCP

 ```bash
-# Using the Gradio interface (recommended for exploration)
 python gradio_mcp.py --graph-file my_knowledge_graph.json --host 0.0.0.0 --port 7860

-# Or directly as an MCP server
-python pedagogia_graph_code_repo/run_mcp_server.py --graph-file my_knowledge_graph.json
 ```

 ## Interactive Explorer (Gradio UI)
@@ -211,7 +204,7 @@ The application supports loading knowledge graphs from multiple sources:

 ### 1. HuggingFace Hub Dataset (Recommended for Sharing)

-Load directly from a HuggingFace dataset:

 ```bash
 python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
@@ -250,74 +243,22 @@ kg.to_hf_dataset("username/my-knowledge-graph", save_embeddings=False, private=F
 kg.to_hf_dataset("username/my-knowledge-graph-with-embeddings", save_embeddings=True)
 ```

-## 🐳 Docker Configuration
-
-The default Dockerfile uses a local JSON file. To use HuggingFace datasets instead, modify the CMD line in `Dockerfile`:
-
-```dockerfile
-# Using HuggingFace dataset (recommended for smaller Docker image)
-CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--hf-dataset", "username/dataset-name"]
-
-# Using local file (requires large data file in image)
-CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--graph-file", "/app/data/multihop_knowledge_graph_with_embeddings.json"]
-```
-
-## 💻 Local Development
-
-To run locally:
-
-```bash
-docker build -t gradio-mcp-space .
-docker run -p 7860:7860 gradio-mcp-space
-```
-
-Or without Docker:
-
-```bash
-pip install -r requirements.txt
-python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
-```
-
-## Deployment to HuggingFace Spaces
-
-### Option 1: Using HuggingFace Dataset (Recommended)
-
-1. First, push your knowledge graph to a HuggingFace dataset
-2. Update the Dockerfile CMD to use `--hf-dataset`
-3. Push to the Space repository (no large files needed)
-
-### Option 2: Using Local JSON File
-
-1. Create a new Space on HuggingFace with Docker SDK
-2. Enable Git LFS in your Space repository
-3. Push this directory to the Space repository:
-```bash
-git lfs install
-git lfs track "data/*.json"
-git add .
-git commit -m "Initial commit"
-git push
-```

 ## Architecture Overview

 ```
-
-├── gradio_mcp.py                   # Main Gradio web interface
 ├── Dockerfile                      # Docker configuration
 ├── requirements.txt                # Python dependencies
-├──
-│   ├──
-│
-│
-│
-│
-│
-│
-
-│   ├── run_mcp_server.py           # Standalone MCP server
-│   └── tests/                      # Test suite
-└── docker-compose*.yml             # Docker configurations
 ```


@@ -341,7 +282,7 @@ This project is developed as part of research at EPITA / Ionis Group.
 ## Related Resources

 - [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) - The protocol standard
-- [
 - [LanceDB](https://lancedb.github.io/lancedb/) - Vector database for code indexing
 - [Salesforce SFR-Embedding-Code](https://huggingface.co/Salesforce/SFR-Embedding-Code-400M_R) - Code embedding model


 └────────────────────────────────┬────────────────────────────────┘
                                  ▼
 ┌─────────────────────────────────────────────────────────────────┐
+│                       MCP SERVER (Gradio)                       │
 │ ┌─────────────┐ ┌────────────┐ ┌──────────────┐ ┌─────────────┐ │
 │ │search_nodes │ │go_to_def   │ │find_usages   │ │get_neighbors│ │
 │ └─────────────┘ └────────────┘ └──────────────┘ └─────────────┘ │
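The MCP SERVER box in the diagram lists four tools: `search_nodes`, `go_to_def`, `find_usages` and `get_neighbors`. Their real implementation is not part of this commit. The sketch below is only a hedged illustration of what tools with these names might compute over a code graph. The `ToyGraph` class, its `defines`/`calls` edge kinds and the return types are all hypothetical and do not reflect the project's actual API. It uses only the Python standard library.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ToyGraph:
    """Hypothetical in-memory code graph with typed, directed edges.

    Not the project's RepoKnowledgeGraph: a stand-in to illustrate the tools.
    """

    def __init__(self, names: Dict[str, str]) -> None:
        self.names = names  # node id -> human-readable name
        # source node id -> list of (edge kind, target node id)
        self.out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_edge(self, src: str, kind: str, dst: str) -> None:
        self.out_edges[src].append((kind, dst))

    def search_nodes(self, query: str) -> List[str]:
        """Ids of nodes whose name contains the query (case-insensitive)."""
        q = query.lower()
        return sorted(nid for nid, name in self.names.items() if q in name.lower())

    def go_to_def(self, symbol: str) -> Optional[str]:
        """Id of the first node with a 'defines' edge to symbol, or None."""
        for src, edges in self.out_edges.items():
            if ("defines", symbol) in edges:
                return src
        return None

    def find_usages(self, symbol: str) -> List[str]:
        """Ids of all nodes with a 'calls' edge to symbol."""
        return sorted(
            src for src, edges in self.out_edges.items() if ("calls", symbol) in edges
        )

    def get_neighbors(self, node_id: str) -> List[str]:
        """Ids of direct successors and predecessors, for any edge kind."""
        # .get() avoids inserting an empty entry into the defaultdict
        successors = {dst for _, dst in self.out_edges.get(node_id, [])}
        predecessors = {
            src
            for src, edges in self.out_edges.items()
            if any(dst == node_id for _, dst in edges)
        }
        return sorted(successors | predecessors)


if __name__ == "__main__":
    g = ToyGraph({"utils.py": "utils.py", "parse": "parse", "main.py": "main.py"})
    g.add_edge("utils.py", "defines", "parse")  # utils.py defines parse()
    g.add_edge("main.py", "calls", "parse")  # main.py calls parse()

    print(g.search_nodes("PAR"))  # ['parse']
    print(g.go_to_def("parse"))  # utils.py
    print(g.find_usages("parse"))  # ['main.py']
    print(g.get_neighbors("parse"))  # ['main.py', 'utils.py']
```

In the project these tools are served over MCP (via Gradio, according to the updated diagram). Here, plain method calls stand in for that, so the example runs without Gradio or an MCP client.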

 ### Quick Start with Docker

 ```bash
 # Start the MCP server with a sample knowledge graph
 docker-compose up
 ```

 ### Building a Knowledge Graph from Your Repository

 ```python
+from RepoKnowledgeGraphLib.RepoKnowledgeGraph import RepoKnowledgeGraph

 # From a local path
 kg = RepoKnowledgeGraph.from_path(

 kg.save_graph_to_file("my_knowledge_graph.json")
 ```

+### Running the MCP using Gradio

 ```bash
 python gradio_mcp.py --graph-file my_knowledge_graph.json --host 0.0.0.0 --port 7860

 ```

 ## Interactive Explorer (Gradio UI)


 ### 1. HuggingFace Hub Dataset (Recommended for Sharing)

+Load directly from a HuggingFace dataset created by the library (cf. Publishing to Huggingface Hub):

 ```bash
 python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"

 kg.to_hf_dataset("username/my-knowledge-graph-with-embeddings", save_embeddings=True)
 ```


 ## Architecture Overview

 ```
+root/
 ├── Dockerfile                      # Docker configuration
 ├── requirements.txt                # Python dependencies
+├── RepoKnowledgeGraphLib/          # Knowledge graph implementation
+│   ├── RepoKnowledgeGraph.py       # Main graph class
+│   ├── KnowledgeGraphMCPServer.py  # MCP server implementation
+│   ├── EntityExtractor.py          # AST-based entity extraction
+│   ├── CodeParser.py               # Code chunking
+│   ├── CodeIndex.py                # Hybrid search (LanceDB/Weaviate)
+│   ├── ModelService.py             # Embedding generation
+│   └── Node.py                     # Graph node types
+└── gradio_mcp_space.py             # Main Gradio web interface
 ```



 ## Related Resources

 - [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) - The protocol standard
+- [Gradio](https://gradio.app/) - Python web interface framework with MCP support
 - [LanceDB](https://lancedb.github.io/lancedb/) - Vector database for code indexing
 - [Salesforce SFR-Embedding-Code](https://huggingface.co/Salesforce/SFR-Embedding-Code-400M_R) - Code embedding model
