---
title: Transformers Library Knowledge Graph
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
 
# Knowledge Graph MCP Explorer

This is a Gradio-based interactive tool for exploring code repository knowledge graphs. It provides a web interface to search, navigate, and analyze code relationships using the Model Context Protocol (MCP).

## Features

- **Search Nodes**: Search for code entities, functions, classes, and more using semantic search
- **Graph Navigation**: Explore relationships between code elements
- **Entity Tracking**: View declared and called entities within code chunks
- **Path Finding**: Find paths between different nodes in the knowledge graph
- **Subgraph Extraction**: Extract and visualize subgraphs around specific nodes
- **File Structure**: View the hierarchical structure of the repository

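The path-finding feature above boils down to a standard graph traversal over the loaded knowledge graph. A minimal sketch of the idea, using a hypothetical adjacency-list graph with made-up node IDs (the real node IDs and edge types come from the graph the app loads):

```python
from collections import deque

def find_path(graph, start, goal):
    """Breadth-first search for a shortest path between two node IDs.

    `graph` maps each node ID to the IDs it points to (e.g. "calls" and
    "declares" edges flattened into a simple adjacency list).
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path between the two nodes

# Hypothetical toy graph: file -> class -> method -> called class
graph = {
    "modeling_bert.py": ["BertModel"],
    "BertModel": ["BertModel.forward"],
    "BertModel.forward": ["BertEncoder"],
    "BertEncoder": [],
}
print(find_path(graph, "modeling_bert.py", "BertEncoder"))
# → ['modeling_bert.py', 'BertModel', 'BertModel.forward', 'BertEncoder']
```

Subgraph extraction is the same traversal capped at a fixed hop depth, collecting every visited node instead of stopping at a goal.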
## Usage

The application loads a pre-built knowledge graph from the HuggingFace Transformers repository. You can:

1. **Search**: Use the search tab to find relevant code snippets and entities
2. **Explore**: Navigate through the graph using node IDs
3. **Analyze**: Get statistics about the code structure and relationships

## Technical Details

- Built with Gradio for the web interface
- Uses LanceDB for efficient code indexing and search
- Supports hybrid search (keyword + semantic embeddings)
- Pre-computed embeddings using the Salesforce/SFR-Embedding-Code-400M_R model

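"Hybrid search" here means blending a lexical match score with an embedding-similarity score before ranking. A toy sketch of that blending step, for intuition only — the actual ranking is done by LanceDB, and the tokenization, weights, and hand-built vectors below are illustrative assumptions, not the app's implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, text):
    """Fraction of query tokens that appear in the candidate text."""
    q = query.lower().split()
    t = set(text.lower().split())
    return sum(tok in t for tok in q) / len(q) if q else 0.0

def hybrid_score(query, text, query_vec, text_vec, alpha=0.5):
    """Blend lexical and semantic relevance; alpha weights the semantic side."""
    return alpha * cosine(query_vec, text_vec) + (1 - alpha) * keyword_score(query, text)
```

In the app itself, `query_vec` and `text_vec` would come from the SFR-Embedding-Code model rather than being hand-built, and LanceDB handles the keyword index.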
## Data Sources

The application supports loading knowledge graphs from:

### 1. HuggingFace Hub Dataset (Recommended)

Load directly from a HuggingFace dataset:

```bash
python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
```

### 2. Local JSON File

Use a local JSON file (e.g., `multihop_knowledge_graph_with_embeddings.json`):

```bash
python gradio_mcp.py --host 0.0.0.0 --port 7860 --graph-file data/multihop_knowledge_graph_with_embeddings.json
```

### Creating and Publishing a Dataset

You can save an existing knowledge graph to the HuggingFace Hub:

```python
from RepoKnowledgeGraphLib import RepoKnowledgeGraph

# Load from local file
kg = RepoKnowledgeGraph.load("path/to/graph.json")

# Push to HuggingFace Hub (without embeddings to reduce size)
kg.to_hf_dataset("username/my-knowledge-graph", save_embeddings=False, private=False)

# Or with embeddings (larger dataset)
kg.to_hf_dataset("username/my-knowledge-graph-with-embeddings", save_embeddings=True)
```

## Docker Configuration

The default Dockerfile uses a local JSON file. To use a HuggingFace dataset instead, modify the CMD line in `Dockerfile`:

```dockerfile
# Using a HuggingFace dataset (recommended for a smaller Docker image)
CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--hf-dataset", "username/dataset-name"]

# Using a local file (requires the large data file in the image)
CMD ["python", "-u", "gradio_mcp.py", "--host", "0.0.0.0", "--port", "7860", "--graph-file", "/app/data/multihop_knowledge_graph_with_embeddings.json"]
```

## Local Development

To run locally with Docker:

```bash
docker build -t gradio-mcp-space .
docker run -p 7860:7860 gradio-mcp-space
```

Or without Docker:

```bash
pip install -r requirements.txt
python gradio_mcp.py --host 0.0.0.0 --port 7860 --hf-dataset "username/dataset-name"
```

## Deployment to HuggingFace Spaces

### Option 1: Using a HuggingFace Dataset (Recommended)

1. Push your knowledge graph to a HuggingFace dataset
2. Update the Dockerfile CMD to use `--hf-dataset`
3. Push to the Space repository (no large files needed)

### Option 2: Using a Local JSON File

1. Create a new Space on HuggingFace with the Docker SDK
2. Enable Git LFS in your Space repository
3. Push this directory to the Space repository:

```bash
git lfs install
git lfs track "data/*.json"
git add .
git commit -m "Initial commit"
git push
```

## 👥 Team

**Team Name:** CEPIA Ionis Team

**Team Members:**

- **Laila ELKOUSSY** - [@lailaelkoussy](https://huggingface.co/lailaelkoussy) - Research Engineer, Data Scientist
- **Julien PEREZ** - [@jnm38](https://huggingface.co/jnm38) -