Baction committed on
Commit
f8bff51
·
1 Parent(s): 786acc3

Updating app.py

Files changed (3)
  1. README.md +99 -0
  2. app.py +176 -0
  3. requirements.txt +9 -0
README.md CHANGED
@@ -10,3 +10,102 @@ pinned: false
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+# Vulnerability Scanner MCP Server
+
+This repository contains a small Gradio-based MCP server that provides three tools to inspect GitHub repositories using the GitHub REST API:
+
+- Get basic repository information
+- Retrieve the decoded content of a single file
+- Scan a repository for source code files with specific extensions (returns a list of file paths)
+
+The application is implemented in `app.py` and exposes a Gradio TabbedInterface with three API endpoints: `get_repository_info`, `get_file_content`, and `scan_repository`.
+
+## Requirements
+
+The project uses Python and the dependencies listed in `requirements.txt`. Key requirements used by the app are:
+
+- gradio (with mcp support)
+- requests
+- python-dotenv
+
+On Windows with PowerShell, create a virtual environment and install requirements:
+
+```powershell
+python -m venv .venv; .\.venv\Scripts\Activate.ps1; pip install -r requirements.txt
+```
+
+If you prefer cmd.exe use:
+
+```cmd
+python -m venv .venv && .\.venv\Scripts\activate.bat && pip install -r requirements.txt
+```
+
+## Environment variables
+
+The application requires a GitHub personal access token to access the GitHub API. Create a `.env` file in the project root with the following variable:
+
+```
+GITHUB_TOKEN=ghp_...your_token_here
+```
+
+The token must have permission to read the repositories you want to inspect (public repos are readable with a token that has no special scopes; private repos require appropriate scopes).
+
+## Running the server
+
+Start the app directly with Python. The Gradio interface will launch and expose the MCP API endpoints:
+
+```powershell
+python app.py
+```
+
+When run, the app prints startup information and launches Gradio in MCP server mode. By default Gradio will open a local web UI and also expose the MCP API endpoints used by other MCP clients.
+
+## Endpoints and usage
+
+The Gradio interface exposes three tools (functions) via the MCP server and also as interactive UI tabs:
+
+1) get_repository_info
+
+Inputs:
+- Repository Owner (e.g. `octocat`)
+- Repository Name (e.g. `Hello-World`)
+
+Output: A JSON-like text block containing repository metadata (name, full_name, description, primary_language, size_kb, stars, forks, default_branch, created_date, last_updated, is_private, clone_url).
+
+2) get_file_content
+
+Inputs:
+- Repository Owner
+- Repository Name
+- File Path (e.g. `README.md`)
+
+Output: Decoded file contents as plain text. If the file is binary or cannot be decoded, an error message is returned.
+
+3) scan_repository
+
+Inputs:
+- Repository Owner
+- Repository Name
+- File Extensions (comma-separated, default: `.py,.js,.ts,.php,.java`)
+
+Output: A list of file paths (limited to the first 50 matching files) found in the repository with the given extensions.
+
+Notes and edge-cases:
+
+- The code uses the GitHub REST API and will return HTTP error messages when repositories or files are not found or when the token lacks permissions.
+- `get_file_content` decodes base64 file content returned by the GitHub API and will return an error string for binary files.
+- `scan_repository` recursively traverses directories and limits results to avoid returning extremely large listings.
+
+## Example
+
+After starting the server, open the Gradio UI. From the UI you can call the three tools. Example programmatic calls are possible via the Gradio API endpoints; however this README focuses on interactive use through the launched UI (the functions are registered with `api_name` values matching their function names).
+
+## Development notes
+
+- The GitHub token is mandatory; the application raises ValueError if `GITHUB_TOKEN` is not present in the environment.
+- The app limits recursion results to avoid deep scans; adjust `_scan_directory_sync` if you need more files or different limits.
+
+## License
+
+This project does not include a license file in the repository. Add a LICENSE if you plan to publish or share this code.
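The README notes that `get_file_content` decodes the base64 payload returned by the GitHub API and reports binary files as an error string. The decoding step can be sketched in isolation as below. This is a minimal illustration, not the code from this commit: the function name `decode_github_content` and the exact error wording are assumptions made for the example.

```python
import base64


def decode_github_content(encoded: str) -> str:
    """Decode a base64 `content` field into text.

    Returns the decoded text, or an "ERROR: ..." string when the bytes are
    not valid UTF-8 (treated here as binary data).
    """
    # The payload may contain line breaks; b64decode skips characters
    # outside the base64 alphabet in its default (non-validating) mode.
    raw = base64.b64decode(encoded)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return "ERROR: content is binary and cannot be decoded as text"


if __name__ == "__main__":
    sample = base64.b64encode(b"print('hello')\n").decode("ascii")
    print(decode_github_content(sample))
```

Returning an error string instead of raising keeps the output type uniform for a text box, at the cost of making errors indistinguishable from file contents that happen to start with `ERROR:`.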
app.py ADDED
@@ -0,0 +1,176 @@
+import json
+import os
+import base64
+from typing import Dict, List, Any
+import requests
+import gradio as gr
+from dotenv import load_dotenv
+
+# Load environment variables
+load_dotenv()
+
+class GitHubMCPServer:
+    """GitHub MCP Server for repository scanning and file access"""
+
+    def __init__(self):
+        self.github_token = os.getenv("GITHUB_TOKEN")
+        if not self.github_token:
+            raise ValueError("GITHUB_TOKEN environment variable is required")
+
+        self.headers = {
+            "Authorization": f"token {self.github_token}",
+            "Accept": "application/vnd.github.v3+json"
+        }
+
+    def get_repository_info(self, owner: str, repo: str) -> dict:
+        """Get basic repository information"""
+        try:
+            url = f"https://api.github.com/repos/{owner}/{repo}"
+            response = requests.get(url, headers=self.headers)
+
+            if response.status_code == 200:
+                data = response.json()
+                return {
+                    "success": True,
+                    "repository_name": data["name"],
+                    "full_name": data["full_name"],
+                    "description": data.get("description") or "No description available",
+                    "primary_language": data.get("language") or "Unknown",
+                    "size_kb": data["size"],
+                    "stars": data["stargazers_count"],
+                    "forks": data["forks_count"],
+                    "default_branch": data["default_branch"],
+                    "created_date": data["created_at"][:10],
+                    "last_updated": data["updated_at"][:10],
+                    "is_private": data["private"],
+                    "clone_url": data["clone_url"]
+                }
+            else:
+                return {
+                    "success": False,
+                    "error": f"Repository not found or inaccessible (HTTP {response.status_code})",
+                    "status_code": response.status_code
+                }
+
+        except Exception as e:
+            return {
+                "success": False,
+                "error": f"Failed to fetch repository information: {str(e)}"
+            }
+
+    def get_file_content(self, owner: str, repo: str, path: str) -> str:
+        """Get content of a specific file - returns just the file content as string"""
+        try:
+            url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
+            response = requests.get(url, headers=self.headers)
+
+            if response.status_code == 200:
+                data = response.json()
+                if data["type"] == "file" and "content" in data:
+                    # Decode base64 content
+                    try:
+                        content = base64.b64decode(data["content"]).decode('utf-8')
+                        return content
+                    except UnicodeDecodeError:
+                        return f"ERROR: File '{path}' contains binary data that cannot be decoded as text"
+                else:
+                    return f"ERROR: Path '{path}' is not a file or content is not available"
+            else:
+                return f"ERROR: File '{path}' not found or inaccessible (HTTP {response.status_code})"
+
+        except Exception as e:
+            return f"ERROR: Failed to fetch file content for '{path}': {str(e)}"
+
+    def scan_repository(self, owner: str, repo: str, extensions: str = ".py,.js,.ts,.php,.java") -> list:
+        """Scan repository for code files - returns simple list of file paths"""
+        try:
+            ext_list = [ext.strip() for ext in extensions.split(",") if ext.strip()]
+            all_files = []
+            self._scan_directory_sync(owner, repo, "", ext_list, all_files)
+
+            # Return simple list of file paths for easier processing by CodeAgent
+            file_paths = [file_info.get('path', '') for file_info in all_files[:50]]  # Limit to 50 files
+            return file_paths
+
+        except Exception as e:
+            return [f"ERROR: Failed to scan repository: {str(e)}"]
+
+    def _scan_directory_sync(self, owner: str, repo: str, path: str, extensions: List[str], all_files: List[Dict]):
+        """Recursively scan directory for files"""
+        try:
+            url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
+            response = requests.get(url, headers=self.headers)
+
+            if response.status_code == 200:
+                data = response.json()
+                for item in data:
+                    if item["type"] == "file":
+                        if any(item["name"].endswith(ext) for ext in extensions):
+                            all_files.append({
+                                "name": item["name"],
+                                "path": item["path"],
+                                "type": item["type"],
+                                "size": item.get("size", 0),
+                                "sha": item["sha"]
+                            })
+                    elif item["type"] == "dir" and len(all_files) < 100:
+                        self._scan_directory_sync(owner, repo, item["path"], extensions, all_files)
+        except Exception:
+            pass
+
+# Initialize the GitHub MCP server
+github_server = GitHubMCPServer()
+
+# Create Gradio interfaces for MCP
+demo = gr.TabbedInterface(
+    [
+        gr.Interface(
+            fn=github_server.get_repository_info,
+            inputs=[
+                gr.Textbox(label="Repository Owner", placeholder="octocat"),
+                gr.Textbox(label="Repository Name", placeholder="Hello-World")
+            ],
+            outputs=gr.Textbox(label="Repository Information", lines=15),
+            title="Get Repository Information",
+            description="Get basic information about a GitHub repository",
+            api_name="get_repository_info"
+        ),
+        gr.Interface(
+            fn=github_server.get_file_content,
+            inputs=[
+                gr.Textbox(label="Repository Owner", placeholder="octocat"),
+                gr.Textbox(label="Repository Name", placeholder="Hello-World"),
+                gr.Textbox(label="File Path", placeholder="README.md")
+            ],
+            outputs=gr.Textbox(label="File Content", lines=20),
+            title="Get File Content",
+            description="Get the content of a specific file from a GitHub repository",
+            api_name="get_file_content"
+        ),
+        gr.Interface(
+            fn=github_server.scan_repository,
+            inputs=[
+                gr.Textbox(label="Repository Owner", placeholder="octocat"),
+                gr.Textbox(label="Repository Name", placeholder="Hello-World"),
+                gr.Textbox(label="File Extensions", value=".py,.js,.ts,.php,.java", placeholder=".py,.js,.ts,.php,.java")
+            ],
+            outputs=gr.Textbox(label="Scan Results", lines=20),
+            title="Scan Repository for Code Files",
+            description="Scan a GitHub repository for code files with specified extensions",
+            api_name="scan_repository"
+        )
+    ],
+    [
+        "Repository Info",
+        "File Content",
+        "Repository Scanner"
+    ],
+    title="🐙 GitHub MCP Server"
+)
+
+if __name__ == "__main__":
+    print("🚀 Starting GitHub MCP Server with Gradio...")
+    print("📡 Server will provide GitHub repository access via MCP")
+    print("🛠️ Available tools: repository info, file content, repository scanner")
+
+    demo.launch(mcp_server=True)
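The scanner in this commit walks the repository with one contents request per directory. A possible alternative, which is not what the commit does, is GitHub's Git Trees endpoint with `recursive=1`, which returns the whole tree in a single response. The sketch below shows only how such a response could be filtered. The helper names `select_paths` and `fetch_tree` are hypothetical, the 50-path limit mirrors the one in `scan_repository`, and `fetch_tree` needs network access and a valid token to run.

```python
import json
import urllib.request
from typing import Dict, List


def select_paths(tree: List[Dict], extensions: List[str], limit: int = 50) -> List[str]:
    """Return up to `limit` file paths that end with one of `extensions`."""
    suffixes = tuple(extensions)
    matches: List[str] = []
    for entry in tree:
        if len(matches) >= limit:
            break
        # In a Git tree listing, files are "blob" entries and directories are "tree".
        if entry.get("type") == "blob" and entry.get("path", "").endswith(suffixes):
            matches.append(entry["path"])
    return matches


def fetch_tree(owner: str, repo: str, ref: str, token: str) -> List[Dict]:
    """Fetch the recursive tree for `ref` (hypothetical helper, needs network)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"
    request = urllib.request.Request(url, headers={
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # The response also carries a `truncated` flag for very large trees;
    # this sketch ignores it and returns whatever entries were delivered.
    return payload.get("tree", [])
```

A call such as `select_paths(fetch_tree("octocat", "Hello-World", "master", token), [".py", ".js"])` would replace the recursive walk with one request, though very large repositories may come back truncated.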
requirements.txt ADDED
@@ -0,0 +1,9 @@
+gradio[oauth,mcp]==5.45.0
+fastapi==0.115.2
+uvicorn==0.24.0
+mcp==1.10.1
+smolagents>=0.1.0
+requests>=2.28.0
+python-dotenv>=1.0.0
+pydantic>=2.11,<2.12
+smolagents[mcp]>=0.1.0