Spaces: HimanshuGoyal2004/Vulnerability-Scanner


HimanshuGoyal2004 committed on Oct 9, 2025

Commit ffcbb95 (1 parent: 55cc839)

agentic rag test


Files changed (2)

app.py +75 -43
requirements.txt +5 -0

app.py CHANGED

@@ -31,7 +31,6 @@ def analyze_vulnerabilities(message, history, hf_token):
         return "❌ Please provide a Hugging Face API key. Get one from [Hugging Face](https://huggingface.co/settings/tokens)"
     try:
-        # Connect to MCP server
         mcp_client = MCPClient({
             "url": MCP_SERVER_URL,
             "timeout": 120
@@ -41,12 +40,12 @@ def analyze_vulnerabilities(message, history, hf_token):
         # Initialize AI model with user's token
         model = InferenceClientModel(token=hf_token.strip())
-        # Create AI agent with GitHub MCP tools
         agent = CodeAgent(
-            tools=[*tools],
             model=model,
             additional_authorized_imports=["json", "ast", "urllib", "base64", "re"],
-            max_steps=10
         )
         # Parse the GitHub URL
@@ -58,54 +57,86 @@ def analyze_vulnerabilities(message, history, hf_token):
         # Generate different prompts based on whether it's a file or repository
         if file_path:
             enhanced_prompt = f"""
-You are a cybersecurity expert. Analyze the specific GitHub file for security vulnerabilities.
 GitHub URL: {message}
 Repository: {owner}/{repo}
 File Path: {file_path}
-Please:
-1. First, get repository information to verify it exists
-2. Get the content of the specific file: {file_path}
-3. Analyze the file content line by line for security vulnerabilities
-4. Look for these security issues:
    - Command injection: os.system, exec, eval calls
-   - Input validation: unvalidated user inputs
    - Error handling: unhandled exceptions that could leak info
-   - Hardcoded secrets: API keys, passwords, tokens
    - Unsafe operations: file operations without validation
-5. Create a professional security report with:
-   - 🔍 File Overview (path, language, size)
-   - 📊 Vulnerability Summary (counts by severity)
-   - 🚨 Detailed Findings (line numbers, code snippets, impacts, fixes)
-Use simple string operations and avoid complex regex patterns. Focus on clear, actionable security findings.
 """
         else:
             enhanced_prompt = f"""
-You are a cybersecurity expert. Analyze the GitHub repository for security vulnerabilities.
 Repository: {message}
-Please:
-1. First, get repository information to verify it exists
-2. Scan the repository for code files (.py, .js, .ts, .php, .java, .cpp, .c, .cs, .go, .rb, .rs, .swift, .kt, .scala, .sh, .bash, .ps1, .ipynb, .sql, .xml, .yaml, .yml, .json, .config, .ini, .env)
-3. For the first 5-10 most important code files, get their content and analyze for security issues
-4. Look for these security vulnerabilities:
-   - Command injection: os.system, exec, eval calls
-   - Input validation: unvalidated user inputs, missing parameter checks
-   - Error handling: unhandled exceptions, information disclosure
-   - Hardcoded secrets: API keys, passwords, database credentials
-   - Unsafe operations: file operations, deserialization without validation
-5. Generate a comprehensive security report with:
-   - 🔍 Repository Overview
-   - 📁 Files Analyzed
-   - 📊 Vulnerability Summary (counts by severity)
-   - 🚨 Detailed Findings (file paths, line numbers, code snippets, impacts, remediation)
-Use simple string operations and focus on the most critical security issues. Limit analysis to prevent timeouts.
 """
         # Run the AI agent analysis
@@ -129,11 +160,12 @@ with gr.Blocks(theme=gr.themes.Soft(primary_hue="blue")) as demo:
     This intelligent vulnerability scanner leverages cutting-edge AI agents and Model Context Protocol (MCP) tools to perform comprehensive security analysis of GitHub repositories and individual files.
     **Key Features:**
-    -  **Deep Code Analysis**: Scans for common security vulnerabilities including SQL injection, XSS, command injection, and more
-    -  **AI-Powered Detection**: Uses advanced language models to understand code context and identify complex security issues
-    -  **Repository & File Support**: Analyze entire repositories or focus on specific files
-    -  **Detailed Reports**: Get comprehensive security reports with severity levels, line numbers, and remediation suggestions
-    -  **Secure Processing**: Your API keys are used securely and never stored
     **Project Links:**
     - 📂 **Source Code**: [GitHub Repository](https://github.com/banno-0720/vulnerability-scanner)

         return "❌ Please provide a Hugging Face API key. Get one from [Hugging Face](https://huggingface.co/settings/tokens)"
     try:
         mcp_client = MCPClient({
             "url": MCP_SERVER_URL,
             "timeout": 120
         # Initialize AI model with user's token
         model = InferenceClientModel(token=hf_token.strip())
+        # Create AI agent with GitHub MCP tools and CVE database
         agent = CodeAgent(
+            tools=tools,
             model=model,
             additional_authorized_imports=["json", "ast", "urllib", "base64", "re"],
+            max_steps=12
         )
         # Parse the GitHub URL
         # Generate different prompts based on whether it's a file or repository
         if file_path:
             enhanced_prompt = f"""
+You are a cybersecurity expert with access to a comprehensive CVE knowledge base. Analyze the specific GitHub file for security vulnerabilities.
 GitHub URL: {message}
 Repository: {owner}/{repo}
 File Path: {file_path}
+Please follow this enhanced analysis workflow:
+1. **Repository & File Analysis**:
+   - Get repository information to verify it exists
+   - Get the content of the specific file: {file_path}
+   - Identify the programming language and framework used
+2. **CVE Knowledge Base Research**:
+   - Use the search_cve_database tool to search for relevant vulnerability patterns based on the code you find
+   - Search for common weaknesses related to the programming language/framework
+   - Look up specific vulnerability types you identify in the code
+3. **Comprehensive Security Analysis**:
    - Command injection: os.system, exec, eval calls
+   - Input validation: unvalidated user inputs, missing sanitization
    - Error handling: unhandled exceptions that could leak info
+   - Hardcoded secrets: API keys, passwords, tokens, database credentials
    - Unsafe operations: file operations without validation
+   - Authentication/authorization flaws
+   - Cross-site scripting (XSS) vulnerabilities
+   - SQL injection vulnerabilities
+4. **Enhanced Security Report**:
+   - 🔍 **File Overview** (path, language, size, framework)
+   - 📊 **Vulnerability Summary** (counts by severity with CWE mappings)
+   - 🚨 **Detailed Findings** with:
+     - Line numbers and code snippets
+     - **CWE Classification** from CVE knowledge base
+     - **CVSS Severity** based on similar CVEs
+     - Security impact and exploitation scenarios
+     - **Remediation advice** with best practices
+     - **Related CVE examples** from knowledge base
+Use the search_cve_database tool extensively to provide context-aware analysis based on real-world vulnerability data.
 """
         else:
             enhanced_prompt = f"""
+You are a cybersecurity expert with access to a comprehensive CVE knowledge base. Analyze the GitHub repository for security vulnerabilities.
 Repository: {message}
+Please follow this enhanced analysis workflow:
+1. **Repository Discovery**:
+   - Get repository information to verify it exists and understand the tech stack
+   - Scan for code files (.py, .js, .ts, .php, .java, .cpp, .c, .cs, .go, .rb, .rs, .swift, .kt, .scala, .sh, .bash, .ps1, .ipynb, .sql, .xml, .yaml, .yml, .json, .config, .ini, .env)
+   - Prioritize the most critical files (main application files, configuration files, database schemas)
+2. **CVE Knowledge Base Research**:
+   - Use the search_cve_database tool to research common vulnerabilities for the identified tech stack
+   - Search for framework-specific vulnerabilities (e.g., "Django SQL injection", "React XSS", "Node.js command injection")
+   - Look up configuration-related vulnerabilities for the technologies used
+3. **Comprehensive Security Analysis** (analyze 5-8 most important files):
+   - **Injection Vulnerabilities**: SQL injection, command injection, code injection
+   - **Input Validation**: Unvalidated inputs, missing sanitization, parameter tampering
+   - **Authentication & Authorization**: Broken access controls, session management
+   - **Data Exposure**: Hardcoded secrets, information disclosure, insecure storage
+   - **Configuration Issues**: Debug mode, insecure defaults, missing security headers
+   - **Framework-Specific**: Technology-specific vulnerability patterns from CVE database
+4. **Enhanced Security Report**:
+   - 🔍 **Repository Overview** (tech stack, architecture, security posture)
+   - 📁 **Files Analyzed** (prioritized list with rationale)
+   - 📊 **Vulnerability Summary** with CWE classifications and CVSS scores
+   - 🚨 **Detailed Findings** including:
+     - File paths and line numbers
+     - **CWE Classification** from CVE knowledge base
+     - **Severity Assessment** based on CVSS scores from similar CVEs
+     - Code snippets and exploitation scenarios
+     - **Remediation Strategies** with best practices
+     - **Related CVE References** for context
+Use the search_cve_database tool extensively to provide evidence-based analysis grounded in real-world vulnerability data.
 """
         # Run the AI agent analysis
     This intelligent vulnerability scanner leverages cutting-edge AI agents and Model Context Protocol (MCP) tools to perform comprehensive security analysis of GitHub repositories and individual files.
     **Key Features:**
+    -  **🤖 AI-Powered Analysis**: Uses advanced language models with agentic RAG for intelligent vulnerability detection
+    -  **📊 CVE Knowledge Base**: Leverages real CVE data to provide CWE classifications and CVSS severity scores
+    -  **🔍 Deep Code Analysis**: Scans for SQL injection, XSS, command injection, and framework-specific vulnerabilities
+    -  **📁 Repository & File Support**: Analyze entire repositories or focus on specific files
+    -  **📋 Enhanced Reports**: Comprehensive security reports with CVE references, CWE mappings, and remediation strategies
+    -  **🔒 Secure Processing**: Your API keys are used securely and never stored
     **Project Links:**
     - 📂 **Source Code**: [GitHub Repository](https://github.com/banno-0720/vulnerability-scanner)
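
Note on the `if file_path:` branch: the step under `# Parse the GitHub URL` falls outside the hunks shown, so how `owner`, `repo`, and `file_path` are derived is not visible in this commit. A minimal sketch of what such a step could look like, assuming URLs of the form `https://github.com/{owner}/{repo}` or `https://github.com/{owner}/{repo}/blob/{branch}/{path}`. The helper name `parse_github_url` and its return shape are hypothetical and not taken from app.py.

```python
from urllib.parse import urlparse


def parse_github_url(url):
    """Split a GitHub URL into (owner, repo, file_path).

    Hypothetical helper: file_path is None for a repository URL and the
    path after /blob/<branch>/ for a file URL.
    """
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"Not a GitHub URL: {url}")
    # Drop empty segments caused by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Expected owner/repo in URL: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    file_path = None
    # File URLs look like /owner/repo/blob/<branch>/<path...>
    if len(parts) > 4 and parts[2] == "blob":
        file_path = "/".join(parts[4:])
    return owner, repo, file_path
```

This sketch treats the segment after `blob` as a single-component branch name, so branches containing a slash would be split incorrectly; resolving those would need a lookup against the repository's refs.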

requirements.txt CHANGED

@@ -6,4 +6,9 @@ smolagents>=0.1.0
 requests>=2.28.0
 python-dotenv>=1.0.0
 pydantic>=2.11,<2.12
 smolagents[mcp]>=0.1.0

 requests>=2.28.0
 python-dotenv>=1.0.0
 pydantic>=2.11,<2.12
+pandas>=1.5.0
+langchain>=0.1.0
+langchain-community>=0.0.20
+sentence-transformers>=2.2.0
+rank-bm25>=0.2.2
 smolagents[mcp]>=0.1.0
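
Note on the new dependencies: the prompts in app.py refer to a `search_cve_database` tool, and the added packages (`pandas`, `langchain`, `sentence-transformers`, `rank-bm25`) may indicate that it mixes keyword and embedding retrieval, but its implementation is not part of this diff. The sketch below illustrates only the keyword half, using a hand-rolled Okapi BM25 scorer in plain Python (a stand-in for `rank-bm25`) so that it runs without extra packages. The records, the function name `search_cve`, and the parameters `k1=1.5` and `b=0.75` are illustrative assumptions, not the project's code or real CVE data.

```python
import math
import re
from collections import Counter

# Placeholder records, NOT real CVE entries; only the CWE identifiers are real.
RECORDS = [
    {"id": "EXAMPLE-0001", "cwe": "CWE-89",
     "text": "SQL injection via unsanitized query parameter in login form"},
    {"id": "EXAMPLE-0002", "cwe": "CWE-79",
     "text": "Cross-site scripting in comment field renders unescaped HTML"},
    {"id": "EXAMPLE-0003", "cwe": "CWE-78",
     "text": "OS command injection through shell call built from user input"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_cve(query, records=RECORDS, top_k=2, k1=1.5, b=0.75):
    """Return up to top_k records ranked by Okapi BM25 score for the query."""
    if not records:
        return []
    docs = [tokenize(r["text"]) for r in records]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many records each term appears.
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for record, doc in zip(records, docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop records that share no term with the query.
    return [rec for score, rec in scored[:top_k] if score > 0]
```

A query such as `search_cve("sql injection")` ranks the SQL injection record first because it matches both query terms, while the command injection record matches only one. Exact-token matching misses abbreviations such as "XSS", which is one reason a real tool might pair BM25 with embedding search.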