metadata
title: Vulnerability Scanner
emoji: 🏒
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.47.0
app_file: app.py
pinned: false
license: mit
short_description: AI-powered tool that analyzes GitHub repositories

🛡️ AI-Powered GitHub Vulnerability Scanner


Track Tag: mcp-in-action-track-enterprise

An autonomous AI agent system that performs comprehensive security analysis of GitHub repositories using Model Context Protocol (MCP) tools and agentic RAG. This intelligent agent autonomously plans, retrieves, and executes vulnerability assessments by combining GitHub data access, CVE knowledge bases, and advanced language models.

⚠️ Important Notice: This tool is designed for legitimate security research and vulnerability assessment purposes only. Do not use this scanner for malicious activities, unauthorized access, or any illegal purposes. Always ensure you have proper authorization before scanning repositories.


🎥 Demo Video

Watch Demo Video (1-5 minutes showing the autonomous agent in action)

📱 Social Media

Project Announcement on X/LinkedIn


🤖 Autonomous Agent Capabilities

This project showcases advanced autonomous agent behavior with:

Planning & Reasoning

  • Intelligent Query Understanding: Agent analyzes user requests and automatically plans multi-step security assessments
  • Context-Aware Decisions: Dynamically selects appropriate MCP tools based on repository structure and file types
  • Adaptive Analysis: Adjusts scanning depth and focus based on discovered vulnerabilities

Tool Orchestration

  • MCP Tool Integration: Seamlessly uses 7 GitHub MCP tools for repository exploration and file retrieval
  • CVE Knowledge Base Search: Autonomously queries 10,000+ real-world vulnerability records from the CIRCL/vulnerability dataset on Hugging Face
  • Web Scraping: Automatically fetches CVE details from NVD webpages for enhanced context

Agentic RAG System

  • Retrieval-Augmented Generation: BM25-based retrieval finds relevant vulnerability patterns in the CVE knowledge base
  • Evidence-Based Analysis: Correlates code patterns with real CVE examples for accurate detection
  • Context Engineering: Combines code analysis with historical vulnerability data for informed assessments
  • Multi-Source Synthesis: Integrates GitHub content, CVE records, and NVD data into comprehensive reports
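
The BM25 retrieval step described above can be sketched in a few lines. This is a from-scratch illustration using only the standard library, not the project's code and not the rank-bm25 package listed under Dependencies. It uses the common non-negative IDF variant; the sample records, function names, and the `k1`/`b` defaults are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; a real tokenizer would be richer."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def top_k(query: str, docs: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k highest-scoring documents with their scores."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(docs[i], scores[i]) for i in ranked[:k]]


# Hypothetical hand-written records standing in for CVE descriptions.
CVE_SNIPPETS = [
    "sql injection in login form allows attackers to bypass authentication",
    "cross-site scripting in comment field allows script execution",
    "path traversal in file download endpoint exposes arbitrary files",
]
```

Calling `top_k("sql injection query string", CVE_SNIPPETS, k=1)` ranks the first record highest, since it is the only one sharing terms with the query. The retrieved text would then be passed to the language model as context alongside the code under review.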

Execution & Reporting

  • Autonomous Scanning: Agent independently navigates repositories, analyzes files, and identifies vulnerabilities
  • Structured Output: Generates professional security reports with severity ratings, CWE classifications, and remediation advice
  • Interactive Follow-up: Maintains conversation context for clarifying questions and deeper analysis

🔗 Project Links


✨ Key Features

🤖 Autonomous Agent System

  • Intelligent Planning: Agent autonomously plans vulnerability assessment strategies
  • Multi-Tool Orchestration: Coordinates 7 MCP tools for comprehensive repository analysis
  • Agentic RAG: Retrieves and applies knowledge from 10,000+ CVE records
  • Context Engineering: Maintains conversation state and builds analysis context progressively

🔍 Advanced Detection

  • AI-Powered Analysis: Uses Hugging Face Inference API with advanced language models
  • CVE Knowledge Base: Leverages CIRCL/vulnerability dataset with CWE classifications and CVSS scores
  • Multi-Language Support: Analyzes Python, JavaScript, TypeScript, PHP, Java, C/C++, Go, Ruby, and more
  • Pattern Recognition: Identifies vulnerability patterns from historical security data

📊 Professional Reporting

  • Comprehensive Reports: Detailed findings with CVE references, severity ratings, and code snippets
  • Remediation Guidance: Specific fix recommendations for each identified vulnerability
  • CWE Mapping: Links vulnerabilities to Common Weakness Enumeration codes
  • CVSS Scoring: Provides severity assessment based on industry standards
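
As an illustration of how a report entry might tie these fields together, here is a minimal sketch. The severity bands follow the CVSS v3.x qualitative rating scale; the `Finding` class, its field names, and the output layout are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    """One reported vulnerability (hypothetical structure for illustration)."""
    title: str
    cwe_id: str
    cvss_score: float
    file_path: str
    remediation: str

    @property
    def severity(self) -> str:
        return cvss_severity(self.cvss_score)

    def to_report_entry(self) -> str:
        """Render the finding as a short plain-text report entry."""
        return "\n".join([
            f"{self.title} [{self.severity}, CVSS {self.cvss_score:.1f}]",
            f"  CWE: {self.cwe_id}",
            f"  File: {self.file_path}",
            f"  Fix: {self.remediation}",
        ])


# Example values are made up; CWE-89 is the identifier for SQL injection.
example = Finding(
    title="SQL injection in query builder",
    cwe_id="CWE-89",
    cvss_score=9.8,
    file_path="python/database.py",
    remediation="Use parameterized queries instead of string formatting.",
)
```

Keeping the severity derived from the score, rather than stored separately, avoids reports where the two disagree.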

🎨 Modern Interface

  • Gradio Web UI: User-friendly chat interface for interacting with the agent
  • Real-Time Analysis: Immediate feedback as the agent explores and analyzes code
  • Interactive Chat: Ask follow-up questions and request deeper analysis
  • Secure API Key Handling: Keys entered in UI, never stored permanently

🏗️ System Architecture

MCP-Based Agent Workflow

User Request → AI Agent (Planning) → MCP Tools (GitHub Data) → CVE RAG (Knowledge Retrieval) →
AI Analysis (Reasoning) → Security Report (Execution) → User Response

Core Components

  1. GitHub MCP Server (server.py):

    • 7 MCP tools for GitHub API access
    • Repository information retrieval
    • File content extraction and directory scanning
    • CVE Knowledge Base with BM25 retrieval
    • NVD API integration for official CVE details
  2. AI Agent Client (client.py):

    • Autonomous planning and reasoning engine
    • MCP tool orchestration
    • Agentic RAG with CVE database integration
    • Web scraping for enhanced research
    • Gradio interface for user interaction
  3. Knowledge Base:

    • Hugging Face dataset: CIRCL/vulnerability
    • 10,000+ CVE records with descriptions
    • CWE codes and CVSS severity scores
    • Vulnerability summaries and technical details
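
For the NVD lookups mentioned above, a sketch of how a CVE identifier could be validated and turned into request URLs is shown below. The NVD CVE API 2.0 endpoint and the detail-page URL pattern are public; the helper names are hypothetical, and how `server.py` and `client.py` actually build these requests is not shown in this README.

```python
import re
from urllib.parse import urlencode

# CVE identifiers look like CVE-YYYY-NNNN, with four or more digits in the last part.
CVE_ID_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")


def normalize_cve_id(raw: str) -> str:
    """Uppercase and validate a CVE identifier, raising on malformed input."""
    cve_id = raw.strip().upper()
    if not CVE_ID_PATTERN.match(cve_id):
        raise ValueError(f"Not a valid CVE identifier: {raw!r}")
    return cve_id


def nvd_api_url(cve_id: str) -> str:
    """URL for the NVD CVE API 2.0 record of a single CVE."""
    query = urlencode({"cveId": normalize_cve_id(cve_id)})
    return f"https://services.nvd.nist.gov/rest/json/cves/2.0?{query}"


def nvd_detail_page_url(cve_id: str) -> str:
    """URL of the human-readable NVD page, the kind a scraper would fetch."""
    return f"https://nvd.nist.gov/vuln/detail/{normalize_cve_id(cve_id)}"
```

Validating the identifier before building the URL keeps arbitrary model output from being interpolated into outbound requests.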

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Hugging Face account and API token (Get one)
  • GitHub personal access token (optional, for private repos - Get one)

Installation

# Clone the repository
git clone https://github.com/banno-0720/vulnerability-scanner.git
cd vulnerability-scanner

# Create virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
# source venv/bin/activate  # Linux/Mac

# Install dependencies
pip install -r requirements.txt

# Authenticate with Hugging Face (for CVE dataset)
huggingface-cli login

# Optional: Set GitHub token in .env
echo "GITHUB_TOKEN=your_token_here" > .env

Usage

# Start the vulnerability scanner
python client.py

# Open browser to http://localhost:7861
# Enter your Hugging Face API key
# Paste a GitHub file URL and watch the agent work!

Example Analysis

Try these test files:

  • https://github.com/ayushmittal62/vunreability_scanner_testing/blob/master/python/database.py
  • https://github.com/ayushmittal62/vunreability_scanner_testing/blob/master/database/schema.sql
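
File URLs like the ones above point at GitHub's HTML viewer rather than the file contents. One way a tool could fetch the source is to rewrite the URL to its `raw.githubusercontent.com` form, as sketched here. The function name is hypothetical, the sketch assumes the branch or tag name contains no slashes, and the project's MCP tools may instead use the GitHub API.

```python
from urllib.parse import urlparse


def blob_to_raw_url(blob_url: str) -> str:
    """Convert https://github.com/<owner>/<repo>/blob/<ref>/<path> to its raw URL."""
    parsed = urlparse(blob_url)
    parts = parsed.path.strip("/").split("/")
    # Expect at least: owner, repo, "blob", ref, and one path segment.
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"Not a GitHub file URL: {blob_url!r}")
    owner, repo, _, ref, *file_parts = parts
    file_path = "/".join(file_parts)
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{file_path}"
```

For the first test file listed above, this yields `https://raw.githubusercontent.com/ayushmittal62/vunreability_scanner_testing/master/python/database.py`.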

🔍 Vulnerability Detection Categories

The autonomous agent identifies:

  1. 💉 Injection Vulnerabilities: SQL injection, command injection, code injection
  2. 🌐 Cross-Site Scripting (XSS): Reflected, stored, and DOM-based XSS
  3. ⚙️ Security Misconfigurations: Hardcoded secrets, weak crypto, insecure configs
  4. 🔐 Access Control Issues: Broken authentication, session flaws, authorization bypasses
  5. 📊 Data Exposure: Sensitive data in logs, information disclosure
  6. ✅ Input Validation: Path traversal, file upload issues, unvalidated inputs
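
To make the first category concrete, the sketch below contrasts a query built by string formatting (the pattern classified as CWE-89, SQL injection) with a parameterized one. It is a self-contained illustration using an in-memory SQLite database; the table and function names are made up and not taken from the test repository.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two demo users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """VULNERABLE: user input is spliced directly into the SQL text."""
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """Safe: the driver binds the value, so it is never parsed as SQL."""
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


# Classic payload that turns the WHERE clause into a tautology.
PAYLOAD = "' OR '1'='1"
```

With `PAYLOAD` as input, `find_user_unsafe` returns every row in the table, while `find_user_safe` returns nothing because no user has that literal name.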

📦 Dependencies

Core Agent Framework

  • gradio[oauth,mcp]==5.45.0 - Web interface with MCP support
  • smolagents[mcp]>=0.1.0 - AI agent framework for autonomous behavior
  • mcp==1.10.1 - Model Context Protocol implementation

Agentic RAG Stack

  • datasets>=2.0.0 - Hugging Face datasets for CVE data
  • langchain>=0.1.0 - LLM application framework
  • sentence-transformers>=2.2.0 - Semantic embeddings
  • rank-bm25>=0.2.2 - BM25 retrieval algorithm

Tool Integration

  • requests>=2.28.0 - HTTP client for APIs
  • beautifulsoup4>=4.12.0 - Web scraping
  • markdownify>=0.11.6 - HTML to Markdown conversion

🏆 Hackathon Submission - Track 2: MCP in Action

Category: Enterprise Applications

Track Tag: mcp-in-action-track-enterprise

Why This Qualifies for Track 2

✅ Autonomous Agent Behavior:

  • Planning: Agent analyzes requests and plans multi-step security assessments
  • Reasoning: Makes context-aware decisions about tool selection and analysis depth
  • Execution: Autonomously navigates repositories, analyzes code, and generates reports

✅ MCP Tools Integration:

  • Uses 7 GitHub MCP tools for repository data access
  • Seamlessly orchestrates multiple tools in a single analysis workflow

✅ Advanced Agent Features:

  • Context Engineering: Maintains conversation state and builds progressive analysis context
  • Agentic RAG: Retrieves relevant knowledge from 10,000+ CVE records using the BM25 algorithm
  • Multi-Source Synthesis: Combines GitHub data, CVE database, and NVD information

✅ Clear Enterprise Value:

  • Automated security assessment for development teams
  • Identifies vulnerabilities before production deployment
  • Provides actionable remediation guidance
  • Reduces manual security review time

✅ Gradio Application: ✓


👥 Team Members


🔒 Security & Ethics

  • Authorized Use Only: Only scan repositories you have permission to analyze
  • API Key Security: Keys entered in UI, never stored permanently
  • Rate Limiting: Respectful of API quotas and rate limits
  • Responsible Disclosure: Use findings responsibly for legitimate security improvements

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request

📄 License

MIT License - See LICENSE.md for details


🙏 Acknowledgments

  • Anthropic - For MCP protocol and hackathon
  • Hugging Face - For Spaces, Inference API, and CVE dataset
  • Gradio - For the excellent web framework with MCP support
  • CIRCL - For maintaining the comprehensive vulnerability dataset

Built with ❤️ for the MCP 1st Anniversary Hackathon - Showcasing autonomous AI agents with MCP tools and Agentic RAG