---
title: Vulnerability Scanner
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.47.0
app_file: app.py
pinned: false
license: mit
short_description: AI-powered tool that analyzes GitHub repositories
---

# 🛡️ AI-Powered GitHub Vulnerability Scanner

[![Hackathon](https://img.shields.io/badge/MCP%20Hackathon-Track%202-blue)](https://huggingface.co/mcp-hackathon)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE.md)
[![Python](https://img.shields.io/badge/python-3.11+-blue)](https://www.python.org/)

**Track Tag**: `mcp-in-action-track-enterprise`

An autonomous AI agent system that performs comprehensive security analysis of GitHub repositories using Model Context Protocol (MCP) tools and agentic RAG. This intelligent agent autonomously plans, retrieves, and executes vulnerability assessments by combining GitHub data access, CVE knowledge bases, and advanced language models.

> **⚠️ Important Notice**: This tool is designed for legitimate security research and vulnerability assessment purposes only. Do not use this scanner for malicious activities, unauthorized access, or any illegal purposes. Always ensure you have proper authorization before scanning repositories.
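
The "retrieves" step of this loop is BM25-based search over CVE descriptions. As a rough illustration of how such ranking behaves, here is a minimal, dependency-free Okapi BM25 sketch. It is not the project's implementation (the project lists `rank-bm25` as a dependency); the `BM25` class, the `tokenize` helper, the parameters `k1=1.5` and `b=0.75`, and the three `EXAMPLE-*` records are hypothetical placeholders, not entries from `CIRCL/vulnerability`. BM25 rewards records that contain rare query terms (high IDF), saturates repeated terms through `k1`, and normalizes for record length through `b`, which works reasonably on short vulnerability descriptions without needing embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 ranker (hypothetical sketch, not the project's code)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.lengths = [len(t) for t in self.tokens]
        self.avgdl = sum(self.lengths) / len(self.tokens)
        self.tfs = [Counter(t) for t in self.tokens]
        # Document frequency: number of documents containing each term.
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(self.tokens)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # a term absent from this document contributes nothing
            # Term-frequency saturation (k1) combined with length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        """Return (document index, score) pairs, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.tokens))]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical CVE-style records (placeholders, not real dataset entries).
records = {
    "EXAMPLE-1": "SQL injection in login handler allows execution of arbitrary SQL via the username parameter",
    "EXAMPLE-2": "Cross-site scripting in comment form allows injection of arbitrary web script",
    "EXAMPLE-3": "Path traversal in file download endpoint allows reading arbitrary files",
}
ids = list(records)
index = BM25(list(records.values()))
best, _ = index.top_k("sql injection username", k=1)[0]
print(ids[best])  # EXAMPLE-1
```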

---

## 🎥 Demo Video

[**Watch Demo Video**](https://www.linkedin.com/posts/ayush-mittal629_cybersecurity-opensource-devsecops-ugcPost-7378789358293925889-J9JV?utm_source=share&utm_medium=member_desktop&rcm=ACoAADoXXPEB7ECUKr-1dLXzf-VGkfHkzkpkivY) *(1-5 minutes showing the autonomous agent in action)*

## 📱 Social Media

[**Project Announcement on X/LinkedIn**](https://www.linkedin.com/posts/ayush-mittal629_cybersecurity-opensource-devsecops-ugcPost-7378789358293925889-J9JV?utm_source=share&utm_medium=member_desktop&rcm=ACoAADoXXPEB7ECUKr-1dLXzf-VGkfHkzkpkivY)

---

## 🤖 Autonomous Agent Capabilities

This project showcases advanced **autonomous agent behavior** with:

### Planning & Reasoning

- **Intelligent Query Understanding**: The agent analyzes user requests and automatically plans multi-step security assessments
- **Context-Aware Decisions**: Dynamically selects appropriate MCP tools based on repository structure and file types
- **Adaptive Analysis**: Adjusts scanning depth and focus based on discovered vulnerabilities

### Tool Orchestration

- **MCP Tool Integration**: Uses 7 GitHub MCP tools for repository exploration and file retrieval
- **CVE Knowledge Base Search**: Autonomously queries 10,000+ real-world vulnerability records from a Hugging Face dataset
- **Web Scraping**: Automatically fetches CVE details from NVD webpages for additional context

### Agentic RAG System

- **Retrieval-Augmented Generation**: BM25-based retrieval finds relevant vulnerability patterns in the CVE database
- **Evidence-Based Analysis**: Correlates code patterns with real CVE examples to support each detection
- **Context Engineering**: Combines code analysis with historical vulnerability data for informed assessments
- **Multi-Source Synthesis**: Integrates GitHub content, CVE records, and NVD data into comprehensive reports

### Execution & Reporting

- **Autonomous Scanning**: The agent independently navigates repositories, analyzes files, and identifies vulnerabilities
- **Structured Output**: Generates professional security reports with severity ratings, CWE classifications, and remediation advice
- **Interactive Follow-up**: Maintains conversation context for clarifying questions and deeper analysis

---

## 🔗 Project Links

- 📂 **Source Code**: [GitHub Repository](https://github.com/banno-0720/vulnerability-scanner)
- 🔧 **MCP Server**: [Hugging Face Space](https://huggingface.co/spaces/HimanshuGoyal2004/github-mcp-server)
- 🛡️ **Live Demo**: [Vulnerability Scanner Client](https://huggingface.co/spaces/HimanshuGoyal2004/Vulnerability-Scanner)

---

## ✨ Key Features

### 🤖 Autonomous Agent System

- **Intelligent Planning**: The agent autonomously plans vulnerability assessment strategies
- **Multi-Tool Orchestration**: Coordinates 7 MCP tools for comprehensive repository analysis
- **Agentic RAG**: Retrieves and applies knowledge from 10,000+ CVE records
- **Context Engineering**: Maintains conversation state and builds analysis context progressively

### 🔍 Advanced Detection

- **AI-Powered Analysis**: Uses the Hugging Face Inference API with advanced language models
- **CVE Knowledge Base**: Leverages the `CIRCL/vulnerability` dataset with CWE classifications and CVSS scores
- **Multi-Language Support**: Analyzes Python, JavaScript, TypeScript, PHP, Java, C/C++, Go, Ruby, and more
- **Pattern Recognition**: Identifies vulnerability patterns from historical security data

### 📊 Professional Reporting

- **Comprehensive Reports**: Detailed findings with CVE references, severity ratings, and code snippets
- **Remediation Guidance**: Specific fix recommendations for each identified vulnerability
- **CWE Mapping**: Links vulnerabilities to Common Weakness Enumeration (CWE) codes
- **CVSS Scoring**: Rates severity using the Common Vulnerability Scoring System (CVSS)

### 🎨 Modern Interface

- **Gradio Web UI**: User-friendly chat interface for interacting with the agent
- **Real-Time Analysis**: Immediate feedback as the agent explores and analyzes code
- **Interactive Chat**: Ask
  follow-up questions and request deeper analysis
- **Secure API Key Handling**: Keys are entered in the UI and never stored permanently

---

## 🏗️ System Architecture

### MCP-Based Agent Workflow

```
User Request
  → AI Agent (Planning)
  → MCP Tools (GitHub Data)
  → CVE RAG (Knowledge Retrieval)
  → AI Analysis (Reasoning)
  → Security Report (Execution)
  → User Response
```

### Core Components

1. **GitHub MCP Server** (`server.py`):
   - 7 MCP tools for GitHub API access
   - Repository information retrieval
   - File content extraction and directory scanning
   - CVE Knowledge Base with BM25 retrieval
   - NVD API integration for official CVE details
2. **AI Agent Client** (`client.py`):
   - Autonomous planning and reasoning engine
   - MCP tool orchestration
   - Agentic RAG with CVE database integration
   - Web scraping for enhanced research
   - Gradio interface for user interaction
3. **Knowledge Base**:
   - Hugging Face dataset: `CIRCL/vulnerability`
   - 10,000+ CVE records with descriptions
   - CWE codes and CVSS severity scores
   - Vulnerability summaries and technical details

---

## 🚀 Quick Start

### Prerequisites

- Python 3.11+
- Hugging Face account and API token ([Get one](https://huggingface.co/settings/tokens))
- GitHub personal access token (optional, for private repos; [Get one](https://github.com/settings/tokens))

### Installation

```bash
# Clone the repository
git clone https://github.com/banno-0720/vulnerability-scanner.git
cd vulnerability-scanner

# Create a virtual environment
python -m venv venv
venv\Scripts\activate         # Windows
# source venv/bin/activate    # Linux/Mac

# Install dependencies
pip install -r requirements.txt

# Authenticate with Hugging Face (for the CVE dataset)
huggingface-cli login

# Optional: set a GitHub token in .env
echo "GITHUB_TOKEN=your_token_here" > .env
```

### Usage

```bash
# Start the vulnerability scanner
python client.py

# Open your browser to http://localhost:7861
# Enter your Hugging Face API key
# Paste a GitHub file URL and watch the agent work!
```

### Example Analysis

Try these test files:

- `https://github.com/ayushmittal62/vunreability_scanner_testing/blob/master/python/database.py`
- `https://github.com/ayushmittal62/vunreability_scanner_testing/blob/master/database/schema.sql`

---

## 🔍 Vulnerability Detection Categories

The autonomous agent identifies:

1. **💉 Injection Vulnerabilities**: SQL injection, command injection, code injection
2. **🌐 Cross-Site Scripting (XSS)**: Reflected, stored, and DOM-based XSS
3. **⚙️ Security Misconfigurations**: Hardcoded secrets, weak crypto, insecure configs
4. **🔐 Access Control Issues**: Broken authentication, session flaws, authorization bypasses
5. **📊 Data Exposure**: Sensitive data in logs, information disclosure
6. **✅ Input Validation**: Path traversal, file upload issues, unvalidated inputs

---

## 📦 Dependencies

### Core Agent Framework

- `gradio[oauth,mcp]==5.45.0` - Web interface with MCP support
- `smolagents[mcp]>=0.1.0` - AI agent framework for autonomous behavior
- `mcp==1.10.1` - Model Context Protocol implementation

### Agentic RAG Stack

- `datasets>=2.0.0` - Hugging Face datasets for CVE data
- `langchain>=0.1.0` - LLM application framework
- `sentence-transformers>=2.2.0` - Semantic embeddings
- `rank-bm25>=0.2.2` - BM25 retrieval algorithm

### Tool Integration

- `requests>=2.28.0` - HTTP client for APIs
- `beautifulsoup4>=4.12.0` - Web scraping
- `markdownify>=0.11.6` - HTML to Markdown conversion

---

## 🏆 Hackathon Submission - Track 2: MCP in Action

### Category: Enterprise Applications

**Track Tag**: `mcp-in-action-track-enterprise`

### Why This Qualifies for Track 2

✅ **Autonomous Agent Behavior**:

- Planning: The agent analyzes requests and plans multi-step security assessments
- Reasoning: Makes context-aware decisions about tool selection and analysis depth
- Execution: Autonomously navigates repositories, analyzes code, and generates reports

✅ **MCP Tools Integration**:

- Uses 7 GitHub MCP tools for repository data access
- Seamlessly
  orchestrates multiple tools in a single analysis workflow

✅ **Advanced Agent Features**:

- **Context Engineering**: Maintains conversation state and builds progressive analysis context
- **Agentic RAG**: Retrieves relevant knowledge from 10,000+ CVE records using the BM25 algorithm
- **Multi-Source Synthesis**: Combines GitHub data, the CVE database, and NVD information

✅ **Clear Enterprise Value**:

- Automated security assessment for development teams
- Identifies vulnerabilities before production deployment
- Provides actionable remediation guidance
- Reduces manual security review time

✅ **Gradio Application**: The agent's user interface is built with Gradio

---

## 👥 Team Members

- **Himanshu Goyal** - [@HimanshuGoyal2004](https://huggingface.co/HimanshuGoyal2004)
- **Ayush Mittal** - [@baction](https://huggingface.co/Baction)

---

## 🔒 Security & Ethics

- **Authorized Use Only**: Only scan repositories you have permission to analyze
- **API Key Security**: Keys are entered in the UI and never stored permanently
- **Rate Limiting**: Respects API quotas and rate limits
- **Responsible Disclosure**: Use findings responsibly for legitimate security improvements

---

## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes with tests
4. Submit a pull request

---

## 📄 License

MIT License. See [LICENSE.md](LICENSE.md) for details.

---

## 🙏 Acknowledgments

- **Anthropic** - For the Model Context Protocol and the hackathon
- **Hugging Face** - For Spaces, the Inference API, and the CVE dataset
- **Gradio** - For the excellent web framework with MCP support
- **CIRCL** - For maintaining the comprehensive vulnerability dataset

---

**Built with ❤️ for the MCP 1st Anniversary Hackathon - Showcasing autonomous AI agents with MCP tools and Agentic RAG**