
title: Vulnerability Scanner
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.47.0
app_file: app.py
pinned: false
license: mit
short_description: AI-powered tool that analyzes GitHub repositories

🛡️ AI-Powered GitHub Vulnerability Scanner

Track Tag: mcp-in-action-track-enterprise

An autonomous AI agent that analyzes GitHub repositories for security vulnerabilities using Model Context Protocol (MCP) tools and agentic RAG. The agent plans each assessment, retrieves supporting evidence, and runs the analysis by combining GitHub data access, a CVE knowledge base, and language models served through the Hugging Face Inference API.

⚠️ Important Notice: This tool is designed for legitimate security research and vulnerability assessment purposes only. Do not use this scanner for malicious activities, unauthorized access, or any illegal purposes. Always ensure you have proper authorization before scanning repositories.

🎥 Demo Video

Watch Demo Video (1-5 minutes showing the autonomous agent in action)

📱 Social Media

Project Announcement on X/LinkedIn

🤖 Autonomous Agent Capabilities

This project showcases advanced autonomous agent behavior with:

Planning & Reasoning

Intelligent Query Understanding: Agent analyzes user requests and automatically plans multi-step security assessments
Context-Aware Decisions: Dynamically selects appropriate MCP tools based on repository structure and file types
Adaptive Analysis: Adjusts scanning depth and focus based on discovered vulnerabilities

Tool Orchestration

MCP Tool Integration: Seamlessly uses 7 GitHub MCP tools for repository exploration and file retrieval
CVE Knowledge Base Search: Autonomously queries 10,000+ real-world vulnerability records from Hugging Face dataset
Web Scraping: Automatically fetches CVE details from NVD webpages for enhanced context

Agentic RAG System

Retrieval-Augmented Generation: BM25-based retrieval finds relevant vulnerability patterns from CVE database
Evidence-Based Analysis: Correlates code patterns with real CVE examples for accurate detection
Context Engineering: Combines code analysis with historical vulnerability data for informed assessments
Multi-Source Synthesis: Integrates GitHub content, CVE records, and NVD data into comprehensive reports
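
To make the retrieval step concrete, here is a minimal sketch of BM25 ranking over a handful of vulnerability descriptions. It is a pure-Python stand-in for illustration only: the project lists `rank-bm25` as its retrieval dependency, and the class name, sample records, and parameters (`k1`, `b`) below are assumptions, not code from `server.py`.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simple for the sketch)."""
    return text.lower().split()


class TinyBM25:
    """Okapi BM25 scorer using the log(1 + (N - n + 0.5) / (n + 0.5)) IDF variant."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.docs = [tokenize(doc) for doc in corpus]
        self.k1 = k1
        self.b = b
        self.avgdl = sum(len(doc) for doc in self.docs) / len(self.docs)
        self.tf = [Counter(doc) for doc in self.docs]
        # Document frequency: in how many records each term appears.
        df = Counter(term for doc in self.docs for term in set(doc))
        n_docs = len(self.docs)
        self.idf = {
            term: math.log(1.0 + (n_docs - freq + 0.5) / (freq + 0.5))
            for term, freq in df.items()
        }

    def score(self, query, index):
        """BM25 score of one record for the given query string."""
        doc_len = len(self.docs[index])
        total = 0.0
        for term in tokenize(query):
            freq = self.tf[index].get(term, 0)
            if freq == 0:
                continue
            norm = freq + self.k1 * (1 - self.b + self.b * doc_len / self.avgdl)
            total += self.idf[term] * freq * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=3):
        """Indices of the k highest-scoring records, best first."""
        order = sorted(
            range(len(self.docs)),
            key=lambda i: self.score(query, i),
            reverse=True,
        )
        return order[:k]


# Hypothetical records standing in for CVE descriptions from the dataset.
records = [
    "SQL injection in login form allows authentication bypass",
    "Cross-site scripting in comment field via unsanitized HTML",
    "Path traversal in file download endpoint",
]
index = TinyBM25(records)
best = index.top_k("sql injection", k=1)
```

The retrieved records would then be passed to the language model as context alongside the code under review.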

Execution & Reporting

Autonomous Scanning: Agent independently navigates repositories, analyzes files, and identifies vulnerabilities
Structured Output: Generates professional security reports with severity ratings, CWE classifications, and remediation advice
Interactive Follow-up: Maintains conversation context for clarifying questions and deeper analysis

🔗 Project Links

📂 Source Code: GitHub Repository
🔧 MCP Server: Hugging Face Space
🛡️ Live Demo: Vulnerability Scanner Client

✨ Key Features

🤖 Autonomous Agent System

Intelligent Planning: Agent autonomously plans vulnerability assessment strategies
Multi-Tool Orchestration: Coordinates 7 MCP tools for comprehensive repository analysis
Agentic RAG: Retrieves and applies knowledge from 10,000+ CVE records
Context Engineering: Maintains conversation state and builds analysis context progressively

🔍 Advanced Detection

AI-Powered Analysis: Uses Hugging Face Inference API with advanced language models
CVE Knowledge Base: Leverages CIRCL/vulnerability dataset with CWE classifications and CVSS scores
Multi-Language Support: Analyzes Python, JavaScript, TypeScript, PHP, Java, C/C++, Go, Ruby, and more
Pattern Recognition: Identifies vulnerability patterns from historical security data

📊 Professional Reporting

Comprehensive Reports: Detailed findings with CVE references, severity ratings, and code snippets
Remediation Guidance: Specific fix recommendations for each identified vulnerability
CWE Mapping: Links vulnerabilities to Common Weakness Enumeration codes
CVSS Scoring: Provides severity assessment based on industry standards
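
As an illustration of how a numeric CVSS score maps to a severity label in a report, the sketch below applies the CVSS v3.x qualitative rating scale. The function name is hypothetical; how the scanner itself derives its severity labels is not shown in this README.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 to 3.9
    if score < 7.0:
        return "Medium"    # 4.0 to 6.9
    if score < 9.0:
        return "High"      # 7.0 to 8.9
    return "Critical"      # 9.0 to 10.0
```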

🎨 Modern Interface

Gradio Web UI: User-friendly chat interface for interacting with the agent
Real-Time Analysis: Immediate feedback as the agent explores and analyzes code
Interactive Chat: Ask follow-up questions and request deeper analysis
Secure API Key Handling: Keys entered in UI, never stored permanently

🏗️ System Architecture

MCP-Based Agent Workflow

User Request → AI Agent (Planning) → MCP Tools (GitHub Data) → CVE RAG (Knowledge Retrieval) → 
AI Analysis (Reasoning) → Security Report (Execution) → User Response

Core Components

GitHub MCP Server (server.py):
- 7 MCP tools for GitHub API access
- Repository information retrieval
- File content extraction and directory scanning
- CVE Knowledge Base with BM25 retrieval
- NVD API integration for official CVE details

AI Agent Client (client.py):
- Autonomous planning and reasoning engine
- MCP tool orchestration
- Agentic RAG with CVE database integration
- Web scraping for enhanced research
- Gradio interface for user interaction

Knowledge Base:
- Hugging Face dataset: CIRCL/vulnerability
- 10,000+ CVE records with descriptions
- CWE codes and CVSS severity scores
- Vulnerability summaries and technical details

🚀 Quick Start

Prerequisites

Python 3.11+
Hugging Face account and API token
GitHub personal access token (optional, only needed for private repositories)

Installation

# Clone the repository
git clone https://github.com/banno-0720/vulnerability-scanner.git
cd vulnerability-scanner

# Create virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
# source venv/bin/activate  # Linux/Mac

# Install dependencies
pip install -r requirements.txt

# Authenticate with Hugging Face (for CVE dataset)
huggingface-cli login

# Optional: Set GitHub token in .env
echo "GITHUB_TOKEN=your_token_here" > .env
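
For reference, the optional token would typically be turned into GitHub REST API request headers along these lines. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the `.env` file has already been loaded into the process environment by the application, since `os.environ` does not read `.env` on its own.

```python
import os


def github_headers(env=os.environ):
    """Build GitHub REST API headers, adding auth only when a token is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Public repositories work without this header (at lower rate limits).
        headers["Authorization"] = f"Bearer {token}"
    return headers
```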

Usage

# Start the vulnerability scanner
python client.py

# Open browser to http://localhost:7861
# Enter your Hugging Face API key
# Paste a GitHub file URL and watch the agent work!

Example Analysis

Try these test files:

https://github.com/ayushmittal62/vunreability_scanner_testing/blob/master/python/database.py
https://github.com/ayushmittal62/vunreability_scanner_testing/blob/master/database/schema.sql
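
A GitHub file URL like the ones above encodes the owner, repository, branch, and file path. The sketch below shows one way to split it apart before calling a file-retrieval tool; the function name is hypothetical and this is not necessarily how the agent parses URLs. It assumes the branch name contains no slashes, which a URL alone cannot disambiguate.

```python
from urllib.parse import urlparse


def parse_blob_url(url):
    """Split a github.com /blob/ URL into owner, repo, ref, and file path."""
    parts = urlparse(url)
    if parts.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {url}")
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<ref>/<path...>
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a file (blob) URL: {url}")
    owner, repo, _, ref = segments[:4]
    return {
        "owner": owner,
        "repo": repo,
        "ref": ref,
        "path": "/".join(segments[4:]),
    }


info = parse_blob_url(
    "https://github.com/ayushmittal62/vunreability_scanner_testing"
    "/blob/master/python/database.py"
)
```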

🔍 Vulnerability Detection Categories

The autonomous agent identifies:

💉 Injection Vulnerabilities: SQL injection, command injection, code injection
🌐 Cross-Site Scripting (XSS): Reflected, stored, and DOM-based XSS
⚙️ Security Misconfigurations: Hardcoded secrets, weak crypto, insecure configs
🔐 Access Control Issues: Broken authentication, session flaws, authorization bypasses
📊 Data Exposure: Sensitive data in logs, information disclosure
✅ Input Validation: Path traversal, file upload issues, unvalidated inputs
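
As an example of the first category, the sketch below contrasts a query built by string interpolation (the pattern a scanner would flag as SQL injection) with a parameterized query. The table, data, and function names are made up for illustration and are not taken from the test repository.

```python
import sqlite3

# In-memory database with two illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)


def find_user_unsafe(name):
    """Vulnerable: user input is spliced directly into the SQL text."""
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    """Fixed: the driver binds the value, so it is never parsed as SQL."""
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
leaked = find_user_unsafe(payload)   # the OR clause matches every row
blocked = find_user_safe(payload)    # no user has this literal name
```

The remediation advice for this class of finding is usually the second form: keep the SQL text constant and pass user input as bound parameters.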

📦 Dependencies

Core Agent Framework

gradio[oauth,mcp]==5.45.0 - Web interface with MCP support
smolagents[mcp]>=0.1.0 - AI agent framework for autonomous behavior
mcp==1.10.1 - Model Context Protocol implementation

Agentic RAG Stack

datasets>=2.0.0 - Hugging Face datasets for CVE data
langchain>=0.1.0 - LLM application framework
sentence-transformers>=2.2.0 - Semantic embeddings
rank-bm25>=0.2.2 - BM25 retrieval algorithm

Tool Integration

requests>=2.28.0 - HTTP client for APIs
beautifulsoup4>=4.12.0 - Web scraping
markdownify>=0.11.6 - HTML to Markdown conversion

🏆 Hackathon Submission - Track 2: MCP in Action

Category: Enterprise Applications

Why This Qualifies for Track 2

✅ Autonomous Agent Behavior:

Planning: Agent analyzes requests and plans multi-step security assessments
Reasoning: Makes context-aware decisions about tool selection and analysis depth
Execution: Autonomously navigates repositories, analyzes code, and generates reports

✅ MCP Tools Integration:

Uses 7 GitHub MCP tools for repository data access
Seamlessly orchestrates multiple tools in a single analysis workflow

✅ Advanced Agent Features:

Context Engineering: Maintains conversation state and builds progressive analysis context
Agentic RAG: Retrieves relevant knowledge from 10,000+ CVE records using BM25 algorithm
Multi-Source Synthesis: Combines GitHub data, CVE database, and NVD information

✅ Clear Enterprise Value:

Automated security assessment for development teams
Identifies vulnerabilities before production deployment
Provides actionable remediation guidance
Reduces manual security review time

✅ Gradio Application: The client is a Gradio chat interface hosted as a Hugging Face Space

👥 Team Members

Himanshu Goyal - @HimanshuGoyal2004
Ayush Mittal - @baction

🔒 Security & Ethics

Authorized Use Only: Only scan repositories you have permission to analyze
API Key Security: Keys entered in UI, never stored permanently
Rate Limiting: Respectful of API quotas and rate limits
Responsible Disclosure: Use findings responsibly for legitimate security improvements

🤝 Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Make your changes with tests
Submit a pull request

📄 License

MIT License - See LICENSE.md for details

🙏 Acknowledgments

Anthropic - For MCP protocol and hackathon
Hugging Face - For Spaces, Inference API, and CVE dataset
Gradio - For the excellent web framework with MCP support
CIRCL - For maintaining the comprehensive vulnerability dataset

Built with ❤️ for the MCP 1st Anniversary Hackathon - Showcasing autonomous AI agents with MCP tools and Agentic RAG