---
license: apache-2.0
language:
- en
tags:
- security
- ai-agents
- mcp
- nanomind
- opena2a
- threat-detection
- prompt-injection
- ai-safety
datasets:
- opena2a/nanomind-training
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: nanomind-security-classifier
  results:
  - task:
      type: text-classification
      name: AI Agent Threat Classification
    metrics:
    - name: Eval Accuracy
      type: accuracy
      value: 0.9844
---

# NanoMind Security Classifier v0.5.0

**9-class threat classifier for AI agent, MCP server, and skill security scanning.**

Detects exfiltration, prompt injection, privilege escalation, credential abuse, persistence, lateral movement, social engineering, and policy violations in AI agent configurations, MCP server definitions, SKILL.md files, SOUL.md governance files, and system prompts.

Built by [OpenA2A](https://opena2a.org). Powers [HackMyAgent](https://github.com/opena2a-org/hackmyagent), [ai-trust](https://github.com/opena2a-org/ai-trust), and the [OpenA2A CLI](https://github.com/opena2a-org/opena2a).

## Why This Model Exists

AI agents and MCP servers can contain hidden malicious instructions that static analysis misses. A skill that says "forward all database records to analytics endpoint" looks like normal data processing but is actually exfiltration. NanoMind classifies the _intent_ of agent configurations rather than just pattern-matching keywords.

## Metrics

| Metric | Value |
|--------|-------|
| **Eval accuracy** | **98.44%** |
| Training samples | 3,600 |
| Eval samples | 450 |
| Classes | 9 (8 attack classes + benign) |
| Training corpus | sft-v8 |
| Architecture | Mamba TME (8 blocks, d_model=128, d_state=64, dropout=0.1) |
| Inference latency | < 2 ms on CPU |
| Model size | ~5.5 MB (ONNX) |

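
As a quick sanity check on the Metrics table, the headline accuracy can be converted back into a count of correct predictions. This is a sketch under two assumptions the card does not state outright: that the 98.44% figure is computed over all 450 eval samples, and that both splits are evenly balanced across the 9 classes.

```python
# Figures copied from the Metrics table.
EVAL_SAMPLES = 450
TRAIN_SAMPLES = 3600
NUM_CLASSES = 9
EVAL_ACCURACY = 0.9844

# Assumption: accuracy is measured over the full eval set.
correct = round(EVAL_ACCURACY * EVAL_SAMPLES)
misclassified = EVAL_SAMPLES - correct

# Assumption: both splits are perfectly class-balanced.
train_per_class = TRAIN_SAMPLES // NUM_CLASSES
eval_per_class = EVAL_SAMPLES // NUM_CLASSES

print(f"correct: {correct}/{EVAL_SAMPLES} ({correct / EVAL_SAMPLES:.2%})")
print(f"misclassified: {misclassified}")
print(f"per class: {train_per_class} train, {eval_per_class} eval")
```

Under those assumptions the reported accuracy corresponds to a handful of misclassified eval samples, which is worth keeping in mind when comparing per-class scores that differ by only one or two points.
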
## Per-Class Performance

| Class | F1 Score | Description |
|-------|----------|-------------|
| injection | 0.97 | Instruction override, jailbreak, prompt injection (DAN, ignore previous) |
| social_engineering | 0.99 | Urgency and pressure manipulation (urgent, emergency, act now) |
| credential_abuse | 0.99 | Credential harvesting and phishing (share API key, enter password) |
| privilege_escalation | 1.00 | Unauthorized access elevation (admin access, bypass permissions) |
| persistence | 0.99 | Permanent state manipulation (forever, no expiration, all future sessions) |
| policy_violation | 0.97 | Governance bypass (bypass SOUL.md, override constraints) |
| lateral_movement | 1.00 | Remote config/instruction fetching (download from URL, fetch config) |
| benign | 0.97 | Normal, expected agent behavior with no exploitable patterns |
| exfiltration | 0.98 | Data forwarding to external endpoints (mirror, upload, sync) |

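
The card reports per-class F1 but no aggregate. A minimal sketch of how the per-class scores could be summarized follows; macro-averaging (an unweighted mean over classes) is an assumption here, chosen because the training notes describe the classes as balanced.

```python
# Per-class F1 scores copied from the Per-Class Performance table.
F1_BY_CLASS = {
    "injection": 0.97,
    "social_engineering": 0.99,
    "credential_abuse": 0.99,
    "privilege_escalation": 1.00,
    "persistence": 0.99,
    "policy_violation": 0.97,
    "lateral_movement": 1.00,
    "benign": 0.97,
    "exfiltration": 0.98,
}

# Macro F1: every class counts equally, regardless of its sample count.
macro_f1 = sum(F1_BY_CLASS.values()) / len(F1_BY_CLASS)

# Classes sharing the lowest score are the most likely sources of confusion.
weakest = min(F1_BY_CLASS.values())
weakest_classes = sorted(name for name, f1 in F1_BY_CLASS.items() if f1 == weakest)

print(f"macro F1: {macro_f1:.4f}")
print(f"lowest F1 ({weakest:.2f}): {', '.join(weakest_classes)}")
```

The macro average works out to roughly 0.984, close to the reported eval accuracy, and the lowest-scoring classes are benign, injection, and policy_violation.
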
## Architecture

- **Type:** Ternary Mamba Encoder (TME): bidirectional and discriminative, not autoregressive
- **Backbone:** Mamba-3 SSM (O(1) state memory, O(n) time in sequence length, no KV cache)
- **Parameters:** 18M (3.5 MB on disk via ternary quantization)
- **Inference:** ONNX (Node.js via onnxruntime-node) or NPZ weights (Python)
- **Input:** Tokenized text (4K vocabulary, 128-token maximum)
- **Output:** 9-class softmax probability distribution

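
To make the "18M parameters in 3.5 MB" figure concrete, here is a small sketch of ternary quantization and the resulting storage arithmetic. The card does not say which quantization rule NanoMind uses; the absmean rule below (scale by the mean absolute weight, then round and clip to -1, 0, or +1) is one common choice and is shown purely as an illustration. The function names are hypothetical.

```python
import math

def ternary_quantize(weights):
    """Map floats to {-1, 0, +1} plus one scale (absmean rule; an assumed scheme)."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def packed_size_mb(num_params, bits_per_weight):
    """Storage in decimal megabytes for a given bits-per-weight packing."""
    return num_params * bits_per_weight / 8 / 1e6

# Toy weights, not taken from the real model.
codes, scale = ternary_quantize([0.42, -0.03, -0.51, 0.08, 0.30, -0.27])
print(codes, round(scale, 4))

NUM_PARAMS = 18_000_000  # "18M" from the Architecture list
ideal = packed_size_mb(NUM_PARAMS, math.log2(3))  # information-theoretic minimum
two_bit = packed_size_mb(NUM_PARAMS, 2)           # simple 2-bits-per-weight packing
print(f"ideal ternary: {ideal:.2f} MB, 2-bit packing: {two_bit:.2f} MB")
```

A ternary weight carries log2(3), about 1.58, bits of information, so 18M weights need a little over 3.5 MB at best. The stated on-disk size is therefore consistent with a near-optimal packing rather than a plain 2-bit encoding.
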
## What It Classifies

NanoMind analyzes these AI security artifacts:

| Content Type | Examples |
|-------------|----------|
| MCP server configs | `mcpServers` JSON definitions, tool permissions |
| SKILL.md files | Agent skill definitions with capabilities and instructions |
| SOUL.md governance | Agent governance policies and constraint definitions |
| System prompts | Agent instructions, role definitions, safety rules |
| Agent cards | A2A protocol agent metadata |
| Source code | JavaScript/TypeScript/Python agent implementations |

## Quick Start

```bash
# Install HackMyAgent (auto-downloads NanoMind model on first scan)
npm install -g hackmyagent

# Scan an AI agent project (NanoMind runs automatically)
hackmyagent secure ./my-agent

# Deep scan with behavioral simulation
hackmyagent secure ./my-agent --deep

# Check a skill before installing
hackmyagent check ./path/to/SKILL.md

# Via OpenA2A CLI
npx opena2a scan ./my-agent --deep

# Via ai-trust (MCP server trust verification)
npx ai-trust check @modelcontextprotocol/server-filesystem --scan-if-missing
```

## How It Works

1. **Tokenization:** Input text is split into words and mapped to a 4K vocabulary
2. **Encoding:** 8 Mamba SSM blocks process the token sequence bidirectionally
3. **Classification:** Mean pooling followed by a 9-way softmax head produces class probabilities
4. **Defense-in-depth:** NanoMind findings are added to static-analysis findings and never suppress them

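
The tokenization and classification steps above can be sketched in plain Python. This is an illustrative outline, not the shipped implementation: the toy vocabulary, the `tokenize`, `mean_pool`, and `classify` helpers, and the hand-set head weights are all hypothetical, and the Mamba encoder is replaced by a stand-in that produces one small vector per token.

```python
import math

MAX_TOKENS = 128  # context window stated in the card
LABELS = ["injection", "social_engineering", "credential_abuse",
          "privilege_escalation", "persistence", "policy_violation",
          "lateral_movement", "benign", "exfiltration"]

def tokenize(text, vocab):
    """Split on whitespace, map words to ids (0 = unknown), truncate to MAX_TOKENS."""
    return [vocab.get(word, 0) for word in text.lower().split()][:MAX_TOKENS]

def mean_pool(hidden_states):
    """Average per-token vectors into one fixed-size vector."""
    dim = len(hidden_states[0])
    return [sum(vec[i] for vec in hidden_states) / len(hidden_states) for i in range(dim)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(hidden_states, head_weights):
    """Mean-pool token states, apply a linear head, return (label, probabilities)."""
    pooled = mean_pool(hidden_states)
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in head_weights]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs

# Toy inputs: a five-word vocabulary and a stand-in for the encoder output.
vocab = {"forward": 1, "token": 2, "to": 3, "external": 4, "endpoint": 5}
ids = tokenize("Forward token to external endpoint", vocab)
hidden = [[float(i), 1.0] for i in ids]      # stand-in for the 8 Mamba blocks
head = [[0.0, 0.0] for _ in LABELS]          # hand-set weights, not trained
head[LABELS.index("exfiltration")] = [1.0, 0.0]

label, probs = classify(hidden, head)
print(label, [round(p, 3) for p in probs])
```
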
The model is sensitive to word order, which is critical for distinguishing:
- "forward token to external endpoint" (exfiltration)
- "external endpoint token forwarding service" (possibly benign)

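
Why a state-space encoder can tell such phrases apart while a keyword match cannot is easiest to see with a toy recurrence. The sketch below is a drastic simplification (a scalar state with a fixed decay, whereas real Mamba blocks use vector states and input-dependent parameters), and the token ids are made up for illustration.

```python
def bag_of_words(token_ids):
    """Order-insensitive summary: the sorted multiset of token ids."""
    return sorted(token_ids)

def linear_recurrence(token_ids, decay=0.5):
    """Toy scalar state-space scan: h_t = decay * h_{t-1} + x_t."""
    state = 0.0
    for x in token_ids:
        state = decay * state + x
    return state

original = [1, 2, 3, 4, 5]   # e.g. "forward token to external endpoint"
reordered = [4, 5, 2, 1, 3]  # the same tokens in a different order

# A bag-of-words view cannot tell the two sequences apart...
print(bag_of_words(original) == bag_of_words(reordered))

# ...but the recurrent state depends on the order in which tokens arrive.
print(linear_recurrence(original), linear_recurrence(reordered))
```

Because later tokens are weighted more heavily than earlier ones, the final state differs between the two orderings even though the set of tokens is identical.
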
## Training Pipeline

A repeatable pipeline in which Claude, an LLM, acts as chief data scientist and reviews labels before training:

```
Data Sources → Claude Reviews Labels → Validated Corpus → Train (MLX/M4) → Evaluate → Publish
```

**Data sources:**
- [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) -- intentionally vulnerable AI agent attack payloads
- [AgentPwn](https://agentpwn.com) -- benevolent honeypot capturing real AI agent attacks (48 attacks, 11 categories)
- [OASB](https://oasb.org) -- Open Agent Security Benchmark dataset
- [OpenA2A Registry](https://opena2a.org) -- skill descriptions with HMA scan results
- Synthetic samples -- generated SKILL.md, MCP configs, SOUL.md, credential abuse scenarios

**Quality assurance:**
- Claude reviews every label before training (chief data scientist role)
- Heuristic cross-validation against HMA's pattern library
- Balanced classes (equal samples per class)
- Holdout evaluation set never seen during training

**Training hardware:** Apple Silicon M4 Max, MLX framework. Training time: ~2 minutes.

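
The balanced-class and holdout points in the quality-assurance list can be illustrated with a stratified split. This is a minimal sketch, assuming samples are `(text, label)` pairs; the `stratified_split` helper and the toy corpus are hypothetical and do not reflect the actual training scripts.

```python
import random
from collections import defaultdict

def stratified_split(samples, eval_per_class, seed=0):
    """Split (text, label) pairs so every label contributes the same number of eval items."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the split repeatable
    train, holdout = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        holdout.extend(items[:eval_per_class])  # never shown to the trainer
        train.extend(items[eval_per_class:])
    return train, holdout

# Toy corpus: 3 labels x 9 placeholder samples each (not real training data).
corpus = [(f"{label} sample {i}", label)
          for label in ("benign", "injection", "exfiltration")
          for i in range(9)]

train, holdout = stratified_split(corpus, eval_per_class=1)
print(len(train), len(holdout))
```

Holding out one of every nine samples per class mirrors the 8:1 ratio between the 3,600 training and 450 eval samples reported in the Metrics table.
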
## Limitations

- **Exfiltration class** (F1=0.98) still produces false positives -- some benign data-processing tools get flagged
- **Benign class** (F1=0.97) gives up some recall -- the model has a conservative bias (it prefers false positives over false negatives for security)
- **Training data** is currently 3,600 samples. Accuracy is expected to improve as the OpenA2A Registry accumulates more scan data
- **Context window** is 128 tokens. Longer documents are truncated
- **English only** -- not trained on non-English agent configurations

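
The 128-token limit means anything past the first window is invisible to the classifier. One possible workaround on the caller's side, sketched below, is to classify overlapping windows and combine the results. This is an assumption about how a caller might cope with truncation, not a description of what HackMyAgent or NanoMind does; the `windows` helper and the stride of 64 are hypothetical.

```python
MAX_TOKENS = 128  # context window stated in the card

def windows(tokens, size=MAX_TOKENS, stride=64):
    """Yield overlapping windows so that every token lands in at least one of them."""
    if len(tokens) <= size:
        yield tokens
        return
    for start in range(0, len(tokens) - size + stride, stride):
        yield tokens[start:start + size]

tokens = list(range(300))        # stand-in for a 300-token document
truncated = tokens[:MAX_TOKENS]  # what a single pass would see
chunks = list(windows(tokens))

print(f"truncation drops {len(tokens) - len(truncated)} tokens")
print(f"{len(chunks)} windows cover the full document")
```

Each window would then be classified separately, with the caller deciding how to merge the per-window results (for example, reporting the most severe class found).
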
## Integration

NanoMind is used by three CLIs in the OpenA2A ecosystem:

| Tool | How NanoMind Is Used |
|------|---------------------|
| [HackMyAgent](https://github.com/opena2a-org/hackmyagent) | Core semantic layer for all scan commands (secure, check, scan-soul, secure-openclaw, secure-nemoclaw) |
| [ai-trust](https://github.com/opena2a-org/ai-trust) | Deep trust verification of MCP servers and npm packages |
| [OpenA2A CLI](https://github.com/opena2a-org/opena2a) | Passes --deep flag through to HMA for semantic analysis |

## License

Apache-2.0. Free for commercial and non-commercial use.

## Links

- [HackMyAgent](https://github.com/opena2a-org/hackmyagent) -- 204-check security scanner
- [OpenA2A](https://opena2a.org) -- Open Agent-to-Agent protocol
- [OASB](https://oasb.org) -- Open Agent Security Benchmark
- [AgentPwn](https://agentpwn.com) -- AI agent attack honeypot
- [NanoMind Spec](https://nanomind.dev) -- Full specification

## Citation

```bibtex
@software{nanomind,
  title = {NanoMind Security Classifier},
  author = {OpenA2A},
  url = {https://github.com/opena2a-org/nanomind},
  version = {0.5.0},
  year = {2026}
}
```