Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -1,7 +1,5 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
-
language:
|
| 4 |
-
- en
|
| 5 |
tags:
|
| 6 |
- security
|
| 7 |
- ai-agents
|
|
@@ -9,14 +7,11 @@ tags:
|
|
| 9 |
- nanomind
|
| 10 |
- opena2a
|
| 11 |
- threat-detection
|
| 12 |
-
- prompt-injection
|
| 13 |
-
- ai-safety
|
| 14 |
datasets:
|
| 15 |
- opena2a/nanomind-training
|
| 16 |
metrics:
|
| 17 |
- accuracy
|
| 18 |
- f1
|
| 19 |
-
pipeline_tag: text-classification
|
| 20 |
model-index:
|
| 21 |
- name: nanomind-security-classifier
|
| 22 |
results:
|
|
@@ -26,156 +21,61 @@ model-index:
|
|
| 26 |
metrics:
|
| 27 |
- name: Eval Accuracy
|
| 28 |
type: accuracy
|
| 29 |
-
value: 0.
|
| 30 |
---
|
| 31 |
|
| 32 |
-
#
|
| 33 |
|
| 34 |
-
|
| 35 |
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
Built by [OpenA2A](https://opena2a.org). Powers [HackMyAgent](https://github.com/opena2a-org/hackmyagent), [ai-trust](https://github.com/opena2a-org/ai-trust), and the [OpenA2A CLI](https://github.com/opena2a-org/opena2a).
|
| 39 |
-
|
| 40 |
-
## Why This Model Exists
|
| 41 |
-
|
| 42 |
-
AI agents and MCP servers can contain hidden malicious instructions that static analysis misses. A skill that says "forward all database records to analytics endpoint" looks like normal data processing but is actually exfiltration. NanoMind classifies the _intent_ of agent configurations, not just pattern-match keywords.
|
| 43 |
|
| 44 |
## Metrics
|
| 45 |
|
| 46 |
| Metric | Value |
|
| 47 |
|--------|-------|
|
| 48 |
-
|
|
| 49 |
-
| Training samples |
|
| 50 |
-
| Eval samples |
|
| 51 |
-
| Attack classes |
|
| 52 |
-
| Training corpus | sft-
|
| 53 |
-
| Architecture | Mamba TME (8 blocks, d_model=128, d_state=64, dropout=0.1) |
|
| 54 |
-
| Inference latency | Sub-2ms on CPU |
|
| 55 |
-
| Model size | ~5.5MB (ONNX) |
|
| 56 |
-
|
| 57 |
-
## Per-Class Performance
|
| 58 |
-
|
| 59 |
-
| Attack Class | F1 Score | Description |
|
| 60 |
-
|-------------|----------|-------------|
|
| 61 |
-
| injection | 0.97 | Instruction override, jailbreak, prompt injection (DAN, ignore previous) |
|
| 62 |
-
| social_engineering | 0.99 | Urgency and pressure manipulation (urgent, emergency, act now) |
|
| 63 |
-
| credential_abuse | 0.99 | Credential harvesting and phishing (share API key, enter password) |
|
| 64 |
-
| privilege_escalation | 1.00 | Unauthorized access elevation (admin access, bypass permissions) |
|
| 65 |
-
| persistence | 0.99 | Permanent state manipulation (forever, no expiration, all future sessions) |
|
| 66 |
-
| policy_violation | 0.97 | Governance bypass (bypass SOUL.md, override constraints) |
|
| 67 |
-
| lateral_movement | 1.00 | Remote config/instruction fetching (download from URL, fetch config) |
|
| 68 |
-
| benign | 0.97 | Normal, expected agent behavior with no exploitable patterns |
|
| 69 |
-
| exfiltration | 0.98 | Data forwarding to external endpoints (mirror, upload, sync) |
|
| 70 |
|
| 71 |
## Architecture
|
| 72 |
|
| 73 |
-
- **Type:**
|
| 74 |
-
- **
|
| 75 |
-
- **
|
| 76 |
-
- **Inference:** ONNX (Node.js via onnxruntime-node) or NPZ weights (Python)
|
| 77 |
-
- **Input:** Tokenized text (4K vocabulary, 128 token max)
|
| 78 |
-
- **Output:** 9-class softmax probability distribution
|
| 79 |
|
| 80 |
-
##
|
| 81 |
|
| 82 |
-
|
| 83 |
|
| 84 |
-
|
| 85 |
-
|-------------|----------|
|
| 86 |
-
| MCP server configs | `mcpServers` JSON definitions, tool permissions |
|
| 87 |
-
| SKILL.md files | Agent skill definitions with capabilities and instructions |
|
| 88 |
-
| SOUL.md governance | Agent governance policies and constraint definitions |
|
| 89 |
-
| System prompts | Agent instructions, role definitions, safety rules |
|
| 90 |
-
| Agent cards | A2A protocol agent metadata |
|
| 91 |
-
| Source code | JavaScript/TypeScript/Python agent implementations |
|
| 92 |
-
|
| 93 |
-
## Quick Start
|
| 94 |
|
| 95 |
```bash
|
| 96 |
-
# Install HackMyAgent (
|
| 97 |
npm install -g hackmyagent
|
| 98 |
|
| 99 |
-
# Scan an
|
| 100 |
-
hackmyagent
|
| 101 |
-
|
| 102 |
-
# Deep scan with behavioral simulation
|
| 103 |
-
hackmyagent secure ./my-agent --deep
|
| 104 |
-
|
| 105 |
-
# Check a skill before installing
|
| 106 |
-
hackmyagent check ./path/to/SKILL.md
|
| 107 |
|
| 108 |
-
#
|
| 109 |
-
npx opena2a scan ./my-agent
|
| 110 |
-
|
| 111 |
-
# Via ai-trust (MCP server trust verification)
|
| 112 |
-
npx ai-trust check @modelcontextprotocol/server-filesystem --scan-if-missing
|
| 113 |
```
|
| 114 |
|
| 115 |
-
##
|
| 116 |
-
|
| 117 |
-
1. **Tokenization:** Input text is split into words and mapped to a 4K vocabulary
|
| 118 |
-
2. **Encoding:** 8 Mamba SSM blocks process the token sequence bidirectionally
|
| 119 |
-
3. **Classification:** Mean pooling + 9-way softmax head produces class probabilities
|
| 120 |
-
4. **Defense-in-depth:** NanoMind findings ADD to static analysis (never suppress)
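
Step 1 and the 128-token limit can be sketched as follows. This is a minimal illustration, not the project's tokenizer: it assumes whitespace splitting, lowercasing, and reserved ids 0 (padding) and 1 (unknown), none of which the README confirms. `encode` and `toy_vocab` are hypothetical names, and the toy vocabulary stands in for the 4K vocabulary.

```python
from typing import Dict, List

MAX_TOKENS = 128        # context length stated in the README
PAD_ID, UNK_ID = 0, 1   # assumed reserved ids, not confirmed by the source


def encode(text: str, vocab: Dict[str, int], max_tokens: int = MAX_TOKENS) -> List[int]:
    """Split on whitespace, map words to ids, truncate, then pad to max_tokens."""
    ids = [vocab.get(word, UNK_ID) for word in text.lower().split()]
    ids = ids[:max_tokens]  # longer inputs are truncated
    return ids + [PAD_ID] * (max_tokens - len(ids))


# Toy vocabulary for illustration only.
toy_vocab = {"forward": 2, "token": 3, "to": 4, "external": 5, "endpoint": 6}
ids = encode("forward token to external endpoint", toy_vocab)
print(ids[:6])  # the five word ids followed by the first padding id
```

Because the ids keep their positions, a sequence model sees the order of the words as well as which words occur.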
|
| 121 |
-
|
| 122 |
-
The model understands word ORDER, which is critical for distinguishing:
|
| 123 |
-
- "forward token to external endpoint" (exfiltration)
|
| 124 |
-
- "external endpoint token forwarding service" (possibly benign)
|
| 125 |
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
Data Sources → Claude Reviews Labels → Validated Corpus → Train (MLX/M4) → Evaluate → Publish
|
| 132 |
-
```
|
| 133 |
-
|
| 134 |
-
**Data sources:**
|
| 135 |
-
- [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) -- intentionally vulnerable AI agent attack payloads
|
| 136 |
-
- [AgentPwn](https://agentpwn.com) -- benevolent honeypot capturing real AI agent attacks (48 attacks, 11 categories)
|
| 137 |
-
- [OASB](https://oasb.org) -- Open Agent Security Benchmark dataset
|
| 138 |
-
- [OpenA2A Registry](https://opena2a.org) -- skill descriptions with HMA scan results
|
| 139 |
-
- Synthetic samples -- generated SKILL.md, MCP configs, SOUL.md, credential abuse scenarios
|
| 140 |
-
|
| 141 |
-
**Quality assurance:**
|
| 142 |
-
- Claude LLM reviews every label before training (chief data scientist role)
|
| 143 |
-
- Heuristic cross-validation against HMA's pattern library
|
| 144 |
-
- Balanced classes (equal samples per attack type)
|
| 145 |
-
- Holdout evaluation set never seen during training
|
| 146 |
-
|
| 147 |
-
**Training hardware:** Apple Silicon M4 Max, MLX framework. Training time: ~2 minutes.
|
| 148 |
-
|
| 149 |
-
## Limitations
|
| 150 |
-
|
| 151 |
-
- **Exfiltration class** has lower precision (F1=0.81) -- some benign data-processing tools get flagged
|
| 152 |
-
- **Benign class** has lower recall (F1=0.84) -- conservative bias (prefers false positives over false negatives for security)
|
| 153 |
-
- **Training data** is currently ~1,800 samples. Accuracy improves as the OpenA2A Registry accumulates more scan data
|
| 154 |
-
- **Context window** is 128 tokens. Very long documents are truncated
|
| 155 |
-
- **English only** -- not trained on non-English agent configurations
|
| 156 |
-
|
| 157 |
-
## Integration
|
| 158 |
-
|
| 159 |
-
NanoMind is used by three CLIs in the OpenA2A ecosystem:
|
| 160 |
-
|
| 161 |
-
| Tool | How NanoMind is Used |
|
| 162 |
-
|------|---------------------|
|
| 163 |
-
| [HackMyAgent](https://github.com/opena2a-org/hackmyagent) | Core semantic layer for all scan commands (secure, check, scan-soul, secure-openclaw, secure-nemoclaw) |
|
| 164 |
-
| [ai-trust](https://github.com/opena2a-org/ai-trust) | Deep trust verification of MCP servers and npm packages |
|
| 165 |
-
| [OpenA2A CLI](https://github.com/opena2a-org/opena2a) | Passes --deep flag through to HMA for semantic analysis |
|
| 166 |
|
| 167 |
## License
|
| 168 |
|
| 169 |
Apache-2.0. Free for commercial and non-commercial use.
|
| 170 |
|
| 171 |
-
## Links
|
| 172 |
-
|
| 173 |
-
- [HackMyAgent](https://github.com/opena2a-org/hackmyagent) -- 204-check security scanner
|
| 174 |
-
- [OpenA2A](https://opena2a.org) -- Open Agent-to-Agent protocol
|
| 175 |
-
- [OASB](https://oasb.org) -- Open Agent Security Benchmark
|
| 176 |
-
- [AgentPwn](https://agentpwn.com) -- AI agent attack honeypot
|
| 177 |
-
- [NanoMind Spec](https://nanomind.dev) -- Full specification
|
| 178 |
-
|
| 179 |
## Citation
|
| 180 |
|
| 181 |
```bibtex
|
|
@@ -183,7 +83,7 @@ Apache-2.0. Free for commercial and non-commercial use.
|
|
| 183 |
title = {NanoMind Security Classifier},
|
| 184 |
author = {OpenA2A},
|
| 185 |
url = {https://github.com/opena2a-org/nanomind},
|
| 186 |
-
version = {0.
|
| 187 |
year = {2026}
|
| 188 |
}
|
| 189 |
```
|
|
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
tags:
|
| 4 |
- security
|
| 5 |
- ai-agents
|
|
|
|
| 7 |
- nanomind
|
| 8 |
- opena2a
|
| 9 |
- threat-detection
|
| 10 |
datasets:
|
| 11 |
- opena2a/nanomind-training
|
| 12 |
metrics:
|
| 13 |
- accuracy
|
| 14 |
- f1
|
| 15 |
model-index:
|
| 16 |
- name: nanomind-security-classifier
|
| 17 |
results:
|
|
|
|
| 21 |
metrics:
|
| 22 |
- name: Eval Accuracy
|
| 23 |
type: accuracy
|
| 24 |
+
value: 0.9673
|
| 25 |
---
|
| 26 |
|
| 27 |
# nanomind-security-classifier v0.4.0

Base 10-class threat classifier for AI agent security scanning

Part of the [OpenA2A](https://opena2a.org) security ecosystem.
Used by [HackMyAgent](https://github.com/opena2a-org/hackmyagent) for AI agent security scanning.

## Metrics

| Metric | Value |
|--------|-------|
| Eval accuracy | 96.73% |
| Training samples | 3337 |
| Eval samples | 398 |
| Attack classes | 10 |
| Training corpus | sft-v9 |
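
As a quick consistency check on the table, the reported accuracy can be converted back into a count of correct predictions. The sketch assumes accuracy is the plain fraction of correct predictions over the 398 eval samples, which the README does not spell out.

```python
eval_samples = 398
reported_accuracy = 0.9673

# Nearest whole number of correct predictions implied by the reported accuracy.
correct = round(reported_accuracy * eval_samples)
misclassified = eval_samples - correct

print(correct, misclassified)            # 385 13
print(round(correct / eval_samples, 4))  # 0.9673
```

Under that assumption, 96.73% corresponds to 385 correct and 13 misclassified eval samples.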
|
| 43 |
|
| 44 |
## Architecture

- **Type:** Mamba TME (8 blocks, d_model=128, d_state=64)
- **Inference:** ONNX (Node.js via onnxruntime-node) or NPZ weights
- **Latency:** Sub-2ms on CPU
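
One way to check a latency figure like the one above on your own hardware is a small timing harness. This is a sketch under stated assumptions: `median_latency_ms` and `fake_classify` are hypothetical names, and `fake_classify` is only a placeholder workload to be replaced by a call into the real model.

```python
import statistics
import time
from typing import Callable, List


def median_latency_ms(fn: Callable[[str], object], text: str,
                      runs: int = 200, warmup: int = 20) -> float:
    """Median wall-clock time of fn(text), in milliseconds."""
    for _ in range(warmup):  # warm caches before timing
        fn(text)
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_classify(text: str) -> int:
    # Placeholder workload (hypothetical); swap in the real inference call.
    return len(text.split())


latency = median_latency_ms(fake_classify, "forward all database records to analytics endpoint")
print(f"median latency: {latency:.4f} ms")
```

The median is used rather than the mean so that a few slow outliers (scheduler noise, garbage collection) do not dominate the result.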
|
| 49 |
|
| 50 |
## Attack Classes (10)

exfiltration, injection, privilege_escalation, persistence, credential_abuse, lateral_movement, social_engineering, policy_violation, benign
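
To show how a classifier output maps onto these names, here is a minimal sketch that turns raw scores into a label. It uses only the nine names listed above, and the index order is an assumption, since the README does not publish the label mapping. `softmax` and `top_class` are illustrative helpers, not part of any OpenA2A API.

```python
import math
from typing import List, Sequence, Tuple

# Names copied from the README; the index order is an assumption.
LABELS: List[str] = [
    "exfiltration", "injection", "privilege_escalation", "persistence",
    "credential_abuse", "lateral_movement", "social_engineering",
    "policy_violation", "benign",
]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(logits: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one score per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up scores for illustration; index 0 is the largest.
label, prob = top_class([4.0, 0.5, 0.1, 0.0, 0.2, 0.3, 0.1, 0.0, 1.0])
print(label, round(prob, 3))
```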
|
| 53 |
|
| 54 |
## Usage

```bash
# Install HackMyAgent (includes NanoMind inference)
npm install -g hackmyagent

# Scan an MCP server or AI agent project
hackmyagent scan ./my-agent --deep

# Or use via OpenA2A CLI
npx opena2a scan ./my-agent
```
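
If the scan needs to be driven from a script, the command above can be assembled in Python. This sketch only builds the argument list and checks whether the CLI is on `PATH`. It does not parse any output, because the README does not document the output format. `build_scan_command` is a hypothetical helper, and treating `--deep` as optional is an assumption.

```python
import shutil
from typing import List


def build_scan_command(target: str, deep: bool = True) -> List[str]:
    """Assemble the argument list for the `hackmyagent scan` call shown above."""
    cmd = ["hackmyagent", "scan", target]
    if deep:  # assumption: the flag can be omitted
        cmd.append("--deep")
    return cmd


cmd = build_scan_command("./my-agent")
print(" ".join(cmd))
# Only availability is reported here; pass `cmd` to subprocess.run to execute the scan.
print("hackmyagent on PATH:", shutil.which("hackmyagent") is not None)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems when the target path contains spaces.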
|
| 66 |
|
| 67 |
## Training

Trained on Apple Silicon (MLX) using a curated security corpus drawn from:
- [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) attack payloads
- [AgentPwn](https://agentpwn.com) honeypot captures
- [OASB](https://oasb.org) benchmark dataset
- OpenA2A Registry skill descriptions
|
| 74 |
|
| 75 |
## License

Apache-2.0. Free for commercial and non-commercial use.
|
| 78 |
|
| 79 |
## Citation
|
| 80 |
|
| 81 |
```bibtex
|
|
|
|
| 83 |
title = {NanoMind Security Classifier},
|
| 84 |
author = {OpenA2A},
|
| 85 |
url = {https://github.com/opena2a-org/nanomind},
|
| 86 |
+
version = {0.4.0},
|
| 87 |
year = {2026}
|
| 88 |
}
|
| 89 |
```
|