ecolibria committed on
Commit 10c0969 · verified · 1 Parent(s): 06d355b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +28 -128
README.md CHANGED
@@ -1,7 +1,5 @@
1
  ---
2
  license: apache-2.0
3
- language:
4
- - en
5
  tags:
6
  - security
7
  - ai-agents
@@ -9,14 +7,11 @@ tags:
9
  - nanomind
10
  - opena2a
11
  - threat-detection
12
- - prompt-injection
13
- - ai-safety
14
  datasets:
15
  - opena2a/nanomind-training
16
  metrics:
17
  - accuracy
18
  - f1
19
- pipeline_tag: text-classification
20
  model-index:
21
  - name: nanomind-security-classifier
22
  results:
@@ -26,156 +21,61 @@ model-index:
26
  metrics:
27
  - name: Eval Accuracy
28
  type: accuracy
29
- value: 0.9844
30
  ---
31
 
32
- # NanoMind Security Classifier v0.5.0
33
 
34
- **9-class threat classifier for AI agent, MCP server, and skill security scanning.**
35
 
36
- Detects exfiltration, prompt injection, privilege escalation, credential abuse, persistence, lateral movement, social engineering, and policy violations in AI agent configurations, MCP server definitions, SKILL.md files, SOUL.md governance, and system prompts.
37
-
38
- Built by [OpenA2A](https://opena2a.org). Powers [HackMyAgent](https://github.com/opena2a-org/hackmyagent), [ai-trust](https://github.com/opena2a-org/ai-trust), and the [OpenA2A CLI](https://github.com/opena2a-org/opena2a).
39
-
40
- ## Why This Model Exists
41
-
42
- AI agents and MCP servers can contain hidden malicious instructions that static analysis misses. A skill that says "forward all database records to analytics endpoint" looks like normal data processing but is actually exfiltration. NanoMind classifies the _intent_ of agent configurations, not just pattern-match keywords.
43
 
44
  ## Metrics
45
 
46
  | Metric | Value |
47
  |--------|-------|
48
- | **Eval accuracy** | **98.44%** |
49
- | Training samples | 3600 |
50
- | Eval samples | 450 |
51
- | Attack classes | 9 |
52
- | Training corpus | sft-v8 |
53
- | Architecture | Mamba TME (8 blocks, d_model=128, d_state=64, dropout=0.1) |
54
- | Inference latency | Sub-2ms on CPU |
55
- | Model size | ~5.5MB (ONNX) |
56
-
57
- ## Per-Class Performance
58
-
59
- | Attack Class | F1 Score | Description |
60
- |-------------|----------|-------------|
61
- | injection | 0.97 | Instruction override, jailbreak, prompt injection (DAN, ignore previous) |
62
- | social_engineering | 0.99 | Urgency and pressure manipulation (urgent, emergency, act now) |
63
- | credential_abuse | 0.99 | Credential harvesting and phishing (share API key, enter password) |
64
- | privilege_escalation | 1.00 | Unauthorized access elevation (admin access, bypass permissions) |
65
- | persistence | 0.99 | Permanent state manipulation (forever, no expiration, all future sessions) |
66
- | policy_violation | 0.97 | Governance bypass (bypass SOUL.md, override constraints) |
67
- | lateral_movement | 1.00 | Remote config/instruction fetching (download from URL, fetch config) |
68
- | benign | 0.97 | Normal, expected agent behavior with no exploitable patterns |
69
- | exfiltration | 0.98 | Data forwarding to external endpoints (mirror, upload, sync) |
70
 
71
  ## Architecture
72
 
73
- - **Type:** Ternary Mamba Encoder (bidirectional discriminative, NOT autoregressive)
74
- - **Backbone:** Mamba-3 SSM (O(1) memory, O(n) complexity, no KV cache)
75
- - **Parameters:** 18M (3.5MB on disk via ternary quantization)
76
- - **Inference:** ONNX (Node.js via onnxruntime-node) or NPZ weights (Python)
77
- - **Input:** Tokenized text (4K vocabulary, 128 token max)
78
- - **Output:** 9-class softmax probability distribution
79
 
80
- ## What It Classifies
81
 
82
- NanoMind analyzes these AI security artifacts:
83
 
84
- | Content Type | Examples |
85
- |-------------|----------|
86
- | MCP server configs | `mcpServers` JSON definitions, tool permissions |
87
- | SKILL.md files | Agent skill definitions with capabilities and instructions |
88
- | SOUL.md governance | Agent governance policies and constraint definitions |
89
- | System prompts | Agent instructions, role definitions, safety rules |
90
- | Agent cards | A2A protocol agent metadata |
91
- | Source code | JavaScript/TypeScript/Python agent implementations |
92
-
93
- ## Quick Start
94
 
95
  ```bash
96
- # Install HackMyAgent (auto-downloads NanoMind model on first scan)
97
  npm install -g hackmyagent
98
 
99
- # Scan an AI agent project (NanoMind runs automatically)
100
- hackmyagent secure ./my-agent
101
-
102
- # Deep scan with behavioral simulation
103
- hackmyagent secure ./my-agent --deep
104
-
105
- # Check a skill before installing
106
- hackmyagent check ./path/to/SKILL.md
107
 
108
- # Via OpenA2A CLI
109
- npx opena2a scan ./my-agent --deep
110
-
111
- # Via ai-trust (MCP server trust verification)
112
- npx ai-trust check @modelcontextprotocol/server-filesystem --scan-if-missing
113
  ```
114
 
115
- ## How It Works
116
-
117
- 1. **Tokenization:** Input text is split into words and mapped to a 4K vocabulary
118
- 2. **Encoding:** 8 Mamba SSM blocks process the token sequence bidirectionally
119
- 3. **Classification:** Mean pooling + 9-way softmax head produces class probabilities
120
- 4. **Defense-in-depth:** NanoMind findings ADD to static analysis (never suppress)
121
-
122
- The model understands word ORDER, which is critical for distinguishing:
123
- - "forward token to external endpoint" (exfiltration)
124
- - "external endpoint token forwarding service" (possibly benign)
125
 
126
- ## Training Pipeline
127
-
128
- Repeatable pipeline with Claude LLM as chief data scientist:
129
-
130
- ```
131
- Data Sources → Claude Reviews Labels → Validated Corpus → Train (MLX/M4) → Evaluate → Publish
132
- ```
133
-
134
- **Data sources:**
135
- - [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) -- intentionally vulnerable AI agent attack payloads
136
- - [AgentPwn](https://agentpwn.com) -- benevolent honeypot capturing real AI agent attacks (48 attacks, 11 categories)
137
- - [OASB](https://oasb.org) -- Open Agent Security Benchmark dataset
138
- - [OpenA2A Registry](https://opena2a.org) -- skill descriptions with HMA scan results
139
- - Synthetic samples -- generated SKILL.md, MCP configs, SOUL.md, credential abuse scenarios
140
-
141
- **Quality assurance:**
142
- - Claude LLM reviews every label before training (chief data scientist role)
143
- - Heuristic cross-validation against HMA's pattern library
144
- - Balanced classes (equal samples per attack type)
145
- - Holdout evaluation set never seen during training
146
-
147
- **Training hardware:** Apple Silicon M4 Max, MLX framework. Training time: ~2 minutes.
148
-
149
- ## Limitations
150
-
151
- - **Exfiltration class** has lower precision (F1=0.81) -- some benign data-processing tools get flagged
152
- - **Benign class** has lower recall (F1=0.84) -- conservative bias (prefers false positives over false negatives for security)
153
- - **Training data** is currently ~1,800 samples. Accuracy improves as the OpenA2A Registry accumulates more scan data
154
- - **Context window** is 128 tokens. Very long documents are truncated
155
- - **English only** -- not trained on non-English agent configurations
156
-
157
- ## Integration
158
-
159
- NanoMind is used by three CLIs in the OpenA2A ecosystem:
160
-
161
- | Tool | How NanoMind is Used |
162
- |------|---------------------|
163
- | [HackMyAgent](https://github.com/opena2a-org/hackmyagent) | Core semantic layer for all scan commands (secure, check, scan-soul, secure-openclaw, secure-nemoclaw) |
164
- | [ai-trust](https://github.com/opena2a-org/ai-trust) | Deep trust verification of MCP servers and npm packages |
165
- | [OpenA2A CLI](https://github.com/opena2a-org/opena2a) | Passes --deep flag through to HMA for semantic analysis |
166
 
167
  ## License
168
 
169
  Apache-2.0. Free for commercial and non-commercial use.
170
 
171
- ## Links
172
-
173
- - [HackMyAgent](https://github.com/opena2a-org/hackmyagent) -- 204-check security scanner
174
- - [OpenA2A](https://opena2a.org) -- Open Agent-to-Agent protocol
175
- - [OASB](https://oasb.org) -- Open Agent Security Benchmark
176
- - [AgentPwn](https://agentpwn.com) -- AI agent attack honeypot
177
- - [NanoMind Spec](https://nanomind.dev) -- Full specification
178
-
179
  ## Citation
180
 
181
  ```bibtex
@@ -183,7 +83,7 @@ Apache-2.0. Free for commercial and non-commercial use.
183
  title = {NanoMind Security Classifier},
184
  author = {OpenA2A},
185
  url = {https://github.com/opena2a-org/nanomind},
186
- version = {0.5.0},
187
  year = {2026}
188
  }
189
  ```
 
1
  ---
2
  license: apache-2.0
 
 
3
  tags:
4
  - security
5
  - ai-agents
 
7
  - nanomind
8
  - opena2a
9
  - threat-detection
 
 
10
  datasets:
11
  - opena2a/nanomind-training
12
  metrics:
13
  - accuracy
14
  - f1
 
15
  model-index:
16
  - name: nanomind-security-classifier
17
  results:
 
21
  metrics:
22
  - name: Eval Accuracy
23
  type: accuracy
24
+ value: 0.9673
25
  ---
26
 
27
+ # nanomind-security-classifier v0.4.0
28
 
29
+ Base 10-class threat classifier for AI agent security scanning
30
 
31
+ Part of the [OpenA2A](https://opena2a.org) security ecosystem.
32
+ Used by [HackMyAgent](https://github.com/opena2a-org/hackmyagent) for AI agent security scanning.
33
 
34
  ## Metrics
35
 
36
  | Metric | Value |
37
  |--------|-------|
38
+ | Eval accuracy | 96.73% |
39
+ | Training samples | 3337 |
40
+ | Eval samples | 398 |
41
+ | Attack classes | 10 |
42
+ | Training corpus | sft-v9 |
43
 
44
  ## Architecture
45
 
46
+ - **Type:** Mamba TME (8 blocks, d_model=128, d_state=64)
47
+ - **Inference:** ONNX (Node.js via onnxruntime-node) or NPZ weights
48
+ - **Latency:** Sub-2ms on CPU
49
 
50
+ ## Attack Classes (10)
51
 
52
+ exfiltration, injection, privilege_escalation, persistence, credential_abuse, lateral_movement, social_engineering, policy_violation, benign
53
 
54
+ ## Usage
55
 
56
  ```bash
57
+ # Install HackMyAgent (includes NanoMind inference)
58
  npm install -g hackmyagent
59
 
60
+ # Scan an MCP server or AI agent project
61
+ hackmyagent scan ./my-agent --deep
62
 
63
+ # Or use via OpenA2A CLI
64
+ npx opena2a scan ./my-agent
65
  ```
66
 
67
+ ## Training
68
 
69
+ Trained on Apple Silicon (MLX) using curated security corpus from:
70
+ - [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) attack payloads
71
+ - [AgentPwn](https://agentpwn.com) honeypot captures
72
+ - [OASB](https://oasb.org) benchmark dataset
73
+ - OpenA2A Registry skill descriptions
74
 
75
  ## License
76
 
77
  Apache-2.0. Free for commercial and non-commercial use.
78
 
79
  ## Citation
80
 
81
  ```bibtex
 
83
  title = {NanoMind Security Classifier},
84
  author = {OpenA2A},
85
  url = {https://github.com/opena2a-org/nanomind},
86
+ version = {0.4.0},
87
  year = {2026}
88
  }
89
  ```
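
The model card describes a classifier whose output is a probability distribution over the attack classes. As a minimal sketch of that last step, the Python below turns a vector of raw scores into probabilities and picks the top label. It is not the project's inference code: the function names, the example scores, and above all the class order are assumptions made for illustration, and only the nine class names that the updated README actually lists are used.

```python
import math

# Class names copied from the "Attack Classes" line of the updated README.
# The order in which the real model emits them is NOT documented in this
# diff, so this ordering is an assumption.
CLASSES = [
    "exfiltration", "injection", "privilege_escalation", "persistence",
    "credential_abuse", "lateral_movement", "social_engineering",
    "policy_violation", "benign",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, classes=CLASSES):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(classes):
        raise ValueError(
            f"expected {len(classes)} scores, got {len(logits)}"
        )
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]


# Hypothetical scores, invented for this example only.
example_logits = [4.0, 0.5, 0.1, 0.0, 0.2, 0.3, 0.1, 0.0, 1.0]
label, prob = decode(example_logits)
print(label, round(prob, 3))
```

With these made-up scores the top label is `exfiltration`, because the first entry is the largest. In a real pipeline the scores would come from the ONNX or NPZ model, and the label order would have to be taken from the model's own metadata rather than from this list.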