NanoMind v0.1 -- AI Agent Security Classifier
NanoMind is a security intent classifier for AI agent artifacts (skills, MCP tool descriptions, SOUL governance files, system prompts). It classifies whether an artifact is benign, suspicious, or malicious by assigning it to one of 9 classes: 8 attack categories plus a benign class.
This is the bootstrapping model (MLP) for the full NanoMind v3 Ternary Mamba Encoder architecture.
Model Details
| Property | Value |
|---|---|
| Architecture | MLP classifier (64d embed, 128d hidden, 9-class output) |
| Parameters | ~150K |
| Training framework | Apple MLX (Metal GPU) |
| Training hardware | Apple M4 Max (40 GPU cores, 64GB) |
| Training time | 0.7 seconds (300 epochs) |
| Tokenizer | Word-level BPE (8K vocab) |
| License | Apache 2.0 |
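The shapes in the table can be read as a small forward pass. The following is a minimal sketch with random stand-in weights, not the released checkpoint. The mean pooling over token embeddings, the ReLU activation, the exact vocabulary size of 8000, and all variable names are assumptions made for illustration; the table only states the dimensions.

```python
import numpy as np

# Dimensions from the table; VOCAB_SIZE = 8000 is an assumed reading of "8K vocab".
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, NUM_CLASSES = 8000, 64, 128, 9

rng = np.random.default_rng(0)
# Random stand-in weights (hypothetical names, not the checkpoint's layout).
embedding = rng.normal(0.0, 0.02, size=(VOCAB_SIZE, EMBED_DIM))
w1 = rng.normal(0.0, 0.02, size=(EMBED_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
w2 = rng.normal(0.0, 0.02, size=(HIDDEN_DIM, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)


def forward(token_ids):
    """Return a probability vector over the 9 classes for one token sequence."""
    pooled = embedding[token_ids].mean(axis=0)   # assumption: mean pooling
    hidden = np.maximum(pooled @ w1 + b1, 0.0)   # assumption: ReLU activation
    logits = hidden @ w2 + b2
    exp = np.exp(logits - logits.max())          # numerically stable softmax
    return exp / exp.sum()


probs = forward(np.array([12, 7, 431, 1]))
print(probs.shape, float(probs.sum()))
```

With untrained weights the output is close to uniform; the point is only to show how a 64d embedding, a 128d hidden layer, and a 9-class output fit together.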
Intended Use
NanoMind is designed to classify AI agent artifacts for security scanning. It is part of HackMyAgent, an open-source security scanner for AI agents.
Primary use cases:
- Classify skill files as benign/malicious before installation
- Detect prompt injection patterns in system prompts
- Identify credential exfiltration in MCP tool descriptions
- Pre-screen SOUL.md governance files for weaknesses
Not intended for:
- General text classification
- Malware detection (binary analysis)
- Natural language inference
Attack Classes (9-way classification)
| Class | Description |
|---|---|
| Data forwarding to external endpoints | |
| Prompt injection and instruction override | |
| Unauthorized capability expansion | |
| Cross-session memory poisoning | |
| Credential harvesting or forwarding | |
| Cross-agent manipulation | |
| Urgency/authority-based manipulation | |
| SOUL/governance constraint bypass | |
| Normal, expected behavior | |
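One possible way to reduce a 9-way probability vector to the benign/suspicious/malicious verdict mentioned in the introduction is sketched below. This is an illustration only: the index order (taken from the table rows), the two thresholds, and the `verdict` helper are hypothetical and are not documented behavior of NanoMind or HackMyAgent.

```python
import numpy as np

# Descriptions copied from the table; the index order is an assumption.
CLASS_DESCRIPTIONS = [
    "Data forwarding to external endpoints",
    "Prompt injection and instruction override",
    "Unauthorized capability expansion",
    "Cross-session memory poisoning",
    "Credential harvesting or forwarding",
    "Cross-agent manipulation",
    "Urgency/authority-based manipulation",
    "SOUL/governance constraint bypass",
    "Normal, expected behavior",
]
BENIGN_INDEX = 8  # assumption: the benign class is last, as in the table


def verdict(probs, malicious_threshold=0.8, suspicious_threshold=0.5):
    """Map 9-way probabilities to (verdict, description); thresholds are hypothetical."""
    probs = np.asarray(probs, dtype=float)
    attack_mass = 1.0 - probs[BENIGN_INDEX]  # total probability on attack classes
    if attack_mass >= malicious_threshold:
        label = "malicious"
    elif attack_mass >= suspicious_threshold:
        label = "suspicious"
    else:
        return "benign", CLASS_DESCRIPTIONS[BENIGN_INDEX]
    top_attack = int(probs[:BENIGN_INDEX].argmax())  # most likely attack class
    return label, CLASS_DESCRIPTIONS[top_attack]


example = np.full(9, 0.0125)
example[1] = 0.9
print(verdict(example))
```

Summing the probability mass over all attack classes, rather than looking only at the top class, keeps an artifact from being called benign when the model is unsure which attack it is but fairly sure it is one.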
Training Data
| Source | Samples | Label |
|---|---|---|
| OpenA2A Registry skill descriptions | 995 | benign |
| HMA attack payloads (11 categories) | 12 | malicious |
| DVAA vulnerable agent scenarios | 3 | malicious |
| HMA simulation engine auto-export | 18 | malicious |
| Total | 1,028 | 80/20 train/eval split |
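Training data is sourced from the OpenA2A Registry (real-world AI packages), HMA attack payloads and simulation exports, and DVAA (intentionally vulnerable agents). The set is heavily imbalanced: 995 benign samples versus 33 malicious ones.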
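To make the 80/20 split concrete, here is a minimal sketch that splits the sample counts from the table while keeping the benign/malicious ratio in both halves. Whether the original split was stratified or purely random is not stated, so the stratification, the seed, and the `stratified_split` helper are assumptions.

```python
import random

# Label counts taken from the table: 995 benign, 12 + 3 + 18 malicious.
labels = ["benign"] * 995 + ["malicious"] * (12 + 3 + 18)


def stratified_split(labels, eval_fraction=0.2, seed=0):
    """Return (train_indices, eval_indices), sampling each label separately."""
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    train, eval_ = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_eval = round(len(indices) * eval_fraction)
        eval_.extend(indices[:n_eval])
        train.extend(indices[n_eval:])
    return train, eval_


train_idx, eval_idx = stratified_split(labels)
n_eval_malicious = sum(labels[i] == "malicious" for i in eval_idx)
print(len(train_idx), len(eval_idx), n_eval_malicious)
```

With only 33 malicious samples in total, any 20% eval split contains just a handful of them, which is worth keeping in mind when reading per-class metrics.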
Evaluation Results
| Metric | Value |
|---|---|
| Eval accuracy | 99.51% |
| Benign precision | 1.00 |
| Benign recall | 1.00 |
| Benign F1 | 1.00 |
| Injection precision | 1.00 |
| Injection recall | 0.83 |
| Injection F1 | 0.91 |
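Key result: benign precision and recall of 1.00 mean that, on the eval split, no benign artifact was flagged as an attack (zero false positives). The eval split holds roughly 206 samples (20% of 1,028), so the figures for attack classes rest on a handful of examples. This result is aimed at the scanner-disagreement problem reported by TU Vienna: 0.12% scanner agreement across 238K skills (7 scanners, 20-49% interrater agreement).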
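The relationship between the precision, recall, and F1 rows can be checked with a few lines. The counts below (5 true positives, 0 false positives, 1 false negative) are hypothetical; they were chosen only because they reproduce the reported injection row, and the actual confusion matrix is not published here.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts consistent with the reported injection metrics.
p, r, f1 = precision_recall_f1(tp=5, fp=0, fn=1)
print(round(p, 2), round(r, 2), round(f1, 2))
```

Under these assumed counts, a single missed injection sample is enough to move recall from 1.00 to 0.83.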
Usage
With HackMyAgent (recommended)
```bash
npx hackmyagent secure         # Auto-detects NanoMind
npx hackmyagent secure --deep  # Full behavioral simulation
```
Standalone (Python)
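```python
import json

import numpy as np

# Load the model weights and the tokenizer vocabulary
weights = np.load("nanomind-sft-classifier.npz")
with open("tokenizer.json") as f:
    vocab = json.load(f)

# Classify: lowercase, split on whitespace, keep at most 256 words.
# Words missing from the vocabulary map to id 1 (presumably the unknown token).
text = "A helpful fitness tracking skill"
tokens = [vocab.get(w, 1) for w in text.lower().split()[:256]]
# ... run through MLP layers
```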
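The names and shapes of the arrays stored in the `.npz` file are not listed in this card, so a reasonable first step before wiring up the layers is to inspect them. The sketch below writes a stand-in archive with hypothetical array names (`embedding`, `fc1_weight`, `fc2_weight`) and lists its contents through the `files` attribute of the object returned by `np.load`; substitute the real checkpoint path to see the actual layout.

```python
import os
import tempfile

import numpy as np

# Stand-in checkpoint; the array names here are hypothetical, not NanoMind's.
path = os.path.join(tempfile.mkdtemp(), "demo-classifier.npz")
np.savez(
    path,
    embedding=np.zeros((8000, 64), dtype=np.float32),
    fc1_weight=np.zeros((64, 128), dtype=np.float32),
    fc2_weight=np.zeros((128, 9), dtype=np.float32),
)

# np.load returns a lazy archive; .files lists the stored array names.
with np.load(path) as weights:
    shapes = {name: weights[name].shape for name in weights.files}

for name, shape in shapes.items():
    print(name, shape)
```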
Limitations
- Small training set (1,028 samples, only 33 of them malicious). This is a bootstrapping model. The full v3 Ternary Mamba Encoder (TME) will train on 50K+ samples.
- MLP architecture. Does not capture sequential patterns. The Ternary Mamba Encoder (v3) will address this.
- English only. Multi-language support planned for v3.
- 9 classes may be insufficient. New attack classes will be added as ARIA research discovers them.
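The point about sequential patterns can be illustrated with a small sketch. It assumes the token embeddings are mean-pooled before the MLP, which this card does not state; under that assumption, two inputs with the same words in a different order produce the same pooled vector and therefore the same prediction. The toy vocabulary and random embedding table are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy vocabulary and random embeddings, for illustration only.
vocab = {"ignore": 2, "previous": 3, "instructions": 4, "never": 5}
embedding = rng.normal(size=(8, 64))


def pooled(text):
    """Mean-pool word embeddings (assumed pooling); unknown words map to id 1."""
    ids = [vocab.get(w, 1) for w in text.lower().split()]
    return embedding[ids].mean(axis=0)


a = pooled("never ignore previous instructions")
b = pooled("ignore previous instructions never")
order_invariant = bool(np.allclose(a, b))
print(order_invariant)
```

The two phrases carry opposite intent, yet an order-insensitive input representation cannot tell them apart, which is the gap a sequence model is meant to close.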
Roadmap to NanoMind v3
| Version | Architecture | Parameters | Disk | Latency | Status |
|---|---|---|---|---|---|
| v0.1 (this) | MLP | ~150K | 22B | < 1ms | Released |
| v1.0 | SmolLM2-135M Q4_K_M | 135M | 80MB | ~100ms | Shipped (CLI) |
| v3.0 (target) | Ternary Mamba Encoder | 18M | 3.5MB | < 6ms | Training |
v3 uses native ternary weights (BitNet methodology), Mamba-3 SSM backbone (no KV cache), and bidirectional discriminative encoding (not generative). See architecture brief.
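As a rough illustration of ternary weights, the sketch below applies the absmean quantizer described for BitNet b1.58: scale by the mean absolute weight, round, and clip to {-1, 0, +1}. Whether NanoMind v3 uses exactly this scheme is an assumption, and the `ternary_quantize` helper is hypothetical. The last lines compute the information-theoretic size of 18M ternary weights at log2(3) bits each, for comparison with the disk figure in the roadmap table.

```python
import math

import numpy as np


def ternary_quantize(w, eps=1e-8):
    """Absmean ternary quantization: returns int8 values in {-1, 0, 1} and a scale."""
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(128, 64))  # random stand-in weight matrix
q, scale = ternary_quantize(w)
w_hat = q * scale                          # dequantized approximation of w

# Ideal packed size of 18M ternary weights, ignoring scales and metadata.
ideal_mb = 18_000_000 * math.log2(3) / 8 / 1e6
print(sorted(np.unique(q).tolist()), round(ideal_mb, 2))
```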
Citation
@misc{nanomind2026,
title={NanoMind: Embedded Security Intelligence for AI Agent Systems},
author={OpenA2A},
year={2026},
url={https://github.com/opena2a-org/nanomind}
}
Links
- HackMyAgent -- Security scanner using NanoMind
- NanoMind -- NanoMind source code
- OASB Benchmark -- Scanner accuracy benchmark
- OpenA2A -- Trust infrastructure for AI agents