NanoMind v0.1 -- AI Agent Security Classifier

NanoMind is a security intent classifier for AI agent artifacts (skills, MCP tool descriptions, SOUL governance files, system prompts). It classifies each artifact as benign, suspicious, or malicious across nine classes: eight attack categories plus normal behavior.

This is the bootstrapping model (MLP) for the full NanoMind v3 Ternary Mamba Encoder architecture.

Model Details

| Property | Value |
|---|---|
| Architecture | MLP classifier (64d embed, 128d hidden, 9-class output) |
| Parameters | ~150K |
| Training framework | Apple MLX (Metal GPU) |
| Training hardware | Apple M4 Max (40 GPU cores, 64GB) |
| Training time | 0.7 seconds (300 epochs) |
| Tokenizer | Word-level BPE (8K vocab) |
| License | Apache 2.0 |

Intended Use

NanoMind is designed to classify AI agent artifacts for security scanning. It is part of HackMyAgent, an open-source security scanner for AI agents.

Primary use cases:

  • Classify skill files as benign/malicious before installation
  • Detect prompt injection patterns in system prompts
  • Identify credential exfiltration in MCP tool descriptions
  • Pre-screen SOUL.md governance files for weaknesses

Not intended for:

  • General text classification
  • Malware detection (binary analysis)
  • Natural language inference

Attack Classes (9-way classification)

The nine classes (eight attack categories plus one benign class):

  • Data forwarding to external endpoints
  • Prompt injection and instruction override
  • Unauthorized capability expansion
  • Cross-session memory poisoning
  • Credential harvesting or forwarding
  • Cross-agent manipulation
  • Urgency/authority-based manipulation
  • SOUL/governance constraint bypass
  • Normal, expected behavior
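For downstream tooling, the 9-way output needs to be mapped back to these class descriptions. A minimal sketch follows; note that the short class names and their index order are hypothetical, since the model card does not publish the label ordering, and the released model defines the authoritative mapping.

```python
# Hypothetical short names for the nine classes; index order is an ASSUMPTION.
ATTACK_CLASSES = [
    "data_exfiltration",      # data forwarding to external endpoints
    "prompt_injection",       # prompt injection and instruction override
    "capability_expansion",   # unauthorized capability expansion
    "memory_poisoning",       # cross-session memory poisoning
    "credential_harvesting",  # credential harvesting or forwarding
    "cross_agent",            # cross-agent manipulation
    "social_engineering",     # urgency/authority-based manipulation
    "governance_bypass",      # SOUL/governance constraint bypass
    "benign",                 # normal, expected behavior
]

def label_of(logits):
    """Map the argmax over the 9 logits to a class name."""
    return ATTACK_CLASSES[max(range(len(logits)), key=logits.__getitem__)]
```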

Training Data

| Source | Samples | Label |
|---|---|---|
| OpenA2A Registry skill descriptions | 995 | benign |
| HMA attack payloads (11 categories) | 12 | malicious |
| DVAA vulnerable agent scenarios | 3 | malicious |
| HMA simulation engine auto-export | 18 | malicious |
| Total | 1,028 | 80/20 train/eval split |

Training data is sourced from the OpenA2A Registry (real-world AI packages) and DVAA (intentionally vulnerable agents).
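With only 33 malicious samples against 995 benign ones, a plain random 80/20 split could leave a malicious class out of the eval set entirely. A stratified split avoids that; the sketch below is illustrative of the technique and not necessarily how the released split was produced.

```python
import numpy as np

def train_eval_split(labels, eval_frac=0.2, seed=0):
    """Stratified split: hold out eval_frac of EACH class, so the tiny
    malicious classes (as few as 3 samples) appear in both splits."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    eval_mask = np.zeros(len(labels), dtype=bool)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)
        n_eval = max(1, round(eval_frac * len(idx)))  # at least 1 per class
        eval_mask[idx[:n_eval]] = True
    return ~eval_mask, eval_mask
```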

Evaluation Results

| Metric | Value |
|---|---|
| Eval accuracy | 99.51% |
| Benign precision | 1.00 |
| Benign recall | 1.00 |
| Benign F1 | 1.00 |
| Injection precision | 1.00 |
| Injection recall | 0.83 |
| Injection F1 | 0.91 |

Key result: benign precision and recall of 1.00 mean zero false positives on the eval set: no benign artifact was flagged as malicious. This addresses the TU Vienna finding of only 0.12% scanner agreement across 238K skills (7 scanners, 20-49% inter-rater agreement).
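The per-class figures above can be reproduced from raw predictions with a few lines of NumPy. This is a sketch of the standard per-class precision/recall/F1 definitions, not tied to any NanoMind internals:

```python
import numpy as np

def per_class_prf(y_true, y_pred, cls):
    """Precision, recall, and F1 for one class, from raw label arrays."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))  # correctly flagged
    fp = np.sum((y_pred == cls) & (y_true != cls))  # wrongly flagged
    fn = np.sum((y_pred != cls) & (y_true == cls))  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```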

Usage

With HackMyAgent (recommended)

```bash
npx hackmyagent secure          # Auto-detects NanoMind
npx hackmyagent secure --deep   # Full behavioral simulation
```

Standalone (Python)

```python
import numpy as np
import json

# Load model weights and tokenizer
weights = np.load("nanomind-sft-classifier.npz")
with open("tokenizer.json") as f:
    vocab = json.load(f)

# Tokenize: lowercase, word-split, truncate to 256 tokens (id 1 is the
# out-of-vocabulary fallback)
text = "A helpful fitness tracking skill"
tokens = [vocab.get(w, 1) for w in text.lower().split()[:256]]
# ... run through MLP layers
```
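The elided forward pass can be sketched as a two-layer MLP over mean-pooled token embeddings, matching the 64d embed / 128d hidden / 9-class shape in the model details. The `.npz` key names (`"embed"`, `"W1"`, `"b1"`, `"W2"`, `"b2"`) and the mean pooling are assumptions about the checkpoint layout; inspect `weights.files` for the actual keys.

```python
import numpy as np

def classify(weights, tokens):
    """Sketch of the forward pass: mean-pooled 64d embeddings -> 128d ReLU
    hidden layer -> 9 logits. The key names here are ASSUMPTIONS; check
    weights.files for the real checkpoint layout."""
    x = weights["embed"][tokens].mean(axis=0)             # (64,)
    h = np.maximum(weights["W1"] @ x + weights["b1"], 0)  # (128,) ReLU
    logits = weights["W2"] @ h + weights["b2"]            # (9,)
    return int(np.argmax(logits))
```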

Limitations

  • Small training set (1,028 samples). This is a bootstrapping model. The full v3 TME will train on 50K+ samples.
  • MLP architecture. Does not capture sequential patterns. The Ternary Mamba Encoder (v3) will address this.
  • English only. Multi-language support planned for v3.
  • 9 classes may be insufficient. New attack classes will be added as ARIA research discovers them.

Roadmap to NanoMind v3

| Version | Architecture | Parameters | Disk | Latency | Status |
|---|---|---|---|---|---|
| v0.1 (this) | MLP | ~150K | 22B | < 1ms | Released |
| v1.0 | SmolLM2-135M Q4_K_M | 135M | 80MB | ~100ms | Shipped (CLI) |
| v3.0 (target) | Ternary Mamba Encoder | 18M | 3.5MB | < 6ms | Training |

v3 uses native ternary weights (BitNet methodology), Mamba-3 SSM backbone (no KV cache), and bidirectional discriminative encoding (not generative). See architecture brief.
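For intuition on ternary weights, BitNet-style absmean quantization scales a weight matrix by its mean absolute value and rounds every entry to {-1, 0, +1}. The sketch below illustrates the general technique only; it is not the v3 training code.

```python
import numpy as np

def ternarize(W, eps=1e-8):
    """Absmean ternary quantization (BitNet b1.58 style): scale by the mean
    absolute weight, round to the nearest integer, clip to {-1, 0, +1}.
    Returns the ternary matrix and the scale needed to dequantize."""
    scale = np.mean(np.abs(W)) + eps          # eps guards an all-zero matrix
    Wq = np.clip(np.round(W / scale), -1, 1)  # entries in {-1, 0, +1}
    return Wq.astype(np.int8), scale

# Dequantize with Wq * scale; storage drops to ~1.58 bits per weight.
```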

Citation

```bibtex
@misc{nanomind2026,
  title={NanoMind: Embedded Security Intelligence for AI Agent Systems},
  author={OpenA2A},
  year={2026},
  url={https://github.com/opena2a-org/nanomind}
}
```
