NanoMind v0.1 -- AI Agent Security Classifier

NanoMind is a security intent classifier for AI agent artifacts (skills, MCP tool descriptions, SOUL governance files, system prompts). It classifies each artifact as benign, suspicious, or malicious across nine classes: eight attack categories plus normal behavior.

This is the bootstrapping model (MLP) for the full NanoMind v3 Ternary Mamba Encoder architecture.

Model Details

| Property | Value |
|---|---|
| Architecture | MLP classifier (64d embed, 128d hidden, 9-class output) |
| Parameters | ~150K |
| Training framework | Apple MLX (Metal GPU) |
| Training hardware | Apple M4 Max (40 GPU cores, 64GB) |
| Training time | 0.7 seconds (300 epochs) |
| Tokenizer | Word-level BPE (8K vocab) |
| License | Apache 2.0 |

Intended Use

NanoMind is designed to classify AI agent artifacts for security scanning. It is part of HackMyAgent, an open-source security scanner for AI agents.

Primary use cases:

  • Classify skill files as benign/malicious before installation
  • Detect prompt injection patterns in system prompts
  • Identify credential exfiltration in MCP tool descriptions
  • Pre-screen SOUL.md governance files for weaknesses

Not intended for:

  • General text classification
  • Malware detection (binary analysis)
  • Natural language inference

Attack Classes (9-way classification)

The nine classes (eight attack categories plus one benign class):

  • Data forwarding to external endpoints
  • Prompt injection and instruction override
  • Unauthorized capability expansion
  • Cross-session memory poisoning
  • Credential harvesting or forwarding
  • Cross-agent manipulation
  • Urgency/authority-based manipulation
  • SOUL/governance constraint bypass
  • Normal, expected behavior
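For downstream tooling, the 9-way output needs to be mapped back to these class descriptions. A minimal sketch follows; note that the short class names and their index order are hypothetical, since the model card does not publish the label ordering, and the released model defines the authoritative mapping.

```python
# Hypothetical short names for the nine classes; index order is an ASSUMPTION.
ATTACK_CLASSES = [
    "data_exfiltration",      # data forwarding to external endpoints
    "prompt_injection",       # prompt injection and instruction override
    "capability_expansion",   # unauthorized capability expansion
    "memory_poisoning",       # cross-session memory poisoning
    "credential_harvesting",  # credential harvesting or forwarding
    "cross_agent",            # cross-agent manipulation
    "social_engineering",     # urgency/authority-based manipulation
    "governance_bypass",      # SOUL/governance constraint bypass
    "benign",                 # normal, expected behavior
]

def label_of(logits):
    """Map the argmax over the 9 logits to a class name."""
    return ATTACK_CLASSES[max(range(len(logits)), key=logits.__getitem__)]
```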

Training Data

| Source | Samples | Label |
|---|---|---|
| OpenA2A Registry skill descriptions | 995 | benign |
| HMA attack payloads (11 categories) | 12 | malicious |
| DVAA vulnerable agent scenarios | 3 | malicious |
| HMA simulation engine auto-export | 18 | malicious |
| Total | 1,028 | 80/20 train/eval split |

Training data is sourced from the OpenA2A Registry (real-world AI packages) and DVAA (intentionally vulnerable agents).
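With only 33 malicious samples against 995 benign ones, a plain random 80/20 split could leave a malicious class out of the eval set entirely. A stratified split avoids that; the sketch below is illustrative of the technique and not necessarily how the released split was produced.

```python
import numpy as np

def train_eval_split(labels, eval_frac=0.2, seed=0):
    """Stratified split: hold out eval_frac of EACH class, so the tiny
    malicious classes (as few as 3 samples) appear in both splits."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    eval_mask = np.zeros(len(labels), dtype=bool)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)
        n_eval = max(1, round(eval_frac * len(idx)))  # at least 1 per class
        eval_mask[idx[:n_eval]] = True
    return ~eval_mask, eval_mask
```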

Evaluation Results

| Metric | Value |
|---|---|
| Eval accuracy | 99.51% |
| Benign precision | 1.00 |
| Benign recall | 1.00 |
| Benign F1 | 1.00 |
| Injection precision | 1.00 |
| Injection recall | 0.83 |
| Injection F1 | 0.91 |

Key result: benign precision and recall of 1.00 mean zero false positives on the eval set: no benign artifact was flagged as malicious. This addresses the TU Vienna finding of only 0.12% scanner agreement across 238K skills (7 scanners, 20-49% inter-rater agreement).
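The per-class figures above can be reproduced from raw predictions with a few lines of NumPy. This is a sketch of the standard per-class precision/recall/F1 definitions, not tied to any NanoMind internals:

```python
import numpy as np

def per_class_prf(y_true, y_pred, cls):
    """Precision, recall, and F1 for one class, from raw label arrays."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))  # correctly flagged
    fp = np.sum((y_pred == cls) & (y_true != cls))  # wrongly flagged
    fn = np.sum((y_pred != cls) & (y_true == cls))  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```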

Usage

With HackMyAgent (recommended)

```bash
npx hackmyagent secure          # Auto-detects NanoMind
npx hackmyagent secure --deep   # Full behavioral simulation
```

Standalone (Python)

```python
import numpy as np
import json

# Load model weights and tokenizer
weights = np.load("nanomind-sft-classifier.npz")
with open("tokenizer.json") as f:
    vocab = json.load(f)

# Tokenize: lowercase, word-split, truncate to 256 tokens (id 1 is the
# out-of-vocabulary fallback)
text = "A helpful fitness tracking skill"
tokens = [vocab.get(w, 1) for w in text.lower().split()[:256]]
# ... run through MLP layers
```
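The elided forward pass can be sketched as a two-layer MLP over mean-pooled token embeddings, matching the 64d embed / 128d hidden / 9-class shape in the model details. The `.npz` key names (`"embed"`, `"W1"`, `"b1"`, `"W2"`, `"b2"`) and the mean pooling are assumptions about the checkpoint layout; inspect `weights.files` for the actual keys.

```python
import numpy as np

def classify(weights, tokens):
    """Sketch of the forward pass: mean-pooled 64d embeddings -> 128d ReLU
    hidden layer -> 9 logits. The key names here are ASSUMPTIONS; check
    weights.files for the real checkpoint layout."""
    x = weights["embed"][tokens].mean(axis=0)             # (64,)
    h = np.maximum(weights["W1"] @ x + weights["b1"], 0)  # (128,) ReLU
    logits = weights["W2"] @ h + weights["b2"]            # (9,)
    return int(np.argmax(logits))
```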

Limitations

  • Small training set (1,028 samples). This is a bootstrapping model. The full v3 TME will train on 50K+ samples.
  • MLP architecture. Does not capture sequential patterns. The Ternary Mamba Encoder (v3) will address this.
  • English only. Multi-language support planned for v3.
  • 9 classes may be insufficient. New attack classes will be added as ARIA research discovers them.

Roadmap to NanoMind v3

| Version | Architecture | Parameters | Disk | Latency | Status |
|---|---|---|---|---|---|
| v0.1 (this) | MLP | ~150K | 22B | < 1ms | Released |
| v1.0 | SmolLM2-135M Q4_K_M | 135M | 80MB | ~100ms | Shipped (CLI) |
| v3.0 (target) | Ternary Mamba Encoder | 18M | 3.5MB | < 6ms | Training |

v3 uses native ternary weights (BitNet methodology), Mamba-3 SSM backbone (no KV cache), and bidirectional discriminative encoding (not generative). See architecture brief.
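For intuition on ternary weights, BitNet-style absmean quantization scales a weight matrix by its mean absolute value and rounds every entry to {-1, 0, +1}. The sketch below illustrates the general technique only; it is not the v3 training code.

```python
import numpy as np

def ternarize(W, eps=1e-8):
    """Absmean ternary quantization (BitNet b1.58 style): scale by the mean
    absolute weight, round to the nearest integer, clip to {-1, 0, +1}.
    Returns the ternary matrix and the scale needed to dequantize."""
    scale = np.mean(np.abs(W)) + eps          # eps guards an all-zero matrix
    Wq = np.clip(np.round(W / scale), -1, 1)  # entries in {-1, 0, +1}
    return Wq.astype(np.int8), scale

# Dequantize with Wq * scale; storage drops to ~1.58 bits per weight.
```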

Citation

```bibtex
@misc{nanomind2026,
  title={NanoMind: Embedded Security Intelligence for AI Agent Systems},
  author={OpenA2A},
  year={2026},
  url={https://github.com/opena2a-org/nanomind}
}
```
