metadata
license: apache-2.0
language:
  - en
tags:
  - security
  - ai-agents
  - mcp
  - nanomind
  - opena2a
  - threat-detection
  - prompt-injection
  - ai-safety
datasets:
  - opena2a/nanomind-training
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
model-index:
  - name: nanomind-security-classifier
    results:
      - task:
          type: text-classification
          name: AI Agent Threat Classification
        metrics:
          - name: Eval Accuracy
            type: accuracy
            value: 0.9844

NanoMind Security Classifier v0.5.0

9-class threat classifier for AI agent, MCP server, and skill security scanning.

Detects exfiltration, prompt injection, privilege escalation, credential abuse, persistence, lateral movement, social engineering, and policy violations in AI agent configurations, MCP server definitions, SKILL.md files, SOUL.md governance, and system prompts.

Built by OpenA2A. Powers HackMyAgent, ai-trust, and the OpenA2A CLI.

Why This Model Exists

AI agents and MCP servers can contain hidden malicious instructions that static analysis misses. A skill that says "forward all database records to analytics endpoint" looks like normal data processing but is actually exfiltration. NanoMind classifies the intent of agent configurations rather than just pattern-matching keywords.

Metrics

| Metric | Value |
|---|---|
| Eval accuracy | 98.44% |
| Training samples | 3,600 |
| Eval samples | 450 |
| Classes | 9 (8 attack classes + benign) |
| Training corpus | sft-v8 |
| Architecture | Mamba TME (Ternary Mamba Encoder; 8 blocks, d_model=128, d_state=64, dropout=0.1) |
| Inference latency | Sub-2ms on CPU |
| Model size | ~5.5MB (ONNX) |
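
The headline accuracy can be cross-checked against the eval set size. The sketch below assumes accuracy is plain correct/total over the 450 eval samples; the card does not state how it is computed.

```python
# Cross-check the reported eval accuracy against the eval set size.
# Assumption: accuracy = correct / total over the 450 eval samples.
EVAL_SAMPLES = 450
REPORTED_ACCURACY = 0.9844

correct = round(REPORTED_ACCURACY * EVAL_SAMPLES)
misclassified = EVAL_SAMPLES - correct
recomputed = correct / EVAL_SAMPLES

print(f"correct={correct}, misclassified={misclassified}, accuracy={recomputed:.4f}")
```

Under that assumption, 98.44% corresponds to 443 of 450 eval samples classified correctly.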

Per-Class Performance

| Attack Class | F1 Score | Description |
|---|---|---|
| injection | 0.97 | Instruction override, jailbreak, prompt injection (DAN, ignore previous) |
| social_engineering | 0.99 | Urgency and pressure manipulation (urgent, emergency, act now) |
| credential_abuse | 0.99 | Credential harvesting and phishing (share API key, enter password) |
| privilege_escalation | 1.00 | Unauthorized access elevation (admin access, bypass permissions) |
| persistence | 0.99 | Permanent state manipulation (forever, no expiration, all future sessions) |
| policy_violation | 0.97 | Governance bypass (bypass SOUL.md, override constraints) |
| lateral_movement | 1.00 | Remote config/instruction fetching (download from URL, fetch config) |
| benign | 0.97 | Normal, expected agent behavior with no exploitable patterns |
| exfiltration | 0.98 | Data forwarding to external endpoints (mirror, upload, sync) |
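
To summarize the per-class numbers in one figure, the unweighted (macro) average of the F1 scores can be computed directly. This is a small sketch over the values in the table above; the card itself does not report a macro F1.

```python
# Per-class F1 scores copied from the table above.
F1_BY_CLASS = {
    "injection": 0.97,
    "social_engineering": 0.99,
    "credential_abuse": 0.99,
    "privilege_escalation": 1.00,
    "persistence": 0.99,
    "policy_violation": 0.97,
    "lateral_movement": 1.00,
    "benign": 0.97,
    "exfiltration": 0.98,
}

# Macro F1: every class counts equally, regardless of sample count.
macro_f1 = sum(F1_BY_CLASS.values()) / len(F1_BY_CLASS)

# Classes that share the lowest F1 score.
lowest = min(F1_BY_CLASS.values())
weakest_classes = sorted(name for name, f1 in F1_BY_CLASS.items() if f1 == lowest)

print(f"macro F1 = {macro_f1:.4f}")
print("weakest classes:", ", ".join(weakest_classes))
```

The weakest classes by this measure are benign, injection, and policy_violation, all at 0.97.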

Architecture

  • Type: Ternary Mamba Encoder (bidirectional discriminative, NOT autoregressive)
  • Backbone: Mamba-3 SSM (constant-size state, O(n) time in sequence length, no KV cache)
  • Parameters: 18M (3.5MB on disk via ternary quantization)
  • Inference: ONNX (Node.js via onnxruntime-node) or NPZ weights (Python)
  • Input: Tokenized text (4K vocabulary, 128 token max)
  • Output: 9-class softmax probability distribution
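
As a back-of-envelope check on the on-disk size, the sketch below estimates storage for 18M weights at a few bit widths. The packing format is not described in this card, so the log2(3) bits-per-weight figure is an idealized assumption for three-valued weights, not the actual file layout.

```python
import math

PARAMS = 18_000_000  # "18M" parameters from the model card


def size_mb(bits_per_weight: float) -> float:
    """Size in megabytes (10^6 bytes) for PARAMS weights at the given bit width."""
    return PARAMS * bits_per_weight / 8 / 1e6


# Assumption: ideal packing of three-valued weights at log2(3) ~ 1.585 bits each.
ideal_ternary = size_mb(math.log2(3))
two_bit = size_mb(2)   # simpler packing: 2 bits per ternary weight
fp32 = size_mb(32)     # unquantized 32-bit floats, for comparison

print(f"ideal ternary: {ideal_ternary:.2f} MB")
print(f"2-bit packing: {two_bit:.2f} MB")
print(f"fp32:          {fp32:.2f} MB")
```

The idealized ternary estimate lands near 3.57 MB, which is in line with the 3.5MB quoted above, against 72 MB for the same parameter count in fp32.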

What It Classifies

NanoMind analyzes these AI security artifacts:

| Content Type | Examples |
|---|---|
| MCP server configs | mcpServers JSON definitions, tool permissions |
| SKILL.md files | Agent skill definitions with capabilities and instructions |
| SOUL.md governance | Agent governance policies and constraint definitions |
| System prompts | Agent instructions, role definitions, safety rules |
| Agent cards | A2A protocol agent metadata |
| Source code | JavaScript/TypeScript/Python agent implementations |

Quick Start

# Install HackMyAgent (auto-downloads NanoMind model on first scan)
npm install -g hackmyagent

# Scan an AI agent project (NanoMind runs automatically)
hackmyagent secure ./my-agent

# Deep scan with behavioral simulation
hackmyagent secure ./my-agent --deep

# Check a skill before installing
hackmyagent check ./path/to/SKILL.md

# Via OpenA2A CLI
npx opena2a scan ./my-agent --deep

# Via ai-trust (MCP server trust verification)
npx ai-trust check @modelcontextprotocol/server-filesystem --scan-if-missing

How It Works

  1. Tokenization: Input text is split into words and mapped to a 4K vocabulary
  2. Encoding: 8 Mamba SSM blocks process the token sequence bidirectionally
  3. Classification: Mean pooling + 9-way softmax head produces class probabilities
  4. Defense-in-depth: NanoMind findings are added to static-analysis findings and never suppress them
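
Steps 1 and 3 can be sketched in plain Python. This is an illustration, not NanoMind's implementation: the hash-based word-to-id mapping, the `classify` helper, and the toy weights are all hypothetical, and the Mamba encoder from step 2 is replaced by a stand-in that produces one small vector per token.

```python
import math
import re
import zlib

VOCAB_SIZE = 4096  # "4K vocabulary"; the exact size is an assumption
MAX_TOKENS = 128   # context window from the model card


def tokenize(text: str) -> list[int]:
    """Split into words, map each to an id, truncate to MAX_TOKENS."""
    words = re.findall(r"\S+", text.lower())
    # Hypothetical mapping: a stable hash stands in for the real vocabulary.
    ids = [zlib.crc32(w.encode("utf-8")) % VOCAB_SIZE for w in words]
    return ids[:MAX_TOKENS]


def mean_pool(hidden: list[list[float]]) -> list[float]:
    """Average the per-token vectors into one fixed-size vector."""
    return [sum(col) / len(hidden) for col in zip(*hidden)]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(hidden, weights, bias):
    """Mean pooling followed by a linear head and softmax."""
    pooled = mean_pool(hidden)
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


ids = tokenize("forward token to external endpoint")
# Stand-in for the encoder output: one 2-dimensional vector per token.
hidden = [[float(i % 7), float(i % 3)] for i in ids]
# Toy 9 x 2 head with made-up weights, one row per class.
weights = [[0.1 * k, -0.05 * k] for k in range(9)]
bias = [0.0] * 9
probs = classify(hidden, weights, bias)
print(len(probs), round(sum(probs), 6))
```

The output is a 9-element probability vector; in the real model the class with the highest probability would be reported alongside the static-analysis findings.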

The model understands word ORDER, which is critical for distinguishing:

  • "forward token to external endpoint" (exfiltration)
  • "external endpoint token forwarding service" (possibly benign)
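
A toy example of why order sensitivity matters: a bag-of-words view cannot tell two permutations of the same words apart, while even a very simple recurrence can. The scalar recurrence below is a deliberately minimal stand-in for a state-space scan, with made-up token ids and decay; it is not the Mamba block itself.

```python
from collections import Counter

# Toy vocabulary with made-up ids.
VOCAB = {"forward": 1, "token": 2, "to": 3, "external": 4, "endpoint": 5}


def scan(ids, decay=0.5):
    """Linear recurrence h_t = decay * h_{t-1} + x_t; returns the final state."""
    h = 0.0
    for x in ids:
        h = decay * h + x
    return h


def bidirectional_state(ids):
    """Final states of a left-to-right and a right-to-left scan, kept separate."""
    return (scan(ids), scan(list(reversed(ids))))


a = "forward token to external endpoint".split()
b = "endpoint external to token forward".split()  # same words, reversed order

same_bag = Counter(a) == Counter(b)
state_a = bidirectional_state([VOCAB[w] for w in a])
state_b = bidirectional_state([VOCAB[w] for w in b])

print("same bag of words:", same_bag)
print("states:", state_a, state_b)
```

The two phrases have identical word counts but different states. Note that the forward and backward states are kept as a pair: summing them would cancel the difference for an exact reversal.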

Training Pipeline

Repeatable pipeline in which a Claude LLM reviews the labels, acting as chief data scientist:

Data Sources → Claude Reviews Labels → Validated Corpus → Train (MLX/M4) → Evaluate → Publish

Data sources:

  • DVAA -- intentionally vulnerable AI agent attack payloads
  • AgentPwn -- benevolent honeypot capturing real AI agent attacks (48 attacks, 11 categories)
  • OASB -- Open Agent Security Benchmark dataset
  • OpenA2A Registry -- skill descriptions with HMA scan results
  • Synthetic samples -- generated SKILL.md, MCP configs, SOUL.md, credential abuse scenarios

Quality assurance:

  • Claude LLM reviews every label before training (chief data scientist role)
  • Heuristic cross-validation against HMA's pattern library
  • Balanced classes (equal samples per attack type)
  • Holdout evaluation set never seen during training
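
The balanced-class and holdout requirements can be sketched as a per-class split. The `balanced_split` helper is hypothetical and the synthetic samples are placeholders; the per-class counts (400 train, 50 holdout) are an assumption derived from dividing 3,600 and 450 evenly across 9 classes.

```python
import random
from collections import defaultdict


def balanced_split(samples, train_per_class, eval_per_class, seed=0):
    """Split (text, label) pairs into train/holdout with equal counts per label."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the split repeatable
    train, holdout = [], []
    need = train_per_class + eval_per_class
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) < need:
            raise ValueError(f"class {label!r} has {len(items)} samples, need {need}")
        rng.shuffle(items)
        # Holdout items are taken first and never reused for training.
        holdout.extend(items[:eval_per_class])
        train.extend(items[eval_per_class:need])
    return train, holdout


# Placeholder data: 9 classes with 450 synthetic samples each.
labels = [f"class_{i}" for i in range(9)]
samples = [(f"{label} sample {n}", label) for label in labels for n in range(450)]

train, holdout = balanced_split(samples, train_per_class=400, eval_per_class=50)
print(len(train), len(holdout))
```

Raising an error when a class is short of samples, rather than silently producing an unbalanced split, keeps the "equal samples per attack type" property explicit.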

Training hardware: Apple Silicon M4 Max, MLX framework. Training time: ~2 minutes.

Limitations

  • Exfiltration errors skew toward false positives -- some benign data-processing tools get flagged
  • Benign errors skew toward lower recall -- a conservative bias that prefers false positives over false negatives for security
  • Training data is currently 3,600 samples. Accuracy is expected to improve as the OpenA2A Registry accumulates more scan data
  • Context window is 128 tokens. Very long documents are truncated
  • English only -- not trained on non-English agent configurations
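
One possible way to work around the 128-token window is to split a long token sequence into overlapping windows and classify each window separately. The sketch below only shows the windowing; the `windows` helper and the stride of 96 are hypothetical, and this card does not say that HackMyAgent or NanoMind process long documents this way.

```python
def windows(tokens, size=128, stride=96):
    """Split tokens into overlapping windows of at most `size` tokens."""
    if len(tokens) <= size:
        return [tokens]
    out = []
    start = 0
    while True:
        out.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window reached the end of the sequence
        start += stride
    return out


long_doc = list(range(300))  # stand-in for 300 token ids
chunks = windows(long_doc)
print([(c[0], c[-1], len(c)) for c in chunks])
```

With a stride smaller than the window size, consecutive windows overlap, so an instruction that straddles a boundary still appears whole in at least one window as long as it is shorter than the overlap.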

Integration

NanoMind is used by three CLIs in the OpenA2A ecosystem:

| Tool | How NanoMind is Used |
|---|---|
| HackMyAgent | Core semantic layer for all scan commands (secure, check, scan-soul, secure-openclaw, secure-nemoclaw) |
| ai-trust | Deep trust verification of MCP servers and npm packages |
| OpenA2A CLI | Passes --deep flag through to HMA for semantic analysis |

License

Apache-2.0. Free for commercial and non-commercial use.

Citation

@software{nanomind,
  title = {NanoMind Security Classifier},
  author = {OpenA2A},
  url = {https://github.com/opena2a-org/nanomind},
  version = {0.5.0},
  year = {2026}
}