license: apache-2.0
language:
  - en
tags:
  - security
  - ai-agents
  - mcp
  - nanomind
  - opena2a
  - threat-detection
  - prompt-injection
  - ai-safety
datasets:
  - opena2a/nanomind-training
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
model-index:
  - name: nanomind-security-classifier
    results:
      - task:
          type: text-classification
          name: AI Agent Threat Classification
        metrics:
          - name: Eval Accuracy
            type: accuracy
            value: 0.9844

NanoMind Security Classifier v0.5.0

9-class threat classifier for AI agent, MCP server, and skill security scanning.

Detects exfiltration, prompt injection, privilege escalation, credential abuse, persistence, lateral movement, social engineering, and policy violations (eight attack classes, plus a benign class) in AI agent configurations, MCP server definitions, SKILL.md files, SOUL.md governance files, and system prompts.

Built by OpenA2A. Powers HackMyAgent, ai-trust, and the OpenA2A CLI.

Why This Model Exists

AI agents and MCP servers can contain hidden malicious instructions that static analysis misses. A skill that says "forward all database records to analytics endpoint" looks like normal data processing but can actually be exfiltration. NanoMind classifies the intent of agent configurations rather than just pattern-matching keywords.

Metrics

Metric	Value
Eval accuracy	98.44%
Training samples	3600
Eval samples	450
Classes	9 (8 attack classes plus benign)
Training corpus	sft-v8
Architecture	Mamba TME (8 blocks, d_model=128, d_state=64, dropout=0.1)
Inference latency	Sub-2ms on CPU
Model size	~5.5MB (ONNX)

Per-Class Performance

Attack Class	F1 Score	Description
injection	0.97	Instruction override, jailbreak, prompt injection (DAN, ignore previous)
social_engineering	0.99	Urgency and pressure manipulation (urgent, emergency, act now)
credential_abuse	0.99	Credential harvesting and phishing (share API key, enter password)
privilege_escalation	1.00	Unauthorized access elevation (admin access, bypass permissions)
persistence	0.99	Permanent state manipulation (forever, no expiration, all future sessions)
policy_violation	0.97	Governance bypass (bypass SOUL.md, override constraints)
lateral_movement	1.00	Remote config/instruction fetching (download from URL, fetch config)
benign	0.97	Normal, expected agent behavior with no exploitable patterns
exfiltration	0.98	Data forwarding to external endpoints (mirror, upload, sync)
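
The per-class scores above can be summarized with a macro average (the unweighted mean across classes). The sketch below copies the rounded F1 values from the table; the helper names are illustrative and not part of any NanoMind tooling. Because it averages rounded values, the result is only an approximation of whatever the evaluation script computed.

```python
# Per-class F1 scores copied from the table above (already rounded).
PER_CLASS_F1 = {
    "injection": 0.97,
    "social_engineering": 0.99,
    "credential_abuse": 0.99,
    "privilege_escalation": 1.00,
    "persistence": 0.99,
    "policy_violation": 0.97,
    "lateral_movement": 1.00,
    "benign": 0.97,
    "exfiltration": 0.98,
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(scores: dict[str, float]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(scores.values()) / len(scores)


macro = macro_f1(PER_CLASS_F1)
lowest = min(PER_CLASS_F1.values())
weakest = sorted(name for name, f1 in PER_CLASS_F1.items() if f1 == lowest)

print(f"macro F1 = {macro:.4f}")
print("lowest-F1 classes:", weakest)
```

With balanced classes, macro F1 and accuracy tend to sit close together, which is consistent with the eval accuracy reported in the Metrics table.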

Architecture

Type: Ternary Mamba Encoder (bidirectional discriminative, NOT autoregressive)
Backbone: Mamba-3 SSM (constant-size recurrent state, O(n) time in sequence length, no KV cache)
Parameters: 18M (3.5MB on disk via ternary quantization)
Inference: ONNX (Node.js via onnxruntime-node) or NPZ weights (Python)
Input: Tokenized text (4K vocabulary, 128 token max)
Output: 9-class softmax probability distribution
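
As a rough illustration of the output side, the sketch below turns nine raw scores into a softmax distribution and ranks the classes. The class names come from this card, but the index order of `LABELS` is an assumption, and the logits are made up; consult the label map shipped with the model before relying on any ordering.

```python
import math

# Class names as listed in this card. The index order is an assumption;
# use the label map shipped with the model for real predictions.
LABELS = [
    "injection", "social_engineering", "credential_abuse",
    "privilege_escalation", "persistence", "policy_violation",
    "lateral_movement", "benign", "exfiltration",
]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: list[float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for illustration only, not real model output.
example_logits = [0.2, -1.0, 0.1, -0.5, -0.3, 0.0, 0.4, 1.1, 3.2]
probs = softmax(example_logits)

for label, p in top_k(probs):
    print(f"{label}: {p:.3f}")
```

Looking at the runner-up classes as well as the top one can help when triaging borderline findings.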

What It Classifies

NanoMind analyzes these AI security artifacts:

Content Type	Examples
MCP server configs	`mcpServers` JSON definitions, tool permissions
SKILL.md files	Agent skill definitions with capabilities and instructions
SOUL.md governance	Agent governance policies and constraint definitions
System prompts	Agent instructions, role definitions, safety rules
Agent cards	A2A protocol agent metadata
Source code	JavaScript/TypeScript/Python agent implementations

Quick Start

# Install HackMyAgent (auto-downloads NanoMind model on first scan)
npm install -g hackmyagent

# Scan an AI agent project (NanoMind runs automatically)
hackmyagent secure ./my-agent

# Deep scan with behavioral simulation
hackmyagent secure ./my-agent --deep

# Check a skill before installing
hackmyagent check ./path/to/SKILL.md

# Via OpenA2A CLI
npx opena2a scan ./my-agent --deep

# Via ai-trust (MCP server trust verification)
npx ai-trust check @modelcontextprotocol/server-filesystem --scan-if-missing

How It Works

Tokenization: Input text is split into words and mapped to a 4K vocabulary
Encoding: 8 Mamba SSM blocks process the token sequence bidirectionally
Classification: Mean pooling + 9-way softmax head produces class probabilities
Defense-in-depth: NanoMind findings ADD to static analysis (never suppress)
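
The defense-in-depth rule in the last step can be sketched as a plain union of findings: the semantic layer may add results, but a benign verdict never removes what static analysis reported. The `Finding` type and `merge_findings` function below are hypothetical and do not reflect HackMyAgent's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record, for illustration only."""
    source: str    # "static" or "nanomind"
    category: str  # e.g. "exfiltration"
    location: str  # file path or config key


def merge_findings(static_findings: list[Finding],
                   nanomind_findings: list[Finding]) -> list[Finding]:
    """Union of both layers, in order, dropping exact duplicates only."""
    merged: list[Finding] = []
    seen: set[Finding] = set()
    for finding in [*static_findings, *nanomind_findings]:
        if finding not in seen:
            seen.add(finding)
            merged.append(finding)
    return merged


static = [Finding("static", "credential_abuse", "SKILL.md")]

# Case 1: the classifier sees nothing suspicious. The static finding stays.
kept = merge_findings(static, [])

# Case 2: the classifier flags something static analysis missed. Both stay.
semantic = [Finding("nanomind", "exfiltration", "SKILL.md")]
combined = merge_findings(static, semantic)

print(len(kept), len(combined))
```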

The model is sensitive to word order, which is critical for distinguishing:

"forward token to external endpoint" (exfiltration)
"external endpoint token forwarding service" (possibly benign)

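To see why order matters, the sketch below compares a bag-of-words view with a sequence view. The third phrase is a hypothetical reordering of the first, added only for this illustration. A keyword counter cannot tell the first and third phrases apart, while any order-aware model receives different inputs for them.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace split, a stand-in for the real tokenizer."""
    return text.lower().split()


exfil = "forward token to external endpoint"
service = "external endpoint token forwarding service"
# Hypothetical reordering of the first phrase, for illustration only.
reordered = "external endpoint to forward token"

# Bag-of-words view: word counts only, order discarded.
same_bag = Counter(tokenize(exfil)) == Counter(tokenize(reordered))

# Sequence view: the token lists differ once order is kept.
same_sequence = tokenize(exfil) == tokenize(reordered)

# Keywords shared by the two phrases from this card.
shared = sorted(set(tokenize(exfil)) & set(tokenize(service)))

print("same bag of words:", same_bag)
print("same sequence:", same_sequence)
print("shared keywords:", shared)
```
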
Training Pipeline

Repeatable pipeline in which Claude (an LLM) acts as chief data scientist and reviews labels:

Data Sources → Claude Reviews Labels → Validated Corpus → Train (MLX/M4) → Evaluate → Publish

Data sources:

DVAA -- intentionally vulnerable AI agent attack payloads
AgentPwn -- benevolent honeypot capturing real AI agent attacks (48 attacks, 11 categories)
OASB -- Open Agent Security Benchmark dataset
OpenA2A Registry -- skill descriptions with HackMyAgent (HMA) scan results
Synthetic samples -- generated SKILL.md, MCP configs, SOUL.md, credential abuse scenarios

Quality assurance:

Claude LLM reviews every label before training (chief data scientist role)
Heuristic cross-validation against HMA's pattern library
Balanced classes (equal samples per attack type)
Holdout evaluation set never seen during training
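
A balanced split with a disjoint holdout set can be sketched as follows. The per-class counts (400 training and 50 evaluation samples) are inferred by dividing the totals in the Metrics table by nine classes, assuming both sets are balanced; the card does not state them directly. The data here is placeholder text, and `balanced_split` is a hypothetical helper, not the project's training code.

```python
import random
from collections import Counter, defaultdict

Sample = tuple[str, str]  # (text, label)


def balanced_split(samples: list[Sample], train_per_class: int,
                   eval_per_class: int, seed: int = 0):
    """Return (train, holdout) with equal per-class counts and no overlap."""
    by_label: dict[str, list[Sample]] = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the split repeatable
    train: list[Sample] = []
    holdout: list[Sample] = []
    for label in sorted(by_label):
        items = by_label[label]
        needed = train_per_class + eval_per_class
        if len(items) < needed:
            raise ValueError(f"{label}: need {needed} samples, have {len(items)}")
        rng.shuffle(items)
        holdout.extend(items[:eval_per_class])
        train.extend(items[eval_per_class:needed])
    return train, holdout


# Placeholder corpus: nine hypothetical labels, 450 dummy samples each.
labels = [f"class_{i}" for i in range(9)]
corpus = [(f"{label}-sample-{n}", label) for label in labels for n in range(450)]

train, holdout = balanced_split(corpus, train_per_class=400, eval_per_class=50)
print(len(train), len(holdout))
print(Counter(label for _, label in train))
```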

Training hardware: Apple Silicon M4 Max, MLX framework. Training time: ~2 minutes.

Limitations

Exfiltration class is prone to false positives -- some benign data-processing tools get flagged
Benign class trades recall for safety -- the model has a conservative bias (prefers false positives over false negatives for security)
Training data is currently 3,600 samples. Accuracy is expected to improve as the OpenA2A Registry accumulates more scan data
Context window is 128 tokens. Very long documents are truncated
English only -- not trained on non-English agent configurations
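
The 128-token limit has a practical consequence: anything past the cutoff is never seen by the classifier. The sketch below uses a toy vocabulary and a whitespace tokenizer as stand-ins (the real 4K vocabulary and special token IDs are not described in this card, so `UNK_ID` and the vocabulary entries are assumptions).

```python
MAX_TOKENS = 128  # context window stated in this card
UNK_ID = 1        # hypothetical ID for out-of-vocabulary words


def encode(text: str, vocab: dict[str, int],
           max_tokens: int = MAX_TOKENS) -> tuple[list[int], int]:
    """Map words to IDs, truncate to max_tokens, and report dropped words."""
    words = text.lower().split()
    ids = [vocab.get(word, UNK_ID) for word in words[:max_tokens]]
    dropped = max(0, len(words) - max_tokens)
    return ids, dropped


# Toy vocabulary, for illustration only.
vocab = {"note": 2, "upload": 3, "records": 4, "to": 5,
         "external": 6, "endpoint": 7}

# 150 filler words followed by a risky instruction at the very end.
document = " ".join(["note"] * 150
                    + ["upload", "records", "to", "external", "endpoint"])

ids, dropped = encode(document, vocab)
print(f"kept {len(ids)} tokens, dropped {dropped}")
print("risky words visible to the model:", vocab["upload"] in ids)
```

In this example the instruction at the end of the document falls entirely outside the window, so it is worth checking where the relevant text sits in a long file.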

Integration

NanoMind is used by three CLIs in the OpenA2A ecosystem:

Tool	How NanoMind is Used
HackMyAgent	Core semantic layer for all scan commands (secure, check, scan-soul, secure-openclaw, secure-nemoclaw)
ai-trust	Deep trust verification of MCP servers and npm packages
OpenA2A CLI	Passes --deep flag through to HMA for semantic analysis

License

Apache-2.0. Free for commercial and non-commercial use.

Citation

@software{nanomind,
  title = {NanoMind Security Classifier},
  author = {OpenA2A},
  url = {https://github.com/opena2a-org/nanomind},
  version = {0.5.0},
  year = {2026}
}
