Upload README.md with huggingface_hub

addca6a verified 4 days ago

7.55 kB

	---
	license: apache-2.0
	language:
	- en
	tags:
	- security
	- ai-agents
	- mcp
	- nanomind
	- opena2a
	- threat-detection
	- prompt-injection
	- ai-safety
	datasets:
	- opena2a/nanomind-training
	metrics:
	- accuracy
	- f1
	pipeline_tag: text-classification
	model-index:
	- name: nanomind-security-classifier
	results:
	- task:
	type: text-classification
	name: AI Agent Threat Classification
	metrics:
	- name: Eval Accuracy
	type: accuracy
	value: 0.9844
	---

	# NanoMind Security Classifier v0.5.0

	9-class threat classifier for AI agent, MCP server, and skill security scanning.

	Detects exfiltration, prompt injection, privilege escalation, credential abuse, persistence, lateral movement, social engineering, and policy violations in AI agent configurations, MCP server definitions, SKILL.md files, SOUL.md governance, and system prompts.

	Built by [OpenA2A](https://opena2a.org). Powers [HackMyAgent](https://github.com/opena2a-org/hackmyagent), [ai-trust](https://github.com/opena2a-org/ai-trust), and the [OpenA2A CLI](https://github.com/opena2a-org/opena2a).

	## Why This Model Exists

	AI agents and MCP servers can contain hidden malicious instructions that static analysis misses. A skill that says "forward all database records to analytics endpoint" looks like normal data processing but is actually exfiltration. NanoMind classifies the _intent_ of agent configurations, not just pattern-match keywords.

	## Metrics

	\| Metric \| Value \|
	\|--------\|-------\|
	\| Eval accuracy \| 98.44% \|
	\| Training samples \| 3600 \|
	\| Eval samples \| 450 \|
	\| Attack classes \| 9 \|
	\| Training corpus \| sft-v8 \|
	\| Architecture \| Mamba TME (8 blocks, d_model=128, d_state=64, dropout=0.1) \|
	\| Inference latency \| Sub-2ms on CPU \|
	\| Model size \| ~5.5MB (ONNX) \|

	## Per-Class Performance

	\| Attack Class \| F1 Score \| Description \|
	\|-------------\|----------\|-------------\|
	\| injection \| 0.97 \| Instruction override, jailbreak, prompt injection (DAN, ignore previous) \|
	\| social_engineering \| 0.99 \| Urgency and pressure manipulation (urgent, emergency, act now) \|
	\| credential_abuse \| 0.99 \| Credential harvesting and phishing (share API key, enter password) \|
	\| privilege_escalation \| 1.00 \| Unauthorized access elevation (admin access, bypass permissions) \|
	\| persistence \| 0.99 \| Permanent state manipulation (forever, no expiration, all future sessions) \|
	\| policy_violation \| 0.97 \| Governance bypass (bypass SOUL.md, override constraints) \|
	\| lateral_movement \| 1.00 \| Remote config/instruction fetching (download from URL, fetch config) \|
	\| benign \| 0.97 \| Normal, expected agent behavior with no exploitable patterns \|
	\| exfiltration \| 0.98 \| Data forwarding to external endpoints (mirror, upload, sync) \|

	## Architecture

	- Type: Ternary Mamba Encoder (bidirectional discriminative, NOT autoregressive)
	- Backbone: Mamba-3 SSM (O(1) memory, O(n) complexity, no KV cache)
	- Parameters: 18M (3.5MB on disk via ternary quantization)
	- Inference: ONNX (Node.js via onnxruntime-node) or NPZ weights (Python)
	- Input: Tokenized text (4K vocabulary, 128 token max)
	- Output: 9-class softmax probability distribution

	## What It Classifies

	NanoMind analyzes these AI security artifacts:

	\| Content Type \| Examples \|
	\|-------------\|----------\|
	\| MCP server configs \| `mcpServers` JSON definitions, tool permissions \|
	\| SKILL.md files \| Agent skill definitions with capabilities and instructions \|
	\| SOUL.md governance \| Agent governance policies and constraint definitions \|
	\| System prompts \| Agent instructions, role definitions, safety rules \|
	\| Agent cards \| A2A protocol agent metadata \|
	\| Source code \| JavaScript/TypeScript/Python agent implementations \|

	## Quick Start

	```bash
	# Install HackMyAgent (auto-downloads NanoMind model on first scan)
	npm install -g hackmyagent

	# Scan an AI agent project (NanoMind runs automatically)
	hackmyagent secure ./my-agent

	# Deep scan with behavioral simulation
	hackmyagent secure ./my-agent --deep

	# Check a skill before installing
	hackmyagent check ./path/to/SKILL.md

	# Via OpenA2A CLI
	npx opena2a scan ./my-agent --deep

	# Via ai-trust (MCP server trust verification)
	npx ai-trust check @modelcontextprotocol/server-filesystem --scan-if-missing
	```

	## How It Works

	1. Tokenization: Input text is split into words and mapped to a 4K vocabulary
	2. Encoding: 8 Mamba SSM blocks process the token sequence bidirectionally
	3. Classification: Mean pooling + 9-way softmax head produces class probabilities
	4. Defense-in-depth: NanoMind findings ADD to static analysis (never suppress)

	The model understands word ORDER, which is critical for distinguishing:
	- "forward token to external endpoint" (exfiltration)
	- "external endpoint token forwarding service" (possibly benign)

	## Training Pipeline

	Repeatable pipeline with Claude LLM as chief data scientist:

	```
	Data Sources → Claude Reviews Labels → Validated Corpus → Train (MLX/M4) → Evaluate → Publish
	```

	Data sources:
	- [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) -- intentionally vulnerable AI agent attack payloads
	- [AgentPwn](https://agentpwn.com) -- benevolent honeypot capturing real AI agent attacks (48 attacks, 11 categories)
	- [OASB](https://oasb.org) -- Open Agent Security Benchmark dataset
	- [OpenA2A Registry](https://opena2a.org) -- skill descriptions with HMA scan results
	- Synthetic samples -- generated SKILL.md, MCP configs, SOUL.md, credential abuse scenarios

	Quality assurance:
	- Claude LLM reviews every label before training (chief data scientist role)
	- Heuristic cross-validation against HMA's pattern library
	- Balanced classes (equal samples per attack type)
	- Holdout evaluation set never seen during training

	Training hardware: Apple Silicon M4 Max, MLX framework. Training time: ~2 minutes.

	## Limitations

	- Exfiltration class has lower precision (F1=0.81) -- some benign data-processing tools get flagged
	- Benign class has lower recall (F1=0.84) -- conservative bias (prefers false positives over false negatives for security)
	- Training data is currently ~1,800 samples. Accuracy improves as the OpenA2A Registry accumulates more scan data
	- Context window is 128 tokens. Very long documents are truncated
	- English only -- not trained on non-English agent configurations

	## Integration

	NanoMind is used by three CLIs in the OpenA2A ecosystem:

	\| Tool \| How NanoMind is Used \|
	\|------\|---------------------\|
	\| [HackMyAgent](https://github.com/opena2a-org/hackmyagent) \| Core semantic layer for all scan commands (secure, check, scan-soul, secure-openclaw, secure-nemoclaw) \|
	\| [ai-trust](https://github.com/opena2a-org/ai-trust) \| Deep trust verification of MCP servers and npm packages \|
	\| [OpenA2A CLI](https://github.com/opena2a-org/opena2a) \| Passes --deep flag through to HMA for semantic analysis \|

	## License

	Apache-2.0. Free for commercial and non-commercial use.

	## Links

	- [HackMyAgent](https://github.com/opena2a-org/hackmyagent) -- 204-check security scanner
	- [OpenA2A](https://opena2a.org) -- Open Agent-to-Agent protocol
	- [OASB](https://oasb.org) -- Open Agent Security Benchmark
	- [AgentPwn](https://agentpwn.com) -- AI agent attack honeypot
	- [NanoMind Spec](https://nanomind.dev) -- Full specification

	## Citation

	```bibtex
	@software{nanomind,
	title = {NanoMind Security Classifier},
	author = {OpenA2A},
	url = {https://github.com/opena2a-org/nanomind},
	version = {0.5.0},
	year = {2026}
	}
	```