Model Card: nanomind-v3-qwen3-1.7B-sft-r64
At a glance
| Field | Value |
|---|---|
| Version | v3.0.0 stable (PRODUCTION) |
| Released | 2026-05-11 |
| Promoted from | v3.0.0-beta (2026-04-16) — same artifact, [CDS-020] CPO sign-off |
| Base model | Qwen3-1.7B (Qwen3 license inherited) |
| License | Apache-2.0 (fine-tune) + Qwen3 license (base) |
| Architecture | Qwen3-1.7B + LoRA r=64 SFT fused (bfloat16) |
| Model size | 3.44 GB (safetensors), 1.05 GB (Q4_K_M GGUF) |
| Inference | Apple MPS bf16 required; ~18 ms/token, ~55 tok/s |
| Companion model | nanomind-security-classifier v0.5.0 (Mamba TME, NLM tier — runs in parallel for fast inline classification) |
| Serving runtime | NanoMind-Guard daemon (PR #14, f98e649) — /tmp/nanomind-guard.sock over JSON-Lines |
| Input gate (REQUIRED) | v3.1 input-classifier gate (PR #13, 1e90bf8) — MiniLM-L6 + sklearn LR @ threshold 0.65 + byte-level BIDI/stego pre-filter. Without this gate, off-topic refusal drops from 92% to 34%. |
| Training repo | nanomind-training (private), tag v3.0.0 |
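The serving daemon speaks newline-delimited JSON (JSON-Lines) over the unix socket listed above. The exact message schema lives in the NanoMind-Guard PR and is not reproduced in this card, so the field names below (`artifact`, `type`) are hypothetical; this is only a sketch of the framing:

```python
import json

def encode_request(artifact_path: str, artifact_type: str) -> bytes:
    """Frame one request as a single newline-terminated JSON-Lines record.

    Field names here are illustrative; the real NanoMind-Guard schema is
    defined in PR #14 and may differ.
    """
    msg = {"artifact": artifact_path, "type": artifact_type}
    return (json.dumps(msg) + "\n").encode("utf-8")

def decode_response(line: bytes) -> dict:
    """Parse one newline-delimited JSON response record."""
    return json.loads(line.decode("utf-8"))

# Over the real socket this would look like:
#   sock = socket.socket(socket.AF_UNIX)
#   sock.connect("/tmp/nanomind-guard.sock")
#   sock.sendall(encode_request("pkg.tgz", "npm"))
```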
Decision history
- [CDS-020] 2026-05-11 — v3.0.0 stable promotion. Same artifact as 3.0.0-beta, promoted with explicit CPO sign-off on the documented FP-suppression limitation (see §Known Limitations §2). HMA users must human-review findings on packages whose primary purpose is security functionality.
- [CDS-022] 2026-04-16 — Beta retag of rc1 (shipped with 2 failing gates documented).
- [CDS-003] Classifier line ended at v0.5.0 (Mamba TME). Future analyst work is the SLM-tier line (this model and successors).
Summary
Generative threat analysis model fine-tuned from Qwen3-1.7B using SFT (LoRA r=64) on the
instruct-v3-enriched corpus. Replaces the Mamba TME classifier with a reasoning-first
generative approach: given an AI agent artifact (npm package, MCP config, GitHub repo), the
model produces structured analysis (Analysis / Verdict / Evidence / Remediation sections) with
an explicit attackClass and classification label.
Oracle 10-way canonicalized accuracy: 70.0%, exactly meeting the ≥70% ship gate. Binary threat detection: 97.8% (+19.6 pp vs v2). Internal 332-sample accuracy: 94.24%. Promoted to v3.0.0 stable on 2026-05-11 per [CDS-020] CPO sign-off with two documented and explicitly accepted limitations: (1) NLM-standalone off-topic refusal of 34%, addressed end-to-end by the REQUIRED v3.1 input-classifier gate, which lifts e2e off-topic refusal to 92%; (2) FP-suppression on benign security code of 57%, so HMA users must human-review findings on packages whose primary purpose is security functionality (JWT validators, RBAC, parameterized queries, rate limiters, OAuth). A v3.1 fix is planned via +100 benign-security-code training samples.
Architecture
| Parameter | Value |
|---|---|
| Base model | Qwen3-1.7B (28 layers, d_model=2048) |
| Fine-tuning method | SFT with LoRA (rank=64, alpha=128) |
| Fused model format | Hugging Face (bfloat16) |
| Model size (bf16, fused) | 3.44 GB |
| Tokenizer | Qwen3 tiktoken |
| Output format | Structured markdown (Analysis / Verdict / Evidence / Remediation) |
| Task type | Generative threat analysis (threatAnalysis) |
| Attack classes | 10 (injection, exfiltration, steganography, social_engineering, credential_abuse, lateral_movement, privilege_escalation, policy_violation, persistence, none) |
| Inference device | Apple MPS (bfloat16 required — float16 produces 0% accuracy on MPS) |
| Inference latency | 18.0 ms/token, 55.7 tok/s (MPS, Qwen3-1.7B bf16) |
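Downstream consumers need to split the structured markdown output into its four sections and extract the attackClass label. The exact surface form of the section headers is not pinned down by this card, so the sketch below assumes markdown-style headers such as `## Analysis` or `Analysis:`; treat it as illustrative, not the canonical parser:

```python
import re

SECTIONS = ("Analysis", "Verdict", "Evidence", "Remediation")

def parse_analysis(text: str) -> dict:
    """Split a model response into its four sections plus attackClass.

    Assumes sections open with "## Analysis" or "Analysis:" style headers;
    the model's actual output format may differ slightly.
    """
    out = {}
    pattern = r"(?:^|\n)(?:#+\s*)?(%s)\s*:?\s*\n?" % "|".join(SECTIONS)
    parts = re.split(pattern, text)
    # re.split yields [prefix, name, body, name, body, ...]
    for name, body in zip(parts[1::2], parts[2::2]):
        out[name] = body.strip()
    m = re.search(r"attackClass\s*:\s*([a-z_]+)", text)
    out["attackClass"] = m.group(1) if m else None
    return out
```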
Training
| Parameter | Value |
|---|---|
| Corpus | instruct-v3-enriched |
| Training iterations | 1821 |
| Learning rate | 2e-5 (stable SFT regime; LR ≥5e-5 diverges on this base) |
| LoRA rank | 64, alpha=128 |
| Base model dtype | bfloat16 |
| Hardware | Apple M4 Max (MPS backend) |
| Adapter checkpoints | iter 400, 800, 1200, 1600, final (fused) |
| Val loss (late iters) | High variance (1.061–1.393); use internal eval, not val loss, as quality signal |
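The hyperparameters in the table map directly onto a PEFT LoRA configuration. The card does not state which modules the adapters target, so `target_modules` below is a hypothetical (common Qwen-style) choice, not the actual training config:

```python
from peft import LoraConfig

# r and lora_alpha come from the table above; target_modules is NOT
# specified by this card and is an assumed, illustrative choice.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,  # effective adapter scaling = alpha / r = 2.0
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```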
Data Provenance
Training corpus: instruct-v3-enriched/train.jsonl. No Claude-generated labels in eval ground truth.
Oracle eval set is frozen at oracle-v060-instruct/eval.jsonl (500 samples). Red-team mutations only
for eval set augmentation.
CDS-006 Gate Results
| Gate | Target | Result | Status |
|---|---|---|---|
| Oracle canonicalized 10-way (10 classes) | ≥70.0% | 70.0% (350/500) | PASS |
| Oracle binary (threat/benign) | beat v2 (SmolLM2-12L v0.1.0, 78.2%) | 97.8% | PASS (+19.6 pp) |
| Oracle attack-only 9-way | beat v2 (SmolLM2-12L v0.1.0, 29.8%) | 67.3% | PASS (+37.6 pp) |
| Internal 332-sample accuracy | v2 ±5 pp (77.4–87.4%) | 94.24% | PASS (+11.9 pp above v2) |
| Structure adherence | — | 98.9% | report |
| Refusal — off-topic (≥90% → none) | ≥90% | 34.0% (17/50) | FAIL — see Known Limitations |
| Refusal — in-domain (≥90% → non-none) | ≥90% | 100.0% (50/50) | PASS |
| FP-suppression — benign security code (≥95% → none) | ≥95% | 57.0% (57/100) | FAIL — see Known Limitations |
Gate eval sets: training/data/gate-evals/ (nanomind-training private repo).
Gate eval results: attached to nanomind-training release v3.0.0-rc1.
Per-Class Metrics (Oracle, 500 samples)
Sorted by F1 (canonicalized oracle, eval-oracle-500-canonicalized.json):
| Class | Recall | Precision | F1 | Notes |
|---|---|---|---|---|
| none | 0.940 | 0.855 | 0.895 | Monitor — slight over-prediction of benign |
| social_engineering | 0.760 | 0.826 | 0.792 | Accept |
| privilege_escalation | 0.780 | 0.765 | 0.772 | Accept |
| persistence | 0.600 | 1.000 | 0.750 | Accept — 30/50 recall; corpus expansion planned |
| steganography | 0.860 | 0.632 | 0.729 | Low precision — bias toward stego; corpus audit |
| policy_violation | 0.580 | 0.906 | 0.707 | Low recall — model avoids label; corpus audit |
| exfiltration | 0.820 | 0.594 | 0.689 | Low precision — over-predicts exfil |
| lateral_movement | 0.700 | 0.660 | 0.680 | Accept |
| credential_abuse | 0.620 | 0.689 | 0.653 | Low recall — inject/credential confusion |
| injection | 0.340 | 0.810 | 0.479 | Weakest class — corpus rebalance required |
Macro F1 (10-class): ~0.7146
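The macro F1 figure is the unweighted mean of the ten per-class F1 scores in the table, which can be checked directly:

```python
# Per-class F1 scores copied from the table above.
f1 = [0.895, 0.792, 0.772, 0.750, 0.729,
      0.707, 0.689, 0.680, 0.653, 0.479]

macro_f1 = sum(f1) / len(f1)  # unweighted mean over the 10 classes
print(round(macro_f1, 4))     # 0.7146
```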
Known Limitations
1. Off-topic refusal: 34% (FAIL, gate ≥90%)
The model was fine-tuned exclusively on AI agent security artifacts. When given arbitrary non-security structured text (cooking recipes, weather data, sports scores, jailbreaks formatted as artifacts), it pattern-matches and hallucinates attack classes. Examples observed during eval:
- French onion soup recipe → social_engineering
- Sourdough bread recipe → steganography ("add starter+salt" read as a hidden payload)
Impact: Not blocking for the HMA use case. HMA pre-filters all inputs to AI agent artifacts (npm packages, MCP configs, GitHub repos). The model is never exposed to cooking recipes or general text in production. Do NOT use this model on arbitrary text input.
Fix for v4: Add 50-100 "I don't know" refusal examples to training corpus for truly off-topic content. Redefine refusal gate accordingly.
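The mitigating v3.1 input-classifier gate includes a byte-level BIDI/stego pre-filter. The real filter lives in PR #13 and is not reproduced here; a minimal sketch of the BIDI-control portion of such a check might look like:

```python
# Unicode bidirectional-control characters commonly abused to hide or
# reorder payload text (Trojan Source-style attacks).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE/RLE/PDF/LRO/RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI/RLI/FSI/PDI
}

def has_bidi_controls(text: str) -> bool:
    """Flag inputs containing BIDI override/isolate characters."""
    return any(ch in BIDI_CONTROLS for ch in text)
```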
2. FP-suppression: 57% benign recall on security-adjacent code (FAIL, gate ≥95%)
Security-adjacent benign code — legitimate JWT validators, RBAC implementations, rate limiters, parameterized queries, cryptography libraries — is over-classified as a threat at a 43% rate. The model recognizes security keywords and patterns from training data but lacks enough positive examples of benign security code to distinguish correctly.
Impact: Partially blocking for HMA. HMA scans of legitimate security libraries (e.g., a cryptography package that implements proper key validation, an auth library with well-formed RBAC) may produce false positives. Human review is recommended for findings on packages whose primary purpose is security functionality.
Fix for v4: Add 100+ examples of legitimate JWT, RBAC, rate limiting, parameterized query,
and cryptography patterns to the training corpus with classification: benign labels.
3. Injection class recall: 34% (F1 0.479)
The weakest class by a large margin. The model under-predicts injection in favor of adjacent classes (exfiltration, social_engineering). Users running prompt-injection checks via HMA will see under-labeling.
Fix for v4: Add 50-100 canonical injection samples from HMA corpora and AIIS honeypot feed.
4. Malformed output on edge cases
6% of FP-suppression eval samples produced malformed attackClass values (e.g., `attackClass: confidence: 0.15`).
These represent cases where the model's structured output generation breaks down. Overall structure adherence
is 98.9% on the oracle set, so this is a tail behavior.
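Consumers can guard against this tail by validating any extracted attackClass against the 10-class vocabulary before acting on it; a minimal sketch:

```python
# The 10 attack classes from the Architecture table.
ATTACK_CLASSES = {
    "injection", "exfiltration", "steganography", "social_engineering",
    "credential_abuse", "lateral_movement", "privilege_escalation",
    "policy_violation", "persistence", "none",
}

def valid_attack_class(value: str) -> bool:
    """True only for the 10 classes this model is trained to emit."""
    return value in ATTACK_CLASSES

# The malformed tail-case from the eval is rejected:
valid_attack_class("confidence: 0.15")  # False
valid_attack_class("exfiltration")      # True
```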
Usage Guidance
This model is intended for use only via HMA on AI agent artifact inputs:
- npm packages
- MCP server configurations
- GitHub repositories containing agent code
- Docker images with agent runtimes
Do NOT use this model for:
- General text analysis
- Arbitrary code review (outside agent artifact context)
- Security advisory generation
All inference must use dtype=torch.bfloat16 on Apple MPS. Using float16 produces 0% classification
accuracy due to Qwen3's bfloat16-specific weight initialization.
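A minimal load sketch consistent with that requirement, using the standard transformers API; the model path is illustrative (the fused artifact lives in the private nanomind-training repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative path; substitute the actual fused-artifact location.
MODEL_DIR = "training/artifacts/nanomind-v3-qwen3-1.7B-sft-r64"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.bfloat16,  # REQUIRED: float16 yields 0% accuracy on MPS
).to("mps")
```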
Licensing
This model inherits the Qwen3 license from the Qwen3-1.7B base model. Fine-tuning data
(instruct-v3-enriched) is private. The fused model artifact is stored in the private
nanomind-training repository.
Consumer Impact
| Consumer | Update Required | Changes |
|---|---|---|
| HMA (hackmyagent) | Yes — bump nanomind-security-analyst pin to 3.0.0 | New output format (generative Analysis/Verdict/Evidence/Remediation vs classifier label); attackClass field replaces label; REQUIRES v3.1 input-classifier gate in front for off-topic refusal; human review recommended on security-library findings (FP caveat) |
| OpenA2A CLI (opena2a-cli) | Yes — bump nanomind-security-analyst pin to 3.0.0 | Delegates to HMA for analyst calls; needs version bump on the manifest pin to surface 3.0.0 to users |
| ai-trust | Yes — bump nanomind-security-analyst pin to 3.0.0 | Uses analyst for trust-context reasoning; same FP caveat applies |
Regression vs v2 (nanomind-security-classifier v0.5.0)
| Metric | v0.5.0 (TME) | v3.0.0-rc1 (Qwen3 SFT) | Delta |
|---|---|---|---|
| Oracle binary | 78.2% | 97.8% | +19.6 pp |
| Oracle 10-way | 35.6% | 70.0% | +34.4 pp |
| Oracle 9-way attack | 29.8% | 67.3% | +37.6 pp |
| Internal 332-sample | 77.4% | 94.24% | +16.8 pp |
| Model size | ~4 MB (ONNX) | 3.44 GB (bf16) | +3.44 GB |
| Inference latency | <1 ms (ONNX CPU) | 18 ms/token (MPS) | higher per-token |
Note: v3 is a generative reasoning model, not a classifier. Latency comparison is not apples-to-apples. v0.5.0 produces a label in <1 ms; v3 produces structured analysis with evidence and remediation, typically 200-512 tokens at ~18 ms/token.
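The per-analysis wall-clock implied by those numbers is easy to estimate:

```python
MS_PER_TOKEN = 18.0  # MPS bf16 latency from the table above

def analysis_seconds(tokens: int) -> float:
    """Wall-clock estimate for one generative analysis at 18 ms/token."""
    return tokens * MS_PER_TOKEN / 1000.0

print(analysis_seconds(200))  # 3.6 s for a short analysis
print(analysis_seconds(512))  # ~9.2 s at the 512-token cap
```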
Reproduction
```shell
# In nanomind-training/ (private)
# Full run at: training/artifacts/nanomind-v3-qwen3-1.7B-sft-r64/ (3.44 GB, bf16)

# Oracle eval
PYTHONUNBUFFERED=1 .venv/bin/python3 -m training.compressm.eval \
  --model training/artifacts/nanomind-v3-qwen3-1.7B-sft-r64 \
  --eval-data training/data/oracle-v060-instruct/eval.jsonl \
  --out training/artifacts/nanomind-v3-qwen3-1.7B-sft-r64/eval-oracle-500.json \
  --max-new-tokens 512

# Canonicalized 10-way accuracy
python3 training/scripts/canonicalize_oracle_eval.py \
  --input training/artifacts/nanomind-v3-qwen3-1.7B-sft-r64/eval-oracle-500.json \
  --output training/artifacts/nanomind-v3-qwen3-1.7B-sft-r64/eval-oracle-500-canonicalized.json

# Gate evals
python3 training/scripts/build_gate_evals.py  # builds gate-evals/ JSONL sets

# Run each eval sequentially (MPS serializes GPU across processes)
PYTHONUNBUFFERED=1 .venv/bin/python3 -m training.compressm.eval \
  --model training/artifacts/nanomind-v3-qwen3-1.7B-sft-r64 \
  --eval-data training/data/gate-evals/refusal-off-topic.jsonl \
  --out training/artifacts/nanomind-v3-qwen3-1.7B-sft-r64/gate-refusal-off-topic.json \
  --max-new-tokens 256

python3 training/scripts/analyze_gate_evals.py
```
IMPORTANT: Always use .venv/bin/python3 (not system python3). Always use
dtype=torch.bfloat16 (not float16) for MPS inference. Parallel MPS eval processes cause
output starvation — run evals sequentially.
Evaluation results (self-reported)
- Oracle 10-way canonicalized accuracy: 0.700
- Oracle binary (threat vs benign): 0.978
- Oracle attack-only 9-way: 0.673
- Internal 332-sample accuracy: 0.942
- Macro F1 (10-class): 0.715