ClawHub Security Signals: Large Corpus Multi-Scanner Dataset for Agent Skill Security Research

Community Article Published June 1, 2026

agentic-ai-openclaw-logo-corp-blog-1280x680

ClawHub Security Signals is a silver-standard dataset of 67,453 public agent skills from the ClawHub registry, OpenClaw's official skills platform. Each row pairs sanitized skill content with a ClawScan registry verdict and supporting evidence from three scanner families: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector.

Verdicts are produced by the ClawScan registry pipeline from scanner evidence, provenance signals, metadata, and moderation context. They are not human annotations. The goal of this release is to support researchers working on agent supply-chain security, multi-signal triage, and the class of risks that show up when AI agents install reusable skills and plugins.


Why Agent Skills Are a Different Security Problem

Most security tooling starts with a familiar question: does this artifact contain malware? That question matters for agent skills too, but it underdetermines the risk.

An agent skill can be a Markdown instruction file, a Python script, a workflow definition, references, or a bundle that combines all of these. When an agent loads a skill, it may gain new ways to invoke tools, access context, issue subtasks, install dependencies, or interact with external services.

That surface introduces failure modes that classic malware scanners are not designed to catch:

  • skills that request authority far beyond what their stated purpose requires,
  • instructions designed to steer or hijack an agent's behavior when processed,
  • code paths that can leak data passed through context,
  • workflows with dangerous side effects despite a benign description,
  • hardcoded credentials, insecure TLS settings, dynamic execution, or destructive shell patterns.

Some of these look like normal software-security findings. Others are specific to agentic systems, where a document can become operational instruction and a workflow can change what an autonomous assistant is allowed to do. ClawHub Security Signals is designed to expose that boundary.


What's in the Dataset

The dataset covers 67,453 latest public ClawHub skill versions across four deterministic splits: train (47,262), validation (10,076), test (6,747), and eval_holdout (3,368). The eval_holdout split is reserved for model evaluation and should not be used for training.

Each row includes redacted SKILL.md content, sanitized bundled files where present, the final ClawScan verdict, and summarized scanner evidence. During preparation, 387 secret-like values were redacted from exported bundle content. A TruffleHog verified-secret pass found 0 verified secrets after validation.

ClawScan assigns each skill version a registry verdict:

  • clean: 41,743 rows (61.9%)
  • suspicious: 25,504 rows (37.8%)
  • malicious: 206 rows (0.3%)

A suspicious verdict means the skill warrants review before trust is extended. It is not a confirmed-harmful label. A malicious verdict is still a silver-standard registry verdict, not human-verified ground truth at this stage.

All three scanner inputs cover roughly 97-98% of the corpus:

Scanner Rows with source Source coverage Positive rows Positive share of all rows
VirusTotal 65,873 97.66% 5,225 7.75%
Static analysis 66,185 98.12% 4,434 6.57%
SkillSpector 66,222 98.18% 32,856 48.71%

SkillSpector's higher advisory-positive rate reflects its broader scope. It surfaces semantic-risk patterns that do not necessarily indicate malware but are worth reviewing in an agentic context. SkillSpector signals are advisory, not accusations, and are not install-blocking by themselves.


Main Finding: Structured Scanner Disagreement

The most informative signal in the dataset is not what any single scanner reports, but how the scanners disagree.

Pairwise Jaccard similarity between scanners never exceeds 0.104, and Cohen's kappa ranges from 0.045 to 0.082. That is close to zero, but it is not random noise. It reflects three tools inspecting different surfaces: malware reputation, static code patterns, and semantic agentic risk.

Of the 35,600 rows with at least one positive scanner signal, 26,527 are SkillSpector-only: skills that do not raise VirusTotal or static-analysis positives but do raise semantic-risk advisories around authority, disclosure, data flow, tool poisoning, or excessive agency. Only 468 rows, 0.69% of the dataset, are positive on all three scanner families.

The verdict-conditioned pattern shows why a single scanner is not enough:

  • Among clean rows, VirusTotal is positive on 4.4%, static analysis on 3.2%, and SkillSpector on 32.7%.
  • Among suspicious rows, VirusTotal is positive on 12.7%, static analysis on 12.0%, and SkillSpector on 75.3%.
  • Among malicious rows, VirusTotal is positive on 72.8%, static analysis on 12.6%, and SkillSpector on 6.8%.

VirusTotal is strongest in the malicious-verdict region. SkillSpector is strongest in the review-needed middle. That is the point: malware reputation and semantic-risk review are different tasks, and a skill registry needs both.

SkillSpector Risk Categories

The top SkillSpector advisory categories are MCP Least Privilege (9,641 rows), MCP Tool Poisoning (5,084), Data Exfiltration (2,192), Dangerous Code Execution (1,629), Rogue Agent (1,428), and Supply Chain (1,336).

These are not abuse labels. They describe authority, scope, tool semantics, execution risk, data flow, and disclosure properties that may be legitimate when documented and bounded.

Static Analysis Reason Codes

The most common static findings are suspicious.dangerous_exec (1,428), suspicious.env_credential_access (1,298), suspicious.exposed_secret_literal (1,219), and suspicious.dynamic_code_execution (451). Static analysis also found suspicious.prompt_injection_instructions in 433 rows, where instruction text matched prompt-injection patterns independently of SkillSpector's semantic analysis.


Loading the Dataset

from datasets import load_dataset

dataset = load_dataset(
    "OpenClaw/clawhub-security-signals",
    name="default",
)

train = dataset["train"]
print(train[0]["skill_slug"], train[0]["clawscan_verdict"])

The scanner summaries stay separate from the final registry verdict, so you can study disagreement directly:

positive = {"suspicious", "malicious"}

vt_flagged = train.filter(
    lambda row: row["virustotal_status"] in positive
)

has_bundle = train.filter(
    lambda row: len(row["skill_bundle_content"] or []) > 0
)

all_three = train.filter(
    lambda row: row["virustotal_status"] in positive
    and row["static_status"] in positive
    and row["skillspector_status"] in positive
)

Bundle content is included as sanitized text where present. Secret-like values have been redacted before export.


Research Directions

We hope to enable the research community and would love to help enable the development of safe agentic systems. We have identified a number of possible avenues:

Multi-signal triage. The low inter-scanner agreement suggests that ensemble approaches over malware reputation, static analysis, and semantic risk may work better than any single scanner.

Prompt-injection detection in skill text. The 433 static findings for suspicious.prompt_injection_instructions and the 5,084 SkillSpector MCP Tool Poisoning rows provide a starting signal for evaluating detectors on realistic agent instruction content.

Least-privilege policy learning. The 9,641 MCP Least Privilege findings can support work on inferring minimum required authority from skill declarations and bundled content.

Weak supervision and label refinement. ClawScan verdicts can be treated as noisy operational labels while the scanner outputs serve as separate weak signals. The disputed middle is the interesting part, not something to smooth away.

Trust artifacts. Skills need user-facing security explanations: what the skill claims, what authority it requests, what scanners found, and why a registry recommends install, review, or block.


Caveats

Silver-standard labels. Registry verdicts are produced by an automated pipeline, not assigned by human annotators. Treat them as operational labels, not security ground truth.

SkillSpector signals are advisory. A SkillSpector positive means a semantic-risk pattern was detected and the skill merits review. It is not a verdict, not evidence of malicious intent, and should not be used as a standalone input for blocking or removal decisions.

suspicious means review-needed. A suspicious verdict is a signal that review is warranted before the skill is trusted in a sensitive context. It does not mean the skill is confirmed harmful.

Sanitized but not magical. The release redacts secrets and excludes private identifiers and runnable private artifacts, but no automated redaction process is perfect. Inspect bundle content before using it in contexts where accidental exposure matters.

Snapshot in time. The dataset reflects the public ClawHub catalog at the time of export (31 May 2026 at time of publication). Skills may have been updated, removed, or had their verdicts revised since.


Links

Community

Sign up or log in to comment