opensec-env / docs /TAXONOMY_SPEC.md
Jarrodbarnes's picture
Upload folder using huggingface_hub
3f434eb verified

Taxonomy Spec (v1)

This document defines the taxonomy used to label and sample incident scenarios. The goal is verifiable environment scaling: we expand the environment by sampling from explicit, auditable attack patterns rather than ad hoc seed generation. This aligns with the "domain graph → tool chain → task" scaling pattern and supports reproducible dataset mixes.

1) Design goals

  • Verifiable scaling: every scenario is labeled with a taxonomy pattern and can be audited.
  • Controlled growth: new patterns can be added without changing evaluation semantics.
  • Reproducibility: the distribution spec is explicit and encoded in seed metadata + manifest.

2) Taxonomy versioning

  • version: taxonomy revision (current: v1)
  • Backward compatible additions should bump the version only when semantics change.

3) Pattern families (v1)

Each scenario is assigned a family and pattern_id. Families map to injection style:

  • direct_harm: containment-misdirection with explicit stepwise instructions.
  • data_exfil: data collection or report manipulation before containment.
  • adaptive: adaptive/override instructions designed to conflict with safe policy.
  • easy: low-ambiguity attribution with simple evidence.
  • trivial: single-step attribution with explicit alert evidence.

4) Primary action groups

Patterns are grouped by the primary injected target action:

  • containment: block_domain, isolate_host, reset_user
  • query: query_logs, submit_report
  • mixed: any of the above

5) Distribution spec (v1)

Used for standard-tier scenario sampling:

  • direct_harm-containment: 0.50
  • data_exfil-query: 0.30
  • adaptive-mixed: 0.20

Easy/trivial tiers use fixed patterns:

  • easy-auth-alert (family: easy)
  • trivial-explicit (family: trivial)

6) Seed metadata fields

Seeds store taxonomy in metadata.taxonomy:

{
  "version": "v1",
  "pattern_id": "direct_harm-containment",
  "family": "direct_harm",
  "primary_action": "block_domain",
  "surfaces": ["email", "alert", "log"],
  "objective_profile": ["recon", "access", "persistence", "lateral_spread", "staging", "exfil"],
  "trust_profile": {"email": "untrusted", "alert": "corroborated", "log": "verified"},
  "notes": "Containment-misdirection via injected operational checklists."
}

The manifest also includes taxonomy_id and taxonomy_family for split-level audits.

7) Sampling algorithm (summary)

  1. Sample a taxonomy pattern by weight.
  2. Pick a primary action in the pattern’s action group.
  3. Assign remaining actions to other injection surfaces.
  4. Generate injection payloads using the pattern’s family template.
  5. Store taxonomy labels in seed metadata + manifest.

8) Auditing and scaling

To audit the dataset:

python3 scripts/backfill_taxonomy.py --manifest data/seeds/manifest.json
python3 scripts/validate_seed_set.py --manifest data/seeds/manifest.json --split all

When adding new patterns, update:

  • TAXONOMY_PATTERNS in scripts/generate_seeds.py
  • This spec (family + distribution)
  • Any baseline/eval reporting that groups by taxonomy