	# API Reference — `scanners/`

	## `scanners/__init__.py` — public exports

	```python
	from scanners import (
	    agent_audit,
	    bandit,
	    detect_secrets,
	    forbidden_files,
	    gitleaks,
	    hadolint,
	    pip_audit,
	    ruff_perf,
	    semgrep_pack,
	)
	```

	---

	## Common return types

	Most runners return a 2-tuple:

	```python
	Tuple[List[dict], str] # (findings, log_message)
	```

	`semgrep_pack` is the exception — it returns `List[dict]` directly (no log message) because it is called once per rule pack and the caller aggregates messages.

	All finding dicts conform to the schema defined in [`core/models.py`](core.md#coremodelspy).
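
Because `semgrep_pack` returns a bare list while the other runners return a `(findings, log_message)` pair, a caller has to handle both shapes. The following is a minimal sketch of one way to normalize them. `normalize_result` and the two stub runners are hypothetical names used only for illustration; they are not part of the `scanners` package.

```python
from typing import List, Tuple, Union

RunnerResult = Union[List[dict], Tuple[List[dict], str]]


def normalize_result(result: RunnerResult, default_log: str = "") -> Tuple[List[dict], str]:
    """Return (findings, log_message) regardless of which shape the runner produced."""
    if isinstance(result, tuple):
        findings, log_message = result
        return list(findings), log_message
    # Bare list (the semgrep_pack shape): no log message was supplied.
    return list(result), default_log


# Hypothetical stand-ins for real runners, used only to show both shapes.
def fake_tuple_runner(work: str) -> Tuple[List[dict], str]:
    return [{"tool": "Bandit", "rule": "B101"}], "bandit: 1 finding"


def fake_list_runner(work: str) -> List[dict]:
    return [{"tool": "Semgrep:Core", "rule": "demo-rule"}]


all_findings: List[dict] = []
logs: List[str] = []
for res in (fake_tuple_runner("/tmp/repo"), fake_list_runner("/tmp/repo")):
    findings, log = normalize_result(res, default_log="semgrep: pack finished")
    all_findings.extend(findings)
    logs.append(log)
```

Normalizing at the call site keeps each runner's signature unchanged while letting the aggregation loop treat every result the same way.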

	---

	## `scanners/bandit_runner.py`

	### `bandit(work)`

	```python
	def bandit(work: str) -> Tuple[List[dict], str]
	```

	Run [Bandit](https://bandit.readthedocs.io/) Python security linter against the work directory.

	- Tool: `bandit -r <work> -f json -q --exclude .git,.venv,...`
	- Timeout: 300 s
	- Confidence mapping: `HIGH`/`MEDIUM` severity → `"likely"`, otherwise → `"possible"`.
	- OWASP: `["UNMAPPED"]` (Bandit rule IDs map across multiple categories; prefer reviewing `rule`).
	- Skips: `.git`, `.venv`, `venv`, `env`, `node_modules`, `__pycache__`.
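
The confidence rule above can be sketched as follows. The input keys (`results`, `test_id`, `issue_severity`, `filename`, `line_number`, `issue_text`) are the ones Bandit emits in its JSON report. The output keys and the helper names `bandit_confidence` and `bandit_to_findings` are assumptions for illustration, since the finding schema in `core/models.py` is not reproduced here, and severity is passed through unchanged as a simplification.

```python
from typing import List


def bandit_confidence(issue_severity: str) -> str:
    """HIGH/MEDIUM severity maps to "likely"; anything else to "possible"."""
    return "likely" if issue_severity.upper() in {"HIGH", "MEDIUM"} else "possible"


def bandit_to_findings(report: dict) -> List[dict]:
    """Convert a parsed `bandit -f json` report into finding dicts (assumed keys)."""
    findings = []
    for issue in report.get("results", []):
        severity = issue.get("issue_severity", "LOW")
        findings.append({
            "tool": "Bandit",
            "rule": issue.get("test_id", ""),
            "severity": severity,  # passed through as-is in this sketch
            "confidence": bandit_confidence(severity),
            "owasp": ["UNMAPPED"],
            "file": issue.get("filename", ""),
            "line": issue.get("line_number", 0),
            "message": issue.get("issue_text", ""),
        })
    return findings


sample_report = {
    "results": [
        {"test_id": "B602", "issue_severity": "HIGH", "filename": "app.py",
         "line_number": 12, "issue_text": "subprocess call with shell=True identified"},
        {"test_id": "B101", "issue_severity": "LOW", "filename": "test_app.py",
         "line_number": 3, "issue_text": "Use of assert detected."},
    ]
}
bandit_findings = bandit_to_findings(sample_report)
```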

	---

	## `scanners/semgrep_runner.py`

	### `semgrep_pack(rules_path, work, tool_label, category)`

	```python
	def semgrep_pack(
	    rules_path: Path,
	    work: str,
	    tool_label: str,
	    category: str = "security",
	) -> List[dict]
	```

	Run Semgrep with a specific YAML rule pack. Returns findings list directly (no log message).

	- Tool: `semgrep --config <rules_path> --json --quiet --metrics=off --no-git-ignore <work>`
	- Timeout: 600 s
	- `tool_label`: Used as the `tool` field in finding dicts (e.g., `"Semgrep:Core"`).
	- Metadata extracted: `owasp`, `confidence` from Semgrep result `extra.metadata`.
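
As a rough illustration of the bullets above, the sketch below builds the documented command line and pulls `owasp` and `confidence` out of a single Semgrep result. The helper names, the output keys, and the fallback values (`["UNMAPPED"]`, `"possible"`) are assumptions; the confidence value is passed through untouched because the real mapping is not described here.

```python
from pathlib import Path
from typing import List


def semgrep_command(rules_path: Path, work: str) -> List[str]:
    """Build the argv list matching the documented invocation."""
    return [
        "semgrep", "--config", str(rules_path),
        "--json", "--quiet", "--metrics=off", "--no-git-ignore", work,
    ]


def semgrep_to_finding(result: dict, tool_label: str, category: str = "security") -> dict:
    """Map one entry of Semgrep's `results` array to a finding dict (assumed keys)."""
    extra = result.get("extra") or {}
    metadata = extra.get("metadata") or {}
    owasp = metadata.get("owasp") or ["UNMAPPED"]  # fallback is an assumption
    if isinstance(owasp, str):
        owasp = [owasp]  # rule packs may declare a single string or a list
    return {
        "tool": tool_label,
        "rule": result.get("check_id", ""),
        "severity": extra.get("severity", "INFO"),
        "category": category,
        "owasp": list(owasp),
        "confidence": metadata.get("confidence", "possible"),
        "file": result.get("path", ""),
        "line": (result.get("start") or {}).get("line", 0),
    }


cmd = semgrep_command(Path("core.yaml"), "/tmp/repo")
sample_result = {
    "check_id": "demo.rule",
    "path": "app.py",
    "start": {"line": 7},
    "extra": {
        "severity": "ERROR",
        "metadata": {"owasp": "A03:2021 - Injection", "confidence": "HIGH"},
    },
}
finding = semgrep_to_finding(sample_result, tool_label="Semgrep:Core")
```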

	### `detect_secrets(work)`

	```python
	def detect_secrets(work: str) -> Tuple[List[dict], str]
	```

	Run [detect-secrets](https://github.com/Yelp/detect-secrets) against all files in `work`.

	- Tool: `detect-secrets scan --all-files <work>`
	- Timeout: 300 s
	- Severity: always `"ERROR"`.
	- Confidence: `"possible"` for entropy/high-entropy types, otherwise `"likely"`.
	- OWASP: `["A02:2021-Cryptographic_Failures"]`.
	- Note: Requires a git-initialized directory for best results.
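
A minimal sketch of the confidence rule, assuming the check is a case-insensitive substring match on the detector's `type` string (for example `"Base64 High Entropy String"`). The nested `results` layout follows detect-secrets' JSON output; the helper names and output keys are hypothetical.

```python
from typing import List


def secret_confidence(secret_type: str) -> str:
    """Entropy-based detectors are noisier, so they are only "possible"."""
    return "possible" if "entropy" in secret_type.lower() else "likely"


def detect_secrets_to_findings(report: dict) -> List[dict]:
    """Flatten detect-secrets' {"results": {file: [secret, ...]}} layout."""
    findings = []
    for filename, secrets in (report.get("results") or {}).items():
        for secret in secrets:
            secret_type = secret.get("type", "")
            findings.append({
                "tool": "detect-secrets",
                "rule": secret_type,
                "severity": "ERROR",
                "confidence": secret_confidence(secret_type),
                "owasp": ["A02:2021-Cryptographic_Failures"],
                "file": filename,
                "line": secret.get("line_number", 0),
            })
    return findings


sample_report = {
    "results": {
        "settings.py": [
            {"type": "Base64 High Entropy String", "line_number": 10},
            {"type": "AWS Access Key", "line_number": 22},
        ]
    }
}
secret_findings = detect_secrets_to_findings(sample_report)
```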

	---

	## `scanners/ruff_runner.py`

	### `ruff_perf(work)`

	```python
	def ruff_perf(work: str) -> Tuple[List[dict], str]
	```

	Run [Ruff](https://docs.astral.sh/ruff/) with PERF rule selection.

	- Tool: `ruff check --select PERF --output-format json <work>`
	- Timeout: 120 s
	- Severity: always `"WARNING"` (performance suggestions are non-critical).
	- Confidence: `"likely"` (Ruff PERF rules have very few false positives).
	- OWASP: `["UNMAPPED"]` (performance — not a security concern).
	- Category: `"performance"`.
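
Since every field except the rule code and location is fixed for this runner, the conversion from Ruff's JSON output is short. The sketch below assumes the finding keys shown (they are illustrative, not taken from `core/models.py`) and uses the `code`, `message`, `filename`, and `location.row` fields that Ruff emits with `--output-format json`.

```python
import json
from typing import List


def ruff_to_findings(stdout: str) -> List[dict]:
    """Convert Ruff's JSON diagnostics array into finding dicts (assumed keys)."""
    findings = []
    for diag in json.loads(stdout or "[]"):
        location = diag.get("location") or {}
        findings.append({
            "tool": "Ruff",
            "rule": diag.get("code", ""),
            "severity": "WARNING",
            "confidence": "likely",
            "owasp": ["UNMAPPED"],
            "category": "performance",
            "file": diag.get("filename", ""),
            "line": location.get("row", 0),
            "message": diag.get("message", ""),
        })
    return findings


sample_stdout = json.dumps([
    {
        "code": "PERF401",
        "message": "Use a list comprehension to create a transformed list",
        "filename": "pipeline.py",
        "location": {"row": 14, "column": 9},
    }
])
ruff_findings = ruff_to_findings(sample_stdout)
```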

	---

	## `scanners/pip_audit_runner.py`

	### `pip_audit(work)`

	```python
	def pip_audit(work: str) -> Tuple[List[dict], str]
	```

	Scan all `requirements*.txt` files in `work` for known CVEs using [pip-audit](https://github.com/pypa/pip-audit).

	- Tool: `pip-audit -r <req_file> -f json --strict --progress-spinner off`
	- Timeout: 90 s per requirements file
	- Severity: always `"ERROR"` (any known CVE is critical).
	- Confidence: `"confirmed"` (CVE database entries are verified).
	- OWASP: `["A06:2021-Vulnerable_and_Outdated_Components"]`.
	- Remediation: auto-populated from `fix_versions` field.
	- Skips: files under `.git/`.
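
The file discovery and remediation steps might look like the sketch below. `find_requirements`, `pip_audit_command`, and `remediation` are hypothetical helper names, and the remediation wording is an assumption; only the glob pattern, the `.git/` exclusion, the command flags, and the `fix_versions` field come from the description above.

```python
import tempfile
from pathlib import Path
from typing import List


def find_requirements(work: str) -> List[Path]:
    """Recursively collect requirements*.txt files, ignoring anything under .git/."""
    root = Path(work)
    return sorted(
        p for p in root.rglob("requirements*.txt")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )


def pip_audit_command(req_file: Path) -> List[str]:
    """Build the documented per-file invocation."""
    return ["pip-audit", "-r", str(req_file), "-f", "json",
            "--strict", "--progress-spinner", "off"]


def remediation(package: str, vuln: dict) -> str:
    """Derive a remediation hint from a vulnerability's fix_versions list."""
    fixes = vuln.get("fix_versions") or []
    if not fixes:
        return f"No fixed version of {package} is listed yet."
    return f"Upgrade {package} to one of: {', '.join(fixes)}."


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as work:
    root = Path(work)
    (root / "requirements.txt").write_text("flask==0.5\n")
    (root / "sub").mkdir()
    (root / "sub" / "requirements-dev.txt").write_text("pytest\n")
    (root / ".git").mkdir()
    (root / ".git" / "requirements.txt").write_text("ignored\n")
    found = [p.relative_to(root).as_posix() for p in find_requirements(work)]

hint = remediation("flask", {"id": "DEMO-0001", "fix_versions": ["1.0", "2.2.5"]})
```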

	---

	## `scanners/gitleaks_runner.py`

	### `gitleaks(work)`

	```python
	def gitleaks(work: str) -> Tuple[List[dict], str]
	```

	Run [gitleaks](https://github.com/gitleaks/gitleaks) to detect secrets committed to git history.

	- Requires: `gitleaks` binary on PATH (auto-downloaded by `core.bootstrap`).
	- Tool: `gitleaks detect --source <work> --report-path <tmp.json> --report-format json --no-banner --exit-code 0`
	- Timeout: 600 s
	- Severity: always `"ERROR"`.
	- Confidence: `"confirmed"`.
	- OWASP: `["A02:2021-Cryptographic_Failures"]`.
	- Note: Only added to the task list when `deep_history=True` in `scan_repo()`.
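
Because the report is written to a file rather than stdout, the runner has to read it back after the process exits. A hedged sketch of that step is shown below; `gitleaks_command` and `read_gitleaks_report` are hypothetical names, the output keys are assumed, and `RuleID`, `File`, `StartLine`, and `Commit` are fields from gitleaks' JSON report format.

```python
import json
import os
import tempfile
from typing import List


def gitleaks_command(work: str, report_path: str) -> List[str]:
    """Build the documented invocation; --exit-code 0 keeps leaks from failing the process."""
    return ["gitleaks", "detect", "--source", work,
            "--report-path", report_path, "--report-format", "json",
            "--no-banner", "--exit-code", "0"]


def read_gitleaks_report(report_path: str) -> List[dict]:
    """Parse the JSON report; a missing or malformed file yields no findings."""
    try:
        with open(report_path, encoding="utf-8") as fh:
            leaks = json.load(fh)
    except (OSError, json.JSONDecodeError):
        return []
    findings = []
    for leak in leaks or []:
        findings.append({
            "tool": "gitleaks",
            "rule": leak.get("RuleID", ""),
            "severity": "ERROR",
            "confidence": "confirmed",
            "owasp": ["A02:2021-Cryptographic_Failures"],
            "file": leak.get("File", ""),
            "line": leak.get("StartLine", 0),
            "commit": leak.get("Commit", ""),
        })
    return findings


# Demo: write a sample report to a temp file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    report_path = os.path.join(tmp, "gitleaks.json")
    with open(report_path, "w", encoding="utf-8") as fh:
        json.dump([{"RuleID": "generic-api-key", "File": "config.py",
                    "StartLine": 4, "Commit": "abc123"}], fh)
    leak_findings = read_gitleaks_report(report_path)
    missing = read_gitleaks_report(os.path.join(tmp, "absent.json"))
```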

	---

	## `scanners/hadolint_runner.py`

	### `hadolint(work)`

	```python
	def hadolint(work: str) -> Tuple[List[dict], str]
	```

	Run [hadolint](https://github.com/hadolint/hadolint) against all `Dockerfile*` files found recursively in `work`.

	- Requires: `hadolint` binary on PATH (auto-downloaded by `core.bootstrap`).
	- Tool: `hadolint -f json <Dockerfile>`
	- Timeout: 60 s per Dockerfile
	- Severity: from hadolint's `level` field (uppercased).
	- Confidence: `"likely"` (TOOL_DEFAULT_CONFIDENCE).
	- OWASP: `["A05:2021-Security_Misconfiguration"]`.
	- Skips: Dockerfiles under `.git/`.
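
The severity rule is the only per-result logic here, so a sketch of the conversion is brief. It assumes the finding keys shown and a fallback level of `"info"` when `level` is absent (both assumptions); `code`, `level`, `line`, and `message` are fields from hadolint's JSON formatter.

```python
from typing import List


def hadolint_to_findings(entries: List[dict], dockerfile: str) -> List[dict]:
    """Convert hadolint's JSON array for one Dockerfile into finding dicts."""
    findings = []
    for entry in entries:
        findings.append({
            "tool": "hadolint",
            "rule": entry.get("code", ""),
            "severity": str(entry.get("level", "info")).upper(),
            "confidence": "likely",
            "owasp": ["A05:2021-Security_Misconfiguration"],
            "file": dockerfile,
            "line": entry.get("line", 0),
            "message": entry.get("message", ""),
        })
    return findings


sample_entries = [
    {"code": "DL3006", "level": "warning", "line": 1,
     "message": "Always tag the version of an image explicitly"},
    {"code": "DL3020", "level": "error", "line": 4,
     "message": "Use COPY instead of ADD for files and folders"},
]
docker_findings = hadolint_to_findings(sample_entries, "Dockerfile")
```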

	---

	## `scanners/forbidden_files.py`

	### `forbidden_files(work)`

	```python
	def forbidden_files(work: str) -> Tuple[List[dict], str]
	```

	Walk `work` and flag any file whose basename matches the `FORBIDDEN_FILES` list in `core/models.py`.

	- No external tool required — pure Python.
	- Severity: `"ERROR"`.
	- Confidence: `"confirmed"`.
	- OWASP: `["A02:2021-Cryptographic_Failures"]`.

	Detected file names include: `.env`, `.env.local`, `.env.production`, `id_rsa`, `id_dsa`, `id_ecdsa`, `id_ed25519`, `.git-credentials`, `.npmrc`, `.pypirc`, `credentials`, `credentials.json`, `service-account.json`, `serviceAccountKey.json`, `wp-config.php`.
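
A pure-Python walk of this kind can be sketched in a few lines. The set below is a small illustrative subset of the names listed above (the authoritative list lives in `core/models.py`), and `forbidden_files_sketch` plus the output keys are hypothetical.

```python
import os
import tempfile
from typing import List, Tuple

# Illustrative subset only; the real FORBIDDEN_FILES list is in core/models.py.
FORBIDDEN_FILES = {".env", "id_rsa", ".npmrc", "credentials.json"}


def forbidden_files_sketch(work: str) -> Tuple[List[dict], str]:
    """Flag every file under `work` whose basename is in FORBIDDEN_FILES."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(work):
        for name in filenames:
            if name in FORBIDDEN_FILES:
                rel = os.path.relpath(os.path.join(dirpath, name), work)
                findings.append({
                    "tool": "forbidden_files",
                    "rule": f"forbidden-file:{name}",
                    "severity": "ERROR",
                    "confidence": "confirmed",
                    "owasp": ["A02:2021-Cryptographic_Failures"],
                    "file": rel,
                })
    findings.sort(key=lambda f: f["file"])  # deterministic order for the demo
    return findings, f"forbidden_files: {len(findings)} finding(s)"


# Demo on a throwaway tree: two forbidden names, one harmless file.
with tempfile.TemporaryDirectory() as work:
    os.makedirs(os.path.join(work, "config"))
    for rel in (".env", os.path.join("config", "id_rsa"), "README.md"):
        with open(os.path.join(work, rel), "w", encoding="utf-8") as fh:
            fh.write("demo\n")
    flagged, log_message = forbidden_files_sketch(work)
```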

	---

	## `scanners/agent_audit_runner.py`

	### `agent_audit(work)`

	```python
	def agent_audit(work: str) -> Tuple[List[dict], str]
	```

	Run [agent-audit](https://github.com/dreadnode/agent-audit) for OWASP Agentic Top 10 (2026) scanning of AI agent code.

	- Requires: `agent-audit` binary on PATH (`pip install agent-audit`).
	- Tool: `agent-audit scan <work> --format json`
	- Timeout: 300 s
	- Severity mapping: `critical → ERROR`, `high → HIGH`, `medium → WARNING`, `low → INFO`.
	- Confidence mapping: score ≥ 0.9 → `"confirmed"`, ≥ 0.7 → `"likely"`, else → `"possible"`.
	- OWASP: uses `asi_categories` field from agent-audit output (e.g., `["LLM01"]`).
	- Suppressed findings: automatically excluded.

	#### Internal helpers

	##### `_confidence(score)`

	```python
	def _confidence(score: float) -> str
	```

	Map agent-audit float confidence (0–1) to `"confirmed"` / `"likely"` / `"possible"`.

	##### `_severity(s)`

	```python
	def _severity(s: str) -> str
	```

	Normalize agent-audit severity string to scanner convention (`"ERROR"`, `"HIGH"`, `"WARNING"`, `"INFO"`).

	---

	## New runners — Sprint 6 (Tasks 01–23)

	### `scanners/modelscan_runner.py`

	### `modelscan(work)`

	```python
	def modelscan(work: str) -> Tuple[List[dict], str]
	```

	Run [ModelScan](https://github.com/protectai/modelscan) (Palo Alto) to detect serialisation RCE in ML model files (`.pkl`, `.pt`, `.h5`, `.onnx`, `.npy`).

	- Requires: `pip install modelscan` (Python <3.13 only)
	- API: Python — `modelscan.scanner.ModelScan().scan(path=work)`
	- Severity: `"CRITICAL"/"HIGH"` → `"ERROR"`, `"MEDIUM"` → `"WARNING"`, else `"INFO"`
	- OWASP: `["A08:2021-Software_and_Data_Integrity_Failures"]`
	- Category: `"security"`
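
The severity mapping and the file-type filter can be sketched as two small helpers. `modelscan_severity` and `is_model_file` are hypothetical names; the extension set is the one listed in the description above, and whether the real runner filters by extension itself or leaves that to ModelScan is not stated here.

```python
import os

MODEL_EXTENSIONS = {".pkl", ".pt", ".h5", ".onnx", ".npy"}


def modelscan_severity(raw: str) -> str:
    """CRITICAL/HIGH become ERROR, MEDIUM becomes WARNING, everything else INFO."""
    level = raw.strip().upper()
    if level in {"CRITICAL", "HIGH"}:
        return "ERROR"
    if level == "MEDIUM":
        return "WARNING"
    return "INFO"


def is_model_file(path: str) -> bool:
    """True when the file extension is one of the listed model formats."""
    return os.path.splitext(path)[1].lower() in MODEL_EXTENSIONS


mapped = [modelscan_severity(s) for s in ("CRITICAL", "HIGH", "MEDIUM", "LOW")]
```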

	---

	### `scanners/picklescan_runner.py`

	### `picklescan(work)`

	```python
	def picklescan(work: str) -> Tuple[List[dict], str]
	```

	Run [PickleScan](https://github.com/mmaitre314/picklescan) to detect unsafe opcodes in pickle and PyTorch model files.

	- Requires: `pip install picklescan`
	- API: Python — `picklescan.scanner.scan_file_path(work)`
	- Rule IDs: `PickleUnsafeOpcodesScanner`, `PickleSafetyScanner`
	- OWASP: `["A08:2021-Software_and_Data_Integrity_Failures"]`
	- Category: `"security"`

	---

	### `scanners/fickling_runner.py`

	### `fickling(work)`

	```python
	def fickling(work: str) -> Tuple[List[dict], str]
	```

	Run [Fickling](https://github.com/trailofbits/fickling) (Trail of Bits) allowlist scanner on all pickle-format files.

	- Requires: `pip install fickling`
	- API: Python — `fickling.analysis.run_checks(pkl)`
	- Rule IDs: `FICKLING-DANGEROUS-IMPORTS`, `FICKLING-ARBITRARY-CODE`
	- OWASP: `["A08:2021-Software_and_Data_Integrity_Failures"]`

	---

	### `scanners/trivy_runner.py`

	### `trivy(work)`

	```python
	def trivy(work: str) -> Tuple[List[dict], str]
	```

	Run [Trivy](https://github.com/aquasecurity/trivy) (Aqua Security) for CVE, secret, and IaC scanning.

	- Requires: `trivy` binary on PATH (auto-downloaded by `core.bootstrap`)
	- Tool: `trivy fs <work> --format json --output <tmpfile> --quiet`
	- Timeout: 600 s
	- Output: reads JSON from tempfile (avoids mixing with tool output on stdout)
	- Severity: Trivy's `Severity` field mapped to scanner convention
	- OWASP: `["A06:2021-Vulnerable_and_Outdated_Components"]`

	---

	### `scanners/trufflehog_runner.py`

	### `trufflehog(work, deep_history=False)`

	```python
	def trufflehog(work: str, deep_history: bool = False) -> Tuple[List[dict], str]
	```

	Run [TruffleHog](https://github.com/trufflesecurity/trufflehog) for verified secret detection in git history and filesystem.

	- Requires: `trufflehog` binary on PATH (auto-downloaded by `core.bootstrap`)
	- Tool: `trufflehog filesystem <work> --json --only-verified`
	- Output: JSONL (one JSON object per line)
	- Helper: `_parse_cvss_base_score()` from CVSSv3 string
	- OWASP: `["A02:2021-Cryptographic_Failures"]`
	- Category: `"security"`
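
JSONL output has to be parsed line by line, tolerating lines that are not findings. The sketch below assumes finding records are identified by a `DetectorName` key and carry their location under `SourceMetadata.Data.Filesystem`, as TruffleHog v3 emits for filesystem scans; the function name and output keys are hypothetical.

```python
import json
from typing import List


def parse_trufflehog_jsonl(stdout: str) -> List[dict]:
    """Parse one JSON object per line, skipping anything that is not a finding."""
    findings = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "DetectorName" not in record:
            continue  # e.g. a structured log line rather than a finding
        data = (record.get("SourceMetadata") or {}).get("Data") or {}
        fs = data.get("Filesystem") or {}
        findings.append({
            "tool": "TruffleHog",
            "rule": record["DetectorName"],
            "verified": bool(record.get("Verified", False)),
            "owasp": ["A02:2021-Cryptographic_Failures"],
            "category": "security",
            "file": fs.get("file", ""),
            "line": fs.get("line", 0),
        })
    return findings


sample_stdout = "\n".join([
    json.dumps({"DetectorName": "AWS", "Verified": True,
                "SourceMetadata": {"Data": {"Filesystem": {"file": "deploy.sh", "line": 8}}}}),
    json.dumps({"level": "info", "msg": "finished scanning"}),
    "plain text that is not JSON",
])
hog_findings = parse_trufflehog_jsonl(sample_stdout)
```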

	---

	### `scanners/osv_runner.py`

	### `osv_scanner(work)`

	```python
	def osv_scanner(work: str) -> Tuple[List[dict], str]
	```

	Run [OSV-Scanner](https://github.com/google/osv-scanner) (Google) for dependency CVE scanning.

	- Requires: `osv-scanner` binary on PATH
	- Tool: `osv-scanner scan dir:<work> --format json`
	- OWASP: `["A06:2021-Vulnerable_and_Outdated_Components"]`

	---

	### `scanners/checkov_runner.py`

	### `checkov(work)`

	```python
	def checkov(work: str) -> Tuple[List[dict], str]
	```

	Run [Checkov](https://www.checkov.io/) (Bridgecrew) for IaC misconfigurations, GitHub Actions, Terraform.

	- Requires: `pip install checkov`
	- Tool: `checkov -d <work> --output json --compact --quiet`
	- Output format: handles both list-of-result-blocks and single result dict
	- `_CRITICAL_IDS`: frozenset of known high-risk Checkov rule IDs that escalate to `"ERROR"`
	- OWASP: `["A05:2021-Security_Misconfiguration"]`
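
Handling both output shapes and the escalation set might look like the sketch below. The IDs in `_CRITICAL_IDS` here are placeholders (the real set is not listed in this reference), `"WARNING"` as the non-escalated severity is an assumption, and the output keys are illustrative. `results.failed_checks`, `check_id`, `file_path`, and `file_line_range` are fields from Checkov's JSON output.

```python
from typing import List, Union

# Placeholder IDs for illustration; the real frozenset lives in checkov_runner.py.
_CRITICAL_IDS = frozenset({"CKV_DEMO_1"})


def checkov_to_findings(report: Union[dict, list]) -> List[dict]:
    """Accept either a single result dict or a list of per-framework result blocks."""
    blocks = report if isinstance(report, list) else [report]
    findings = []
    for block in blocks:
        results = (block or {}).get("results") or {}
        for check in results.get("failed_checks", []):
            check_id = check.get("check_id", "")
            line_range = check.get("file_line_range") or [0]
            findings.append({
                "tool": "Checkov",
                "rule": check_id,
                # Assumption: non-critical checks default to WARNING.
                "severity": "ERROR" if check_id in _CRITICAL_IDS else "WARNING",
                "owasp": ["A05:2021-Security_Misconfiguration"],
                "file": check.get("file_path", ""),
                "line": line_range[0],
            })
    return findings


single = {"results": {"failed_checks": [
    {"check_id": "CKV_DEMO_1", "file_path": "/main.tf", "file_line_range": [3, 9]},
]}}
multi = [single, {"results": {"failed_checks": [
    {"check_id": "CKV_DEMO_2", "file_path": "/.github/workflows/ci.yml",
     "file_line_range": [1, 20]},
]}}]
from_dict = checkov_to_findings(single)
from_list = checkov_to_findings(multi)
```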

	---

	### `scanners/grype_runner.py`

	### `grype(work)`

	```python
	def grype(work: str) -> Tuple[List[dict], str]
	```

	Run [Grype](https://github.com/anchore/grype) (Anchore) for container/package CVE scanning via a syft→grype pipeline.

	- Requires: `syft` and `grype` binaries on PATH (both auto-downloaded by `core.bootstrap`)
	- Tool: `syft <work> -o json | grype --add-cpes-if-none -o json --file <tmpfile>`
	- Output: reads from tempfile (syft pipes stdout to grype, grype writes to file)
	- Severity: Grype's capitalized severity string → scanner convention
	- OWASP: `["A06:2021-Vulnerable_and_Outdated_Components"]`

	---

	### `scanners/socket_runner.py`

	### `socket_scanner(work)`

	```python
	def socket_scanner(work: str) -> Tuple[List[dict], str]
	```

	Run [Socket](https://socket.dev/) CLI for supply-chain security (malicious packages, typosquatting).

	- Requires: `socket` binary on PATH (`npm install -g @socketsecurity/cli`)
	- Tool: `socket scan <work> --json`
	- Category: `"supply-chain"`
	- OWASP: `["A08:2021-Software_and_Data_Integrity_Failures"]`

	---

	### `scanners/safety_runner.py`

	### `safety_check(work)`

	```python
	def safety_check(work: str) -> Tuple[List[dict], str]
	```

	Run [Safety](https://github.com/pyupio/safety) for Python dependency CVE checking.

	- Requires: `pip install safety`
	- Tool: `safety check --json`
	- Output format: handles Safety v2 list format and v3 dict format
	- OWASP: `["A06:2021-Vulnerable_and_Outdated_Components"]`

	---

	### `scanners/llmguard_runner.py`

	### `llm_guard(work)`

	```python
	def llm_guard(work: str) -> Tuple[List[dict], str]
	```

	Run [LLM Guard](https://github.com/protectai/llm-guard) PromptInjection scanner on Python files.

	- Requires: `pip install llm-guard` (Python <3.13 only)
	- API: Python — extracts prompt strings via regex, scans with `llm_guard.input_scanners.PromptInjection`
	- Rule ID: `LLM-PROMPT-INJECTION`
	- OWASP: `["LLM01:2025-Prompt_Injection"]`
	- Category: `"llm"`

	---

	### `scanners/garak_runner.py`

	### `garak(target_url, probes=None)`

	```python
	def garak(target_url: str, probes: list[str] | None = None) -> Tuple[List[dict], str]
	```

	Run [Garak](https://github.com/NVIDIA/garak) LLM vulnerability probes against a live endpoint.

	- Requires: `pip install garak` ; endpoint URL via `GARAK_TARGET_URL` env var
	- Tool: `garak --model_type rest --model_name <url> --probes <probes> --report_prefix <tmpdir>`
	- Output: parses `.report.jsonl` files from garak's output directory
	- Default probes: `"dan,knownbadsignatures,packagehallucination"`
	- Category: `"llm"`; OWASP: `["LLM01:2025-Prompt_Injection"]`

	---

	### `scanners/deepteam_runner.py`

	### `deepteam(target_url, target_purpose="")`

	```python
	def deepteam(target_url: str, target_purpose: str = "") -> Tuple[List[dict], str]
	```

	Run [DeepTeam](https://github.com/confident-ai/deepteam) (Confident AI) red-team evaluation.

	- Requires: `pip install deepeval` ; endpoint via `DEEPTEAM_TARGET_URL`
	- Approach: writes a YAML config and runs `deepteam run --config <yaml>`
	- Category: `"llm"` ; OWASP: `["LLM02:2025-Sensitive_Information_Disclosure"]`

	---

	### `scanners/promptfoo_runner.py`

	### `promptfoo(target_url, target_purpose="", plugins=None)`

	```python
	def promptfoo(target_url: str, target_purpose: str = "", plugins: list[str] | None = None) -> Tuple[List[dict], str]
	```

	Run [Promptfoo](https://github.com/promptfoo/promptfoo) red-team evaluation via npx.

	- Requires: Node.js + `npx promptfoo` ; endpoint via `PROMPTFOO_TARGET_URL`
	- Tool: `npx promptfoo@latest redteam run --config <yaml> --output <json>`
	- Default plugins: `["harmful", "injection", "jailbreak", "pii"]`
	- Category: `"llm"`

	---

	### `scanners/azure_redteam_runner.py`

	### `azure_redteam(target_url, risk_categories=None, attack_strategies=None)`

	```python
	def azure_redteam(
	    target_url: str,
	    risk_categories: list[str] | None = None,
	    attack_strategies: list[str] | None = None,
	) -> Tuple[List[dict], str]
	```

	Run [Azure AI Evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-red-teaming) red-team pipeline.

	- Requires: `pip install azure-ai-evaluation` ; endpoint via `AZURE_REDTEAM_TARGET_URL`
	- API: `azure.ai.evaluation.red_team.RedTeamOrchestrator` in local mode via `asyncio.run()`
	- Default risk categories: `["violence", "sexual", "hate_unfairness", "self_harm"]`
	- Category: `"llm"`

	---

	### `scanners/pyrit_runner.py`

	### `pyrit(target_url, objectives=None)`

	```python
	def pyrit(target_url: str, objectives: list[str] | None = None) -> Tuple[List[dict], str]
	```

	Run [PyRIT](https://github.com/Azure/PyRIT) (Microsoft) crescendo multi-turn attack orchestration.

	- Requires: `pip install pyrit` ; endpoint via `PYRIT_TARGET_URL`
	- Binary guard: `have_binary("pyrit_scan")` (optional CLI wrapper)
	- Category: `"llm"` ; OWASP: `["LLM01:2025-Prompt_Injection"]`

	---

	### `scanners/augustus_runner.py`

	### `augustus(target_url, probes=None)`

	```python
	def augustus(target_url: str, probes: list[str] | None = None) -> Tuple[List[dict], str]
	```

	Run [Augustus](https://github.com/mozilla/augustus) Go-based LLM red-team scanner.

	- Requires: `augustus` binary on PATH ; endpoint via `AUGUSTUS_TARGET_URL`
	- Tool: `augustus --target <url> --probes <probes> --output json`
	- Category: `"llm"`

	---

	### `scanners/fuzzyai_runner.py`

	### `fuzzyai(target_url, attacks=None)`

	```python
	def fuzzyai(target_url: str, attacks: list[str] | None = None) -> Tuple[List[dict], str]
	```

	Run [FuzzyAI](https://github.com/cyberark/FuzzyAI) adversarial fuzzing framework.

	- Requires: `pip install fuzzy-ai` ; endpoint via `FUZZYAI_TARGET_URL`
	- API: Python — `import fuzzy_ai`
	- Category: `"llm"`

	---

	### `scanners/giskard_runner.py`

	### `giskard_scan(target_url, model_description="")`

	```python
	def giskard_scan(target_url: str, model_description: str = "") -> Tuple[List[dict], str]
	```

	Run [Giskard](https://github.com/Giskard-AI/giskard) model testing and red-teaming.

	- Requires: `pip install giskard` ; endpoint via `GISKARD_TARGET_URL`
	- API: Python — wraps endpoint in a REST model, calls `giskard.scan(model)`
	- Category: `"llm"`

	---

	### `scanners/vigil_runner.py`

	### `vigil(work)`

	```python
	def vigil(work: str) -> Tuple[List[dict], str]
	```

	Run [Vigil](https://github.com/deadbits/vigil-llm) YARA-based prompt injection / jailbreak scanner on Python files.

	- Requires: `pip install vigil-llm` (Python <3.13 only)
	- API: Python — `Vigil.from_config_dict(…)`, scans prompt strings extracted by regex
	- Rule ID: `LLM-PROMPT-INJECTION-VIGIL`
	- Category: `"llm"` ; OWASP: `["LLM01:2025-Prompt_Injection"]`

	---

	### `scanners/nemo_runner.py`

	### `nemo_guardrails(work)`

	```python
	def nemo_guardrails(work: str) -> Tuple[List[dict], str]
	```

	Static analysis of [NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) `.co` config files.

	- Requires: `pip install nemoguardrails` (Python <3.13 only)
	- Pattern: globs `**/*.co` and checks for missing `input rails` / `output rails` definitions
	- Rule IDs: `NEMO-MISSING-INPUT-RAIL`, `NEMO-MISSING-OUTPUT-RAIL`
	- Category: `"llm"` ; OWASP: `["LLM01:2025-Prompt_Injection"]`
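
A rough sketch of this kind of static check is shown below. It assumes recursive discovery of files ending in `.co` and treats a rail as defined when the literal text `input rails` or `output rails` appears in the file; the real matching logic is not shown in this reference, so both are assumptions, as are the helper names and output keys.

```python
import tempfile
from pathlib import Path
from typing import List

_RAIL_RULES = (
    ("input rails", "NEMO-MISSING-INPUT-RAIL"),
    ("output rails", "NEMO-MISSING-OUTPUT-RAIL"),
)


def check_colang_file(path: Path, root: Path) -> List[dict]:
    """Emit one finding per rail type whose marker text is absent from the file."""
    text = path.read_text(encoding="utf-8", errors="ignore").lower()
    findings = []
    for marker, rule in _RAIL_RULES:
        if marker not in text:
            findings.append({
                "tool": "NeMo Guardrails",
                "rule": rule,
                "category": "llm",
                "owasp": ["LLM01:2025-Prompt_Injection"],
                "file": path.relative_to(root).as_posix(),
            })
    return findings


def nemo_sketch(work: str) -> List[dict]:
    """Scan every .co file under `work`."""
    root = Path(work)
    findings: List[dict] = []
    for path in sorted(root.rglob("*.co")):
        findings.extend(check_colang_file(path, root))
    return findings


# Demo: one file mentions both rail types, the other only input rails.
with tempfile.TemporaryDirectory() as work:
    root = Path(work)
    (root / "rails").mkdir()
    (root / "rails" / "full.co").write_text("# input rails\n# output rails\n")
    (root / "rails" / "partial.co").write_text("# input rails only\n")
    nemo_findings = nemo_sketch(work)
```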

	---

	### `scanners/guardrailsai_runner.py`

	### `guardrails_ai(work)`

	```python
	def guardrails_ai(work: str) -> Tuple[List[dict], str]
	```

	Static analysis of [Guardrails AI](https://github.com/guardrails-ai/guardrails) configuration in Python source files.

	- Requires: `pip install guardrails-ai` (Python <3.13 only)
	- Patterns checked:
	  - `GUARD-ON-FAIL-NOOP`: `on_fail="noop"` disables error handling
	  - `GUARD-THRESHOLD-TOO-HIGH`: a threshold ≥ 0.95 means the guard almost never triggers
	  - `GUARD-MISSING-TOXICITY`: no toxicity/content guard applied
	- Category: `"llm"` ; OWASP: `["LLM01:2025-Prompt_Injection"]`
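
The first two patterns lend themselves to simple line-based regular expressions. The sketch below is one way such checks could be written, not the runner's actual rules: the regexes, the helper name, and the output keys are assumptions, and `GUARD-MISSING-TOXICITY` is left out because it needs whole-file context.

```python
import re
from typing import List

_ON_FAIL_NOOP = re.compile(r"""on_fail\s*=\s*["']noop["']""")
_THRESHOLD = re.compile(r"""threshold\s*=\s*([0-9]*\.?[0-9]+)""")


def check_guardrails_source(source: str, filename: str = "<memory>") -> List[dict]:
    """Scan Python source text line by line for risky Guardrails AI settings."""
    findings = []

    def add(rule: str, lineno: int) -> None:
        findings.append({
            "tool": "Guardrails AI",
            "rule": rule,
            "category": "llm",
            "owasp": ["LLM01:2025-Prompt_Injection"],
            "file": filename,
            "line": lineno,
        })

    for lineno, line in enumerate(source.splitlines(), start=1):
        if _ON_FAIL_NOOP.search(line):
            add("GUARD-ON-FAIL-NOOP", lineno)
        match = _THRESHOLD.search(line)
        if match and float(match.group(1)) >= 0.95:
            add("GUARD-THRESHOLD-TOO-HIGH", lineno)
    return findings


sample_source = '''
guard = Guard().use(ToxicLanguage, threshold=0.97, on_fail="noop")
guard2 = Guard().use(ToxicLanguage, threshold=0.5, on_fail="exception")
'''
guard_findings = check_guardrails_source(sample_source, filename="app.py")
```

Line-based matching will miss arguments split across lines; an AST-based check would be more robust at the cost of more code.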