# API Reference — `scanners/`
## `scanners/__init__.py` — public exports
```python
from scanners import (
agent_audit,
bandit,
detect_secrets,
forbidden_files,
gitleaks,
hadolint,
pip_audit,
ruff_perf,
semgrep_pack,
)
```
---
## Common return types
Most runners return a 2-tuple:
```python
Tuple[List[dict], str] # (findings, log_message)
```
`semgrep_pack` is the exception — it returns `List[dict]` directly (no log message) because it is called once per rule pack and the caller aggregates messages.
All finding dicts conform to the schema defined in [`core/models.py`](core.md#coremodelspy).
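As a minimal sketch of how a caller might handle the two return shapes described above: `collect`, `fake_runner`, and `fake_pack` are hypothetical names used only for illustration, and the finding dicts carry just the `tool` and `rule` fields rather than the full schema.

```python
from typing import Callable, List, Tuple

Runner = Callable[[str], Tuple[List[dict], str]]


def collect(work: str, runners: List[Runner]) -> Tuple[List[dict], List[str]]:
    """Call each 2-tuple runner and merge its findings and log message."""
    findings: List[dict] = []
    logs: List[str] = []
    for run in runners:
        found, message = run(work)
        findings.extend(found)
        logs.append(message)
    return findings, logs


def fake_runner(work: str) -> Tuple[List[dict], str]:
    # Hypothetical stand-in for a runner such as bandit(work).
    return [{"tool": "Fake", "rule": "X1"}], f"fake: 1 finding in {work}"


def fake_pack(work: str) -> List[dict]:
    # Hypothetical stand-in for semgrep_pack(...): findings only, no message.
    return [{"tool": "Semgrep:Core", "rule": "S1"}]


findings, logs = collect("/tmp/repo", [fake_runner])
pack_findings = fake_pack("/tmp/repo")
findings.extend(pack_findings)  # the caller merges pack results itself
logs.append(f"Semgrep:Core: {len(pack_findings)} finding(s)")
```

Because `semgrep_pack` returns no message, the caller composes its own log line after merging the pack's findings.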
---
## `scanners/bandit_runner.py`
### `bandit(work)`
```python
def bandit(work: str) -> Tuple[List[dict], str]
```
Run [Bandit](https://bandit.readthedocs.io/) Python security linter against the work directory.
- **Tool**: `bandit -r <work> -f json -q --exclude .git,.venv,...`
- **Timeout**: 300 s
- **Confidence mapping**: `HIGH`/`MEDIUM` severity → `"likely"`, otherwise → `"possible"`.
- **OWASP**: `["UNMAPPED"]` (Bandit rule IDs map across multiple categories; prefer reviewing `rule`).
- **Skips**: `.git`, `.venv`, `venv`, `env`, `node_modules`, `__pycache__`.
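
The confidence mapping above can be sketched as a small function. The name `bandit_confidence` is hypothetical, and upper-casing the input is an assumption rather than documented behavior; `issue_severity` is the field Bandit uses in its JSON report.

```python
def bandit_confidence(issue_severity: str) -> str:
    """Map a Bandit severity label to the scanner's confidence label."""
    # Upper-casing the input is an assumption made for robustness.
    if issue_severity.upper() in {"HIGH", "MEDIUM"}:
        return "likely"
    return "possible"


labels = [bandit_confidence(s) for s in ("HIGH", "MEDIUM", "LOW", "UNDEFINED")]
```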
---
## `scanners/semgrep_runner.py`
### `semgrep_pack(rules_path, work, tool_label, category)`
```python
def semgrep_pack(
rules_path: Path,
work: str,
tool_label: str,
category: str = "security",
) -> List[dict]
```
Run Semgrep with a specific YAML rule pack. Returns findings list directly (no log message).
- **Tool**: `semgrep --config <rules_path> --json --quiet --metrics=off --no-git-ignore <work>`
- **Timeout**: 600 s
- **`tool_label`**: Used as the `tool` field in finding dicts (e.g., `"Semgrep:Core"`).
- **Metadata extracted**: `owasp`, `confidence` from Semgrep result `extra.metadata`.
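
A rough sketch of the metadata extraction, assuming Semgrep's `--json` layout of `results[].extra.metadata`. The helper name `findings_from_semgrep`, the `["UNMAPPED"]` fallback, and the choice to pass `confidence` through unchanged are assumptions; the actual runner may map or default these differently.

```python
import json
from typing import List


def findings_from_semgrep(raw_json: str, tool_label: str,
                          category: str = "security") -> List[dict]:
    """Turn Semgrep JSON output into a list of partial finding dicts."""
    findings = []
    for result in json.loads(raw_json).get("results", []):
        metadata = result.get("extra", {}).get("metadata", {})
        # Fallback value is an assumption; rules may omit owasp entirely.
        owasp = metadata.get("owasp") or ["UNMAPPED"]
        if isinstance(owasp, str):  # rules may give a string or a list
            owasp = [owasp]
        findings.append({
            "tool": tool_label,
            "rule": result.get("check_id", ""),
            "category": category,
            "owasp": owasp,
            # Passed through as-is; any mapping to scanner labels is not shown.
            "confidence": metadata.get("confidence"),
        })
    return findings


sample = json.dumps({"results": [{
    "check_id": "demo.rule",
    "extra": {"metadata": {"owasp": "A03:2021 - Injection", "confidence": "HIGH"}},
}]})
parsed = findings_from_semgrep(sample, "Semgrep:Core")
```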
### `detect_secrets(work)`
```python
def detect_secrets(work: str) -> Tuple[List[dict], str]
```
Run [detect-secrets](https://github.com/Yelp/detect-secrets) against all files in `work`.
- **Tool**: `detect-secrets scan --all-files <work>`
- **Timeout**: 300 s
- **Severity**: always `"ERROR"`.
- **Confidence**: `"possible"` for entropy/high-entropy types, otherwise `"likely"`.
- **OWASP**: `["A02:2021-Cryptographic_Failures"]`.
- **Note**: Requires a git-initialized directory for best results.
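
The severity and confidence rules above might be applied to detect-secrets output roughly as follows. This sketch assumes the tool's JSON report shape (`results` keyed by file path, each entry carrying `type` and `line_number`); the helper name, the `tool` label, and the `file`/`line` keys are hypothetical.

```python
import json
from typing import List


def findings_from_detect_secrets(raw_json: str) -> List[dict]:
    """Flatten a detect-secrets report into partial finding dicts."""
    findings = []
    for path, secrets in json.loads(raw_json).get("results", {}).items():
        for secret in secrets:
            secret_type = secret.get("type", "")
            is_entropy = "entropy" in secret_type.lower()
            findings.append({
                "tool": "detect-secrets",  # label is an assumption
                "rule": secret_type,
                "file": path,              # key name is an assumption
                "line": secret.get("line_number"),
                "severity": "ERROR",
                "confidence": "possible" if is_entropy else "likely",
                "owasp": ["A02:2021-Cryptographic_Failures"],
            })
    return findings


sample = json.dumps({"results": {"app/settings.py": [
    {"type": "Base64 High Entropy String", "line_number": 12},
    {"type": "AWS Access Key", "line_number": 30},
]}})
secrets = findings_from_detect_secrets(sample)
```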
---
## `scanners/ruff_runner.py`
### `ruff_perf(work)`
```python
def ruff_perf(work: str) -> Tuple[List[dict], str]
```
Run [Ruff](https://docs.astral.sh/ruff/) with PERF rule selection.
- **Tool**: `ruff check --select PERF --output-format json <work>`
- **Timeout**: 120 s
- **Severity**: always `"WARNING"` (performance suggestions are non-critical).
- **Confidence**: `"likely"` (Ruff PERF rules have very few false positives).
- **OWASP**: `["UNMAPPED"]` (performance — not a security concern).
- **Category**: `"performance"`.
---
## `scanners/pip_audit_runner.py`
### `pip_audit(work)`
```python
def pip_audit(work: str) -> Tuple[List[dict], str]
```
Scan all `requirements*.txt` files in `work` for known CVEs using [pip-audit](https://github.com/pypa/pip-audit).
- **Tool**: `pip-audit -r <req_file> -f json --strict --progress-spinner off`
- **Timeout**: 90 s per requirements file
- **Severity**: always `"ERROR"` (any known CVE is critical).
- **Confidence**: `"confirmed"` (CVE database entries are verified).
- **OWASP**: `["A06:2021-Vulnerable_and_Outdated_Components"]`.
- **Remediation**: auto-populated from `fix_versions` field.
- **Skips**: files under `.git/`.
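
The file discovery step could look something like this. `find_requirements` is a hypothetical name, and the sketch only covers locating `requirements*.txt` files outside `.git/`; it does not invoke pip-audit.

```python
import tempfile
from pathlib import Path
from typing import List


def find_requirements(work: str) -> List[Path]:
    """Return requirements*.txt files under work, skipping anything in .git/."""
    root = Path(work)
    return sorted(
        p for p in root.rglob("requirements*.txt")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".git").mkdir()
    (root / "svc").mkdir()
    for rel in ("requirements.txt", "svc/requirements-dev.txt", ".git/requirements.txt"):
        (root / rel).write_text("requests==2.31.0\n")
    found = [p.relative_to(root).as_posix() for p in find_requirements(tmp)]
```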
---
## `scanners/gitleaks_runner.py`
### `gitleaks(work)`
```python
def gitleaks(work: str) -> Tuple[List[dict], str]
```
Run [gitleaks](https://github.com/gitleaks/gitleaks) to detect secrets committed to git history.
- **Requires**: `gitleaks` binary on PATH (auto-downloaded by `core.bootstrap`).
- **Tool**: `gitleaks detect --source <work> --report-path <tmp.json> --report-format json --no-banner --exit-code 0`
- **Timeout**: 600 s
- **Severity**: always `"ERROR"`.
- **Confidence**: `"confirmed"`.
- **OWASP**: `["A02:2021-Cryptographic_Failures"]`.
- **Note**: Only added to the task list when `deep_history=True` in `scan_repo()`.
---
## `scanners/hadolint_runner.py`
### `hadolint(work)`
```python
def hadolint(work: str) -> Tuple[List[dict], str]
```
Run [hadolint](https://github.com/hadolint/hadolint) against all `Dockerfile*` files found recursively in `work`.
- **Requires**: `hadolint` binary on PATH (auto-downloaded by `core.bootstrap`).
- **Tool**: `hadolint -f json <Dockerfile>`
- **Timeout**: 60 s per Dockerfile
- **Severity**: from hadolint's `level` field (uppercased).
- **Confidence**: `"likely"` (TOOL_DEFAULT_CONFIDENCE).
- **OWASP**: `["A05:2021-Security_Misconfiguration"]`.
- **Skips**: Dockerfiles under `.git/`.
---
## `scanners/forbidden_files.py`
### `forbidden_files(work)`
```python
def forbidden_files(work: str) -> Tuple[List[dict], str]
```
Walk `work` and flag any file whose **basename** matches the `FORBIDDEN_FILES` list in `core/models.py`.
- **No external tool required** — pure Python.
- **Severity**: `"ERROR"`.
- **Confidence**: `"confirmed"`.
- **OWASP**: `["A02:2021-Cryptographic_Failures"]`.
**Detected file names include:** `.env`, `.env.local`, `.env.production`, `id_rsa`, `id_dsa`, `id_ecdsa`, `id_ed25519`, `.git-credentials`, `.npmrc`, `.pypirc`, `credentials`, `credentials.json`, `service-account.json`, `serviceAccountKey.json`, `wp-config.php`.
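
A minimal sketch of the basename walk, using only a subset of the names above. `FORBIDDEN_SAMPLE` and `find_forbidden` are hypothetical names, and exact string matching on the basename is an assumption about how the real `FORBIDDEN_FILES` list is applied.

```python
import os
import tempfile
from typing import Iterable, List

# Subset of the names listed above; the real list lives in core/models.py.
FORBIDDEN_SAMPLE = frozenset({".env", "id_rsa", "credentials.json"})


def find_forbidden(work: str, names: Iterable[str] = FORBIDDEN_SAMPLE) -> List[str]:
    """Return relative paths of files whose basename is in names."""
    wanted = set(names)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(work):
        for name in filenames:
            if name in wanted:
                full = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full, work).replace(os.sep, "/"))
    return sorted(hits)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "deploy"))
    for rel in (".env", "deploy/id_rsa", "deploy/env.txt", "README.md"):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("placeholder\n")
    flagged = find_forbidden(tmp)
```

Matching on the basename means `deploy/id_rsa` is flagged regardless of depth, while `deploy/env.txt` is not.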
---
## `scanners/agent_audit_runner.py`
### `agent_audit(work)`
```python
def agent_audit(work: str) -> Tuple[List[dict], str]
```
Run [agent-audit](https://github.com/dreadnode/agent-audit) for OWASP Agentic Top 10 (2026) scanning of AI agent code.
- **Requires**: `agent-audit` binary on PATH (`pip install agent-audit`).
- **Tool**: `agent-audit scan <work> --format json`
- **Timeout**: 300 s
- **Severity mapping**: `critical → ERROR`, `high → HIGH`, `medium → WARNING`, `low → INFO`.
- **Confidence mapping**: score ≥ 0.9 → `"confirmed"`, ≥ 0.7 → `"likely"`, else → `"possible"`.
- **OWASP**: uses `asi_categories` field from agent-audit output (e.g., `["LLM01"]`).
- **Suppressed findings**: automatically excluded.
#### Internal helpers
##### `_confidence(score)`
```python
def _confidence(score: float) -> str
```
Map agent-audit float confidence (0–1) to `"confirmed"` / `"likely"` / `"possible"`.
##### `_severity(s)`
```python
def _severity(s: str) -> str
```
Normalize agent-audit severity string to scanner convention (`"ERROR"`, `"HIGH"`, `"WARNING"`, `"INFO"`).
---
## New runners — Sprint 6 (Tasks 01–23)
### `scanners/modelscan_runner.py`
### `modelscan(work)`
```python
def modelscan(work: str) -> Tuple[List[dict], str]
```
Run [ModelScan](https://github.com/protectai/modelscan) (Palo Alto) to detect serialisation RCE in ML model files (`.pkl`, `.pt`, `.h5`, `.onnx`, `.npy`).
- **Requires**: `pip install modelscan` (Python <3.13 only)
- **API**: Python — `modelscan.scanner.ModelScan().scan(path=work)`
- **Severity**: `"CRITICAL"`/`"HIGH"` → `"ERROR"`, `"MEDIUM"` → `"WARNING"`, else `"INFO"`
- **OWASP**: `["A08:2021-Software_and_Data_Integrity_Failures"]`
- **Category**: `"security"`
---
### `scanners/picklescan_runner.py`
### `picklescan(work)`
```python
def picklescan(work: str) -> Tuple[List[dict], str]
```
Run [PickleScan](https://github.com/mmaitre314/picklescan) to detect unsafe opcodes in pickle and PyTorch model files.
- **Requires**: `pip install picklescan`
- **API**: Python — `picklescan.scanner.scan_file_path(work)`
- **Rule IDs**: `PickleUnsafeOpcodesScanner`, `PickleSafetyScanner`
- **OWASP**: `["A08:2021-Software_and_Data_Integrity_Failures"]`
- **Category**: `"security"`
---
### `scanners/fickling_runner.py`
### `fickling(work)`
```python
def fickling(work: str) -> Tuple[List[dict], str]
```
Run [Fickling](https://github.com/trailofbits/fickling) (Trail of Bits) allowlist scanner on all pickle-format files.
- **Requires**: `pip install fickling`
- **API**: Python — `fickling.analysis.run_checks(pkl)`
- **Rule IDs**: `FICKLING-DANGEROUS-IMPORTS`, `FICKLING-ARBITRARY-CODE`
- **OWASP**: `["A08:2021-Software_and_Data_Integrity_Failures"]`
---
### `scanners/trivy_runner.py`
### `trivy(work)`
```python
def trivy(work: str) -> Tuple[List[dict], str]
```
Run [Trivy](https://github.com/aquasecurity/trivy) (Aqua Security) for CVE, secret, and IaC scanning.
- **Requires**: `trivy` binary on PATH (auto-downloaded by `core.bootstrap`)
- **Tool**: `trivy fs <work> --format json --output <tmpfile> --quiet`
- **Timeout**: 600 s
- **Output**: reads JSON from tempfile (avoids mixing with tool output on stdout)
- **Severity**: Trivy's `Severity` field mapped to scanner convention
- **OWASP**: `["A06:2021-Vulnerable_and_Outdated_Components"]`
---
### `scanners/trufflehog_runner.py`
### `trufflehog(work, deep_history=False)`
```python
def trufflehog(work: str, deep_history: bool = False) -> Tuple[List[dict], str]
```
Run [TruffleHog](https://github.com/trufflesecurity/trufflehog) for verified secret detection in git history and filesystem.
- **Requires**: `trufflehog` binary on PATH (auto-downloaded by `core.bootstrap`)
- **Tool**: `trufflehog filesystem <work> --json --only-verified`
- **Output**: JSONL (one JSON object per line)
- **Helper**: `_parse_cvss_base_score()` from CVSSv3 string
- **OWASP**: `["A02:2021-Cryptographic_Failures"]`
- **Category**: `"security"`
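
Since the output is JSONL, parsing can be sketched as one `json.loads` per line. `parse_jsonl` is a hypothetical helper, silently skipping non-JSON lines is an assumption, and the sample records show only a couple of illustrative fields rather than TruffleHog's full record shape.

```python
import json
from typing import List


def parse_jsonl(output: str) -> List[dict]:
    """Parse JSONL text, keeping only lines that decode to JSON objects."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: ignore stray non-JSON log lines
        if isinstance(obj, dict):
            records.append(obj)
    return records


sample_output = "\n".join([
    json.dumps({"DetectorName": "AWS", "Verified": True}),
    "",
    "not json",
    json.dumps({"DetectorName": "Github", "Verified": True}),
])
records = parse_jsonl(sample_output)
```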
---
### `scanners/osv_runner.py`
### `osv_scanner(work)`
```python
def osv_scanner(work: str) -> Tuple[List[dict], str]
```
Run [OSV-Scanner](https://github.com/google/osv-scanner) (Google) for dependency CVE scanning.
- **Requires**: `osv-scanner` binary on PATH
- **Tool**: `osv-scanner scan dir:<work> --format json`
- **OWASP**: `["A06:2021-Vulnerable_and_Outdated_Components"]`
---
### `scanners/checkov_runner.py`
### `checkov(work)`
```python
def checkov(work: str) -> Tuple[List[dict], str]
```
Run [Checkov](https://www.checkov.io/) (Bridgecrew) for IaC misconfigurations, GitHub Actions, Terraform.
- **Requires**: `pip install checkov`
- **Tool**: `checkov -d <work> --output json --compact --quiet`
- **Output format**: handles both list-of-result-blocks and single result dict
- **`_CRITICAL_IDS`**: frozenset of known high-risk Checkov rule IDs that escalate to `"ERROR"`
- **OWASP**: `["A05:2021-Security_Misconfiguration"]`
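
A sketch of how both output shapes and the escalation set could be handled, assuming each Checkov result block exposes `results.failed_checks` entries with a `check_id`. The IDs shown are placeholders (the real `_CRITICAL_IDS` contents are not listed here), and `"WARNING"` as the non-escalated default is an assumption.

```python
from typing import Any, List

# Hypothetical placeholder ID; the real _CRITICAL_IDS contents are not listed here.
_CRITICAL_IDS = frozenset({"CKV_EXAMPLE_1"})


def failed_checks(payload: Any) -> List[dict]:
    """Accept a single result dict or a list of result blocks."""
    blocks = payload if isinstance(payload, list) else [payload]
    checks = []
    for block in blocks:
        if isinstance(block, dict):
            checks.extend(block.get("results", {}).get("failed_checks", []))
    return checks


def severity_for(check_id: str) -> str:
    # "WARNING" as the non-escalated default is an assumption.
    return "ERROR" if check_id in _CRITICAL_IDS else "WARNING"


single = {"check_type": "terraform",
          "results": {"failed_checks": [{"check_id": "CKV_EXAMPLE_1"}]}}
multi = [single,
         {"check_type": "github_actions",
          "results": {"failed_checks": [{"check_id": "CKV_EXAMPLE_2"}]}}]

from_single = [(c["check_id"], severity_for(c["check_id"])) for c in failed_checks(single)]
from_multi = [(c["check_id"], severity_for(c["check_id"])) for c in failed_checks(multi)]
```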
---
### `scanners/grype_runner.py`
### `grype(work)`
```python
def grype(work: str) -> Tuple[List[dict], str]
```
Run [Grype](https://github.com/anchore/grype) (Anchore) for container/package CVE scanning via a syft→grype pipeline.
- **Requires**: `syft` and `grype` binaries on PATH (both auto-downloaded by `core.bootstrap`)
- **Tool**: `syft <work> -o json | grype --add-cpes-if-none -o json --file <tmpfile>`
- **Output**: reads from tempfile (syft pipes stdout to grype, grype writes to file)
- **Severity**: Grype's capitalized severity string → scanner convention
- **OWASP**: `["A06:2021-Vulnerable_and_Outdated_Components"]`
---
### `scanners/socket_runner.py`
### `socket_scanner(work)`
```python
def socket_scanner(work: str) -> Tuple[List[dict], str]
```
Run [Socket](https://socket.dev/) CLI for supply-chain security (malicious packages, typosquatting).
- **Requires**: `socket` binary on PATH (`npm install -g @socketsecurity/cli`)
- **Tool**: `socket scan <work> --json`
- **Category**: `"supply-chain"`
- **OWASP**: `["A08:2021-Software_and_Data_Integrity_Failures"]`
---
### `scanners/safety_runner.py`
### `safety_check(work)`
```python
def safety_check(work: str) -> Tuple[List[dict], str]
```
Run [Safety](https://github.com/pyupio/safety) for Python dependency CVE checking.
- **Requires**: `pip install safety`
- **Tool**: `safety check --json`
- **Output format**: handles Safety v2 list format and v3 dict format
- **OWASP**: `["A06:2021-Vulnerable_and_Outdated_Components"]`
---
### `scanners/llmguard_runner.py`
### `llm_guard(work)`
```python
def llm_guard(work: str) -> Tuple[List[dict], str]
```
Run [LLM Guard](https://github.com/protectai/llm-guard) PromptInjection scanner on Python files.
- **Requires**: `pip install llm-guard` (Python <3.13 only)
- **API**: Python — extracts prompt strings via regex, scans with `llm_guard.input_scanners.PromptInjection`
- **Rule ID**: `LLM-PROMPT-INJECTION`
- **OWASP**: `["LLM01:2025-Prompt_Injection"]`
- **Category**: `"llm"`
---
### `scanners/garak_runner.py`
### `garak(target_url, probes=None)`
```python
def garak(target_url: str, probes: list[str] | None = None) -> Tuple[List[dict], str]
```
Run [Garak](https://github.com/NVIDIA/garak) LLM vulnerability probes against a live endpoint.
- **Requires**: `pip install garak` ; endpoint URL via `GARAK_TARGET_URL` env var
- **Tool**: `garak --model_type rest --model_name <url> --probes <probes> --report_prefix <tmpdir>`
- **Output**: parses `.report.jsonl` files from garak's output directory
- **Default probes**: `"dan,knownbadsignatures,packagehallucination"`
- **Category**: `"llm"`; **OWASP**: `["LLM01:2025-Prompt_Injection"]`
---
### `scanners/deepteam_runner.py`
### `deepteam(target_url, target_purpose="")`
```python
def deepteam(target_url: str, target_purpose: str = "") -> Tuple[List[dict], str]
```
Run [DeepTeam](https://github.com/confident-ai/deepteam) (Confident AI) red-team evaluation.
- **Requires**: `pip install deepeval` ; endpoint via `DEEPTEAM_TARGET_URL`
- **Approach**: writes a YAML config and runs `deepteam run --config <yaml>`
- **Category**: `"llm"` ; **OWASP**: `["LLM02:2025-Sensitive_Information_Disclosure"]`
---
### `scanners/promptfoo_runner.py`
### `promptfoo(target_url, target_purpose="", plugins=None)`
```python
def promptfoo(target_url: str, target_purpose: str = "", plugins: list[str] | None = None) -> Tuple[List[dict], str]
```
Run [Promptfoo](https://github.com/promptfoo/promptfoo) red-team evaluation via npx.
- **Requires**: Node.js + `npx promptfoo` ; endpoint via `PROMPTFOO_TARGET_URL`
- **Tool**: `npx promptfoo@latest redteam run --config <yaml> --output <json>`
- **Default plugins**: `["harmful", "injection", "jailbreak", "pii"]`
- **Category**: `"llm"`
---
### `scanners/azure_redteam_runner.py`
### `azure_redteam(target_url, risk_categories=None, attack_strategies=None)`
```python
def azure_redteam(
target_url: str,
risk_categories: list[str] | None = None,
attack_strategies: list[str] | None = None,
) -> Tuple[List[dict], str]
```
Run [Azure AI Evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-red-teaming) red-team pipeline.
- **Requires**: `pip install azure-ai-evaluation` ; endpoint via `AZURE_REDTEAM_TARGET_URL`
- **API**: `azure.ai.evaluation.red_team.RedTeamOrchestrator` in local mode via `asyncio.run()`
- **Default risk categories**: `["violence", "sexual", "hate_unfairness", "self_harm"]`
- **Category**: `"llm"`
---
### `scanners/pyrit_runner.py`
### `pyrit(target_url, objectives=None)`
```python
def pyrit(target_url: str, objectives: list[str] | None = None) -> Tuple[List[dict], str]
```
Run [PyRIT](https://github.com/Azure/PyRIT) (Microsoft) crescendo multi-turn attack orchestration.
- **Requires**: `pip install pyrit` ; endpoint via `PYRIT_TARGET_URL`
- **Binary guard**: `have_binary("pyrit_scan")` (optional CLI wrapper)
- **Category**: `"llm"` ; **OWASP**: `["LLM01:2025-Prompt_Injection"]`
---
### `scanners/augustus_runner.py`
### `augustus(target_url, probes=None)`
```python
def augustus(target_url: str, probes: list[str] | None = None) -> Tuple[List[dict], str]
```
Run [Augustus](https://github.com/mozilla/augustus) Go-based LLM red-team scanner.
- **Requires**: `augustus` binary on PATH ; endpoint via `AUGUSTUS_TARGET_URL`
- **Tool**: `augustus --target <url> --probes <probes> --output json`
- **Category**: `"llm"`
---
### `scanners/fuzzyai_runner.py`
### `fuzzyai(target_url, attacks=None)`
```python
def fuzzyai(target_url: str, attacks: list[str] | None = None) -> Tuple[List[dict], str]
```
Run [FuzzyAI](https://github.com/cyberark/FuzzyAI) adversarial fuzzing framework.
- **Requires**: `pip install fuzzy-ai` ; endpoint via `FUZZYAI_TARGET_URL`
- **API**: Python — `import fuzzy_ai`
- **Category**: `"llm"`
---
### `scanners/giskard_runner.py`
### `giskard_scan(target_url, model_description="")`
```python
def giskard_scan(target_url: str, model_description: str = "") -> Tuple[List[dict], str]
```
Run [Giskard](https://github.com/Giskard-AI/giskard) model testing and red-teaming.
- **Requires**: `pip install giskard` ; endpoint via `GISKARD_TARGET_URL`
- **API**: Python — wraps endpoint in a REST model, calls `giskard.scan(model)`
- **Category**: `"llm"`
---
### `scanners/vigil_runner.py`
### `vigil(work)`
```python
def vigil(work: str) -> Tuple[List[dict], str]
```
Run [Vigil](https://github.com/deadbits/vigil-llm) YARA-based prompt injection / jailbreak scanner on Python files.
- **Requires**: `pip install vigil-llm` (Python <3.13 only)
- **API**: Python — `Vigil.from_config_dict(…)`, scans prompt strings extracted by regex
- **Rule ID**: `LLM-PROMPT-INJECTION-VIGIL`
- **Category**: `"llm"` ; **OWASP**: `["LLM01:2025-Prompt_Injection"]`
---
### `scanners/nemo_runner.py`
### `nemo_guardrails(work)`
```python
def nemo_guardrails(work: str) -> Tuple[List[dict], str]
```
Static analysis of [NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) `.co` config files.
- **Requires**: `pip install nemoguardrails` (Python <3.13 only)
- **Pattern**: globs `**/*.co` and checks for missing `input rails` / `output rails` definitions
- **Rule IDs**: `NEMO-MISSING-INPUT-RAIL`, `NEMO-MISSING-OUTPUT-RAIL`
- **Category**: `"llm"` ; **OWASP**: `["LLM01:2025-Prompt_Injection"]`
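
The glob-and-check pattern can be illustrated as follows. `check_rails` is a hypothetical name, and plain substring matching on `input rails` / `output rails` is an assumption about how a rail definition is recognized; the finding dicts carry only `rule` and `file`.

```python
import tempfile
from pathlib import Path
from typing import List


def check_rails(work: str) -> List[dict]:
    """Flag .co files that never mention input rails or output rails."""
    findings = []
    root = Path(work)
    for co_file in sorted(root.glob("**/*.co")):
        text = co_file.read_text(encoding="utf-8", errors="replace").lower()
        rel = co_file.relative_to(root).as_posix()
        # Plain substring matching is an assumption about rail detection.
        if "input rails" not in text:
            findings.append({"rule": "NEMO-MISSING-INPUT-RAIL", "file": rel})
        if "output rails" not in text:
            findings.append({"rule": "NEMO-MISSING-OUTPUT-RAIL", "file": rel})
    return findings


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bot").mkdir()
    (root / "bot" / "full.co").write_text("# input rails\n# output rails\n")
    (root / "bot" / "partial.co").write_text("# input rails only\n")
    rail_findings = check_rails(tmp)
```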
---
### `scanners/guardrailsai_runner.py`
### `guardrails_ai(work)`
```python
def guardrails_ai(work: str) -> Tuple[List[dict], str]
```
Static analysis of [Guardrails AI](https://github.com/guardrails-ai/guardrails) configuration in Python source files.
- **Requires**: `pip install guardrails-ai` (Python <3.13 only)
- **Patterns checked**:
  - `GUARD-ON-FAIL-NOOP`: `on_fail="noop"` disables error handling
  - `GUARD-THRESHOLD-TOO-HIGH`: a threshold ≥ 0.95 means the guard almost never triggers
  - `GUARD-MISSING-TOXICITY`: no toxicity/content guard applied
- **Category**: `"llm"` ; **OWASP**: `["LLM01:2025-Prompt_Injection"]`
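
The three patterns listed above could be approximated with regular expressions, as in this sketch. The regexes, the helper name `check_guard_config`, and the heuristic that any mention of "toxic" counts as a toxicity guard are all assumptions, not the runner's actual logic.

```python
import re
from typing import List

ON_FAIL_NOOP = re.compile(r"""on_fail\s*=\s*["']noop["']""")
THRESHOLD = re.compile(r"threshold\s*=\s*([0-9]*\.?[0-9]+)")


def check_guard_config(source: str) -> List[str]:
    """Return the rule IDs triggered by a piece of Python source text."""
    rules = []
    if ON_FAIL_NOOP.search(source):
        rules.append("GUARD-ON-FAIL-NOOP")
    if any(float(value) >= 0.95 for value in THRESHOLD.findall(source)):
        rules.append("GUARD-THRESHOLD-TOO-HIGH")
    # Treating any mention of "toxic" as a toxicity guard is an assumption.
    if "toxic" not in source.lower():
        rules.append("GUARD-MISSING-TOXICITY")
    return rules


risky = 'guard.use(RegexMatch(regex="a"), threshold=0.99, on_fail="noop")'
safer = 'guard.use(ToxicLanguage(threshold=0.5, on_fail="exception"))'
risky_rules = check_guard_config(risky)
safer_rules = check_guard_config(safer)
```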