# API Reference: `rules/`

## `rules/__init__.py`

Semgrep rule pack registry. The individual constants are `Path` objects pointing to YAML files in the project root. The `ALL_*` lists are consumed by `scan_repo()` in `core/scanner.py` to build the parallel task list.

---

## Individual path constants

```python
from rules import CORE, WEB, CRYPTO, ML, SECRETS, PERF, LLM
```

| Constant | File | Description |
|----------|------|-------------|
| `CORE` | `core.yaml` | Core Python security: subprocess injection, eval, pickle deserialization, unsafe YAML loading |
| `WEB` | `web.yaml` | Web security: XSS, SSRF, open redirect, path traversal |
| `CRYPTO` | `crypto.yaml` | Cryptographic failures: weak ciphers, hardcoded keys, insecure RNG |
| `ML` | `ml.yaml` | ML-specific: unsafe `pickle.load`, `torch.load` without `weights_only`, model-path injection |
| `SECRETS` | `secrets.yaml` | Secret patterns: API keys, tokens, credentials in code |
| `PERF` | `perf.yaml` | Performance anti-patterns: list building in loops, `try/except` in loops |
| `LLM` | `llm.yaml` | LLM/agent security: prompt injection (LLM01), insecure output handling (LLM02), PII in prompts (LLM06) |

---

## Aggregated list constants

### `ALL_SECURITY`

```python
ALL_SECURITY: List[Tuple[str, Path, str]] = [
    ("Semgrep:Core", CORE, "security"),
    ("Semgrep:Web", WEB, "security"),
    ("Semgrep:Crypto", CRYPTO, "security"),
    ("Semgrep:ML", ML, "security"),
    ("Semgrep:Secrets", SECRETS, "security"),
]
```

Iterated in `scan_repo()` when `run_security=True`. Each `(label, path, category)` tuple produces one `semgrep_pack()` call.

### `ALL_PERFORMANCE`

```python
ALL_PERFORMANCE: List[Tuple[str, Path, str]] = [
    ("Semgrep:Perf", PERF, "performance"),
]
```

Iterated when `run_performance=True`.

### `ALL_LLM`

```python
ALL_LLM: List[Tuple[str, Path, str]] = [
    ("Semgrep:LLM", LLM, "security"),
]
```

Iterated when `run_llm=True`.
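
The way the `run_*` flags select the aggregated lists can be sketched as follows. This is a minimal illustration, not the actual `scan_repo()` implementation: `build_task_list` is a hypothetical helper, and the constants are shortened local stand-ins for the ones exported by `rules/__init__.py`.

```python
from pathlib import Path
from typing import List, Tuple

RulePack = Tuple[str, Path, str]  # (label, path, category)

# Local stand-ins so the sketch runs on its own; the real lists live in rules/__init__.py.
_ROOT = Path(".")
ALL_SECURITY: List[RulePack] = [
    ("Semgrep:Core", _ROOT / "core.yaml", "security"),
    ("Semgrep:Web", _ROOT / "web.yaml", "security"),
]
ALL_PERFORMANCE: List[RulePack] = [("Semgrep:Perf", _ROOT / "perf.yaml", "performance")]
ALL_LLM: List[RulePack] = [("Semgrep:LLM", _ROOT / "llm.yaml", "security")]


def build_task_list(
    run_security: bool = True,
    run_performance: bool = False,
    run_llm: bool = False,
) -> List[RulePack]:
    """Hypothetical helper: collect the rule packs selected by the run_* flags."""
    tasks: List[RulePack] = []
    if run_security:
        tasks.extend(ALL_SECURITY)
    if run_performance:
        tasks.extend(ALL_PERFORMANCE)
    if run_llm:
        tasks.extend(ALL_LLM)
    return tasks


tasks = build_task_list(run_security=True, run_llm=True)
labels = [label for label, _path, _category in tasks]
print(labels)  # ['Semgrep:Core', 'Semgrep:Web', 'Semgrep:LLM']
```

Each entry in the resulting list would then correspond to one `semgrep_pack()` call, as described above.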

---

## Semgrep YAML rule format

Each `.yaml` file follows the [Semgrep rule schema](https://semgrep.dev/docs/writing-rules/rule-syntax/). The `metadata` block controls how findings are categorized:

```yaml
rules:
  - id: my-rule-id
    patterns:
      - pattern: |
          dangerous_call($X, ...)
    message: |
      Dangerous call detected. $X may be user-controlled.
    severity: ERROR            # ERROR | WARNING | INFO
    languages: [python]
    metadata:
      owasp:
        - A03:2021-Injection
      confidence: confirmed    # confirmed | likely | possible
      category: security
```

**`metadata` fields used by autoscan:**

| Field | Usage |
|-------|-------|
| `owasp` | Stored in `finding["owasp"]`; drives SARIF help URIs and HTML badges |
| `confidence` | Stored in `finding["confidence"]` |
| `category` | Used as `finding["category"]` (`"security"` or `"performance"`) |

---

## Adding a new rule pack

1. Create `myrules.yaml` in the project root.
2. Add a constant and list entry in `rules/__init__.py`:

   ```python
   MYRULES = _ROOT / "myrules.yaml"

   ALL_SECURITY = [
       # ... existing entries ...
       ("Semgrep:MyRules", MYRULES, "security"),
   ]
   ```

`scan_repo()` picks it up automatically; no changes to `core/scanner.py` are needed.

See [How to Extend](../how-to-extend.md#adding-a-new-semgrep-rule-pack) for the full walkthrough.
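
When registering a new pack, a quick sanity check that every registered path exists can catch a mistyped filename early. The sketch below assumes entries have the `(label, path, category)` shape used by the `ALL_*` lists; `missing_packs` is a hypothetical helper and not part of autoscan, and the temporary directory stands in for the project root.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple

RulePack = Tuple[str, Path, str]  # (label, path, category)


def missing_packs(packs: Iterable[RulePack]) -> List[str]:
    """Hypothetical check: return labels of entries whose YAML file is not on disk."""
    return [label for label, path, _category in packs if not Path(path).is_file()]


# Demo in a temporary directory standing in for the project root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "myrules.yaml").write_text("rules: []\n")
    packs = [
        ("Semgrep:MyRules", root / "myrules.yaml", "security"),
        ("Semgrep:Typo", root / "myrulez.yaml", "security"),  # deliberately misspelled
    ]
    result = missing_packs(packs)

print(result)  # ['Semgrep:Typo']
```

In a real project the same check could be run over `ALL_SECURITY + ALL_PERFORMANCE + ALL_LLM` imported from `rules`.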