# API Reference: `rules/`

## `rules/__init__.py`

Semgrep rule pack registry. All constants are `Path` objects pointing to YAML files in the project root. The `ALL_*` lists are consumed by `scan_repo()` in `core/scanner.py` to build the parallel task list.

---

## Individual path constants

```python
from rules import CORE, WEB, CRYPTO, ML, SECRETS, PERF, LLM
```

| Constant | File | Description |
|----------|------|-------------|
| `CORE` | `core.yaml` | Core Python security: subprocess injection, eval, pickle deserialization, unsafe YAML loading |
| `WEB` | `web.yaml` | Web security: XSS, SSRF, open redirect, path traversal |
| `CRYPTO` | `crypto.yaml` | Cryptographic failures: weak ciphers, hardcoded keys, insecure RNG |
| `ML` | `ml.yaml` | ML-specific: unsafe `pickle.load`, `torch.load` without `weights_only`, model-path injection |
| `SECRETS` | `secrets.yaml` | Secret patterns: API keys, tokens, credentials in code |
| `PERF` | `perf.yaml` | Performance anti-patterns: list building in loops, `try/except` in loops |
| `LLM` | `llm.yaml` | LLM/agent security: prompt injection (LLM01), insecure output handling (LLM02), PII in prompts (LLM06) |
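
A quick sanity check can catch a missing or misnamed pack before a scan runs. The sketch below is a minimal illustration and not part of `rules/__init__.py`: `missing_packs` is a hypothetical helper, and the demonstration mirrors two of the constants in a temporary directory instead of importing the real `rules` package.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Dict, List


def missing_packs(packs: Dict[str, Path]) -> List[str]:
    """Return the names of rule packs whose YAML file does not exist."""
    return sorted(name for name, path in packs.items() if not path.is_file())


# Demonstration with stand-in paths (assumption: the real constants are
# built as `_ROOT / "<name>.yaml"`, like the MYRULES example in this page).
with TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "core.yaml").write_text("rules: []\n")  # only this pack exists
    demo = {"CORE": root / "core.yaml", "WEB": root / "web.yaml"}
    result = missing_packs(demo)

print(result)  # ['WEB']
```

With the real package, the same helper could be called as `missing_packs({"CORE": CORE, "WEB": WEB})`.
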

---

## Aggregated list constants

### `ALL_SECURITY`

```python
ALL_SECURITY: List[Tuple[str, Path, str]] = [
    ("Semgrep:Core", CORE, "security"),
    ("Semgrep:Web", WEB, "security"),
    ("Semgrep:Crypto", CRYPTO, "security"),
    ("Semgrep:ML", ML, "security"),
    ("Semgrep:Secrets", SECRETS, "security"),
]
```

`scan_repo()` iterates this list when `run_security=True`. Each `(label, path, category)` tuple produces one `semgrep_pack()` call.
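
The one-call-per-entry behaviour can be sketched as follows. This is an illustration under assumptions, not the code in `core/scanner.py`: the `semgrep_pack()` signature shown here is a guess, its body is a stub that returns a placeholder finding instead of invoking Semgrep, and the thread pool is only one possible way to realize the parallel task list.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List, Tuple

Pack = Tuple[str, Path, str]  # (label, path, category)


def semgrep_pack(label: str, path: Path, category: str) -> List[Dict[str, str]]:
    """Stub standing in for the real function; the signature is hypothetical."""
    return [{"tool": label, "rules": path.name, "category": category}]


def run_packs(packs: List[Pack]) -> List[Dict[str, str]]:
    """Submit one semgrep_pack() call per registry entry and merge the results."""
    findings: List[Dict[str, str]] = []
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(semgrep_pack, *pack) for pack in packs]
        for future in futures:  # collect in submission order
            findings.extend(future.result())
    return findings


demo_packs: List[Pack] = [
    ("Semgrep:Core", Path("core.yaml"), "security"),
    ("Semgrep:Web", Path("web.yaml"), "security"),
]
findings = run_packs(demo_packs)
print([f["tool"] for f in findings])  # ['Semgrep:Core', 'Semgrep:Web']
```
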

### `ALL_PERFORMANCE`

```python
ALL_PERFORMANCE: List[Tuple[str, Path, str]] = [
    ("Semgrep:Perf", PERF, "performance"),
]
```

Iterated when `run_performance=True`.

### `ALL_LLM`

```python
ALL_LLM: List[Tuple[str, Path, str]] = [
    ("Semgrep:LLM", LLM, "security"),
]
```

Iterated when `run_llm=True`.
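
Taken together, the three flags decide which lists contribute entries to a scan. The following is a minimal sketch of that selection, assuming the flags are plain booleans. `select_packs` is a hypothetical name, and the lists are shortened stand-ins for the real registry.

```python
from pathlib import Path
from typing import List, Tuple

Pack = Tuple[str, Path, str]  # (label, path, category)

# Shortened stand-ins for the lists in rules/__init__.py.
ALL_SECURITY: List[Pack] = [("Semgrep:Core", Path("core.yaml"), "security")]
ALL_PERFORMANCE: List[Pack] = [("Semgrep:Perf", Path("perf.yaml"), "performance")]
ALL_LLM: List[Pack] = [("Semgrep:LLM", Path("llm.yaml"), "security")]


def select_packs(run_security: bool, run_performance: bool, run_llm: bool) -> List[Pack]:
    """Concatenate the lists whose flag is enabled, in a fixed order."""
    packs: List[Pack] = []
    if run_security:
        packs += ALL_SECURITY
    if run_performance:
        packs += ALL_PERFORMANCE
    if run_llm:
        packs += ALL_LLM
    return packs


labels = [label for label, _, _ in select_packs(True, False, True)]
print(labels)  # ['Semgrep:Core', 'Semgrep:LLM']
```
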

---

## Semgrep YAML rule format

Each `.yaml` file follows the [Semgrep rule schema](https://semgrep.dev/docs/writing-rules/rule-syntax/). The `metadata` block controls how findings are categorized:

```yaml
rules:
  - id: my-rule-id
    patterns:
      - pattern: |
          dangerous_call($X, ...)
    message: |
      Dangerous call detected. $X may be user-controlled.
    severity: ERROR  # ERROR | WARNING | INFO
    languages: [python]
    metadata:
      owasp:
        - A03:2021-Injection
      confidence: confirmed  # confirmed | likely | possible
      category: security
```

**`metadata` fields used by autoscan:**

| Field | Usage |
|-------|-------|
| `owasp` | Stored in `finding["owasp"]`; drives SARIF help URIs and HTML badges |
| `confidence` | Stored in `finding["confidence"]` |
| `category` | Used as `finding["category"]` (`"security"` or `"performance"`) |
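
To make the mapping concrete, the sketch below copies the three metadata fields from one Semgrep JSON result into a finding dictionary. It assumes the result layout produced by `semgrep --json`, where rule metadata appears under `extra.metadata`. The function name `to_finding`, the extra `rule_id` and `message` keys, and the fallback behaviour for missing fields are hypothetical and may differ from what autoscan does.

```python
from typing import Any, Dict


def to_finding(result: Dict[str, Any], default_category: str) -> Dict[str, Any]:
    """Build a finding dict from one entry of Semgrep's JSON `results` list."""
    extra = result.get("extra", {})
    meta = extra.get("metadata", {})
    owasp = meta.get("owasp", [])
    if isinstance(owasp, str):  # normalize a single string to a list
        owasp = [owasp]
    return {
        "rule_id": result.get("check_id"),
        "message": extra.get("message", ""),
        "owasp": owasp,
        "confidence": meta.get("confidence"),
        # Assumption: fall back to the pack's category if the rule omits one.
        "category": meta.get("category", default_category),
    }


sample = {
    "check_id": "my-rule-id",
    "extra": {
        "message": "Dangerous call detected.",
        "metadata": {"owasp": ["A03:2021-Injection"], "confidence": "confirmed"},
    },
}
finding = to_finding(sample, default_category="security")
print(finding["owasp"], finding["confidence"], finding["category"])
```
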

---

## Adding a new rule pack

1. Create `myrules.yaml` in the project root.
2. Add a constant and a list entry in `rules/__init__.py`:

   ```python
   MYRULES = _ROOT / "myrules.yaml"

   ALL_SECURITY = [
       # ... existing entries ...
       ("Semgrep:MyRules", MYRULES, "security"),
   ]
   ```

`scan_repo()` picks up the new entry automatically; no changes to `core/scanner.py` are needed.

See [How to Extend](../how-to-extend.md#adding-a-new-semgrep-rule-pack) for the full walkthrough.