# API Reference: `rules/`
## `rules/__init__.py`
Semgrep rule pack registry. All constants are `Path` objects pointing to YAML files in the project root. The `ALL_*` lists are consumed by `core/scanner.py`'s `scan_repo()` to build the parallel task list.
---
## Individual path constants
```python
from rules import CORE, WEB, CRYPTO, ML, SECRETS, PERF, LLM
```
| Constant | File | Description |
|----------|------|-------------|
| `CORE` | `core.yaml` | Core Python security: subprocess injection, eval, pickle deserialization, unsafe YAML loading |
| `WEB` | `web.yaml` | Web security: XSS, SSRF, open redirect, path traversal |
| `CRYPTO` | `crypto.yaml` | Cryptographic failures: weak ciphers, hardcoded keys, insecure RNG |
| `ML` | `ml.yaml` | ML-specific: unsafe `pickle.load`, `torch.load` without `weights_only`, model-path injection |
| `SECRETS` | `secrets.yaml` | Secret patterns: API keys, tokens, credentials in code |
| `PERF` | `perf.yaml` | Performance anti-patterns: list building in loops, `try/except` in loops |
| `LLM` | `llm.yaml` | LLM/agent security: prompt injection (LLM01), insecure output handling (LLM02), PII in prompts (LLM06) |
---
## Aggregated list constants
### `ALL_SECURITY`
```python
ALL_SECURITY: List[Tuple[str, Path, str]] = [
    ("Semgrep:Core", CORE, "security"),
    ("Semgrep:Web", WEB, "security"),
    ("Semgrep:Crypto", CRYPTO, "security"),
    ("Semgrep:ML", ML, "security"),
    ("Semgrep:Secrets", SECRETS, "security"),
]
```
Iterated in `scan_repo()` when `run_security=True`. Each `(label, path, category)` tuple produces one `semgrep_pack()` call.
### `ALL_PERFORMANCE`
```python
ALL_PERFORMANCE: List[Tuple[str, Path, str]] = [
    ("Semgrep:Perf", PERF, "performance"),
]
```
Iterated when `run_performance=True`.
### `ALL_LLM`
```python
ALL_LLM: List[Tuple[str, Path, str]] = [
    ("Semgrep:LLM", LLM, "security"),
]
```
Iterated when `run_llm=True`.
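
The way the three lists feed one parallel task list can be sketched as follows. This is a minimal illustration, not the actual `scan_repo()` code: `build_tasks`, `run_tasks`, and `fake_semgrep_pack` are hypothetical names, the lists are shortened stand-ins for the real registry, and the real `semgrep_pack()` signature may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Pack = Tuple[str, Path, str]  # (label, path, category)

# Shortened stand-ins for the lists exported by rules/__init__.py.
ALL_SECURITY: List[Pack] = [("Semgrep:Core", Path("core.yaml"), "security")]
ALL_PERFORMANCE: List[Pack] = [("Semgrep:Perf", Path("perf.yaml"), "performance")]
ALL_LLM: List[Pack] = [("Semgrep:LLM", Path("llm.yaml"), "security")]


def build_tasks(run_security: bool = True,
                run_performance: bool = True,
                run_llm: bool = True) -> List[Pack]:
    """Concatenate the enabled pack lists into one flat task list."""
    tasks: List[Pack] = []
    if run_security:
        tasks.extend(ALL_SECURITY)
    if run_performance:
        tasks.extend(ALL_PERFORMANCE)
    if run_llm:
        tasks.extend(ALL_LLM)
    return tasks


def run_tasks(tasks: List[Pack],
              runner: Callable[[str, Path, str], list]) -> Dict[str, list]:
    """Run one call per pack in a thread pool; results are keyed by label."""
    if not tasks:
        return {}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {
            label: pool.submit(runner, label, path, category)
            for label, path, category in tasks
        }
        return {label: future.result() for label, future in futures.items()}


def fake_semgrep_pack(label: str, path: Path, category: str) -> list:
    """Hypothetical stand-in for semgrep_pack(); it only echoes its inputs."""
    return [{"tool": label, "rule_file": path.name, "category": category}]
```

Calling `run_tasks(build_tasks(run_performance=False), fake_semgrep_pack)` would run only the security and LLM packs. A thread pool is assumed here because each pack would spend most of its time waiting on an external Semgrep process.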
---
## Semgrep YAML rule format
Each `.yaml` file follows the [Semgrep rule schema](https://semgrep.dev/docs/writing-rules/rule-syntax/). The `metadata` block controls how findings are categorized:
```yaml
rules:
  - id: my-rule-id
    patterns:
      - pattern: |
          dangerous_call($X, ...)
    message: |
      Dangerous call detected. $X may be user-controlled.
    severity: ERROR  # ERROR | WARNING | INFO
    languages: [python]
    metadata:
      owasp:
        - A03:2021-Injection
      confidence: confirmed  # confirmed | likely | possible
      category: security
```
**`metadata` fields used by autoscan:**
| Field | Usage |
|-------|-------|
| `owasp` | Stored in `finding["owasp"]`; drives SARIF help URIs and HTML badges |
| `confidence` | Stored in `finding["confidence"]` |
| `category` | Used as `finding["category"]` (`"security"` or `"performance"`) |
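
As a rough illustration of the mapping in the table above, one entry of Semgrep's JSON `results` array could be turned into a finding dict like this. The function name `to_finding`, the extra keys (`rule_id`, `path`, `line`, `severity`, `message`), and the fallback to the pack's category are assumptions made for the sketch; only `owasp`, `confidence`, and `category` come from this reference.

```python
from typing import Any, Dict


def to_finding(result: Dict[str, Any], default_category: str) -> Dict[str, Any]:
    """Map one Semgrep JSON result to a finding dict (hypothetical helper)."""
    extra = result.get("extra", {})
    metadata = extra.get("metadata", {})

    # Rules may give `owasp` as a single string or as a list; normalize to a list.
    owasp = metadata.get("owasp", [])
    if isinstance(owasp, str):
        owasp = [owasp]

    return {
        "rule_id": result.get("check_id"),
        "path": result.get("path"),
        "line": result.get("start", {}).get("line"),
        "severity": extra.get("severity"),
        "message": extra.get("message", "").strip(),
        "owasp": list(owasp),
        "confidence": metadata.get("confidence"),
        # Assumption: fall back to the pack's category when the rule sets none.
        "category": metadata.get("category", default_category),
    }
```

A rule without a `metadata` block would then yield an empty `owasp` list, a `confidence` of `None`, and the category of the pack it came from.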
---
## Adding a new rule pack
1. Create `myrules.yaml` in the project root.
2. Add a constant and list entry in `rules/__init__.py`:
```python
MYRULES = _ROOT / "myrules.yaml"
ALL_SECURITY = [
    ...
    ("Semgrep:MyRules", MYRULES, "security"),
]
```
`scan_repo()` automatically picks it up; no changes to `core/scanner.py` needed.
See [How to Extend](../how-to-extend.md#adding-a-new-semgrep-rule-pack) for the full walkthrough.
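
After registering a pack, a quick sanity check over the list entries can catch a mistyped filename or category before a scan runs. The helper below is a hypothetical sketch and not part of the project; it assumes the only valid categories are `"security"` and `"performance"`, as listed in the metadata table.

```python
from pathlib import Path
from typing import List, Tuple

Pack = Tuple[str, Path, str]  # (label, path, category)

# Assumption: these are the only categories the scanner understands.
VALID_CATEGORIES = {"security", "performance"}


def check_packs(packs: List[Pack]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems: List[str] = []
    seen_labels = set()
    for label, path, category in packs:
        if label in seen_labels:
            problems.append(f"duplicate label: {label}")
        seen_labels.add(label)
        if not path.is_file():
            problems.append(f"{label}: missing rule file {path}")
        if category not in VALID_CATEGORIES:
            problems.append(f"{label}: unknown category {category!r}")
    return problems
```

Running `check_packs(ALL_SECURITY + ALL_PERFORMANCE + ALL_LLM)` in a test would flag, for example, a `myrules.yaml` that was registered but never created.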