# API Reference: `rules/`

## `rules/__init__.py`

Semgrep rule pack registry. All constants are `Path` objects pointing to YAML files in the project root. The `ALL_*` lists are consumed by `core/scanner.py`'s `scan_repo()` to build the parallel task list.

---

## Individual path constants

```python
from rules import CORE, WEB, CRYPTO, ML, SECRETS, PERF, LLM
```

| Constant | File | Description |
|----------|------|-------------|
| `CORE` | `core.yaml` | Core Python security: subprocess injection, eval, pickle deserialization, unsafe YAML loading |
| `WEB` | `web.yaml` | Web security: XSS, SSRF, open redirect, path traversal |
| `CRYPTO` | `crypto.yaml` | Cryptographic failures: weak ciphers, hardcoded keys, insecure RNG |
| `ML` | `ml.yaml` | ML-specific: unsafe `pickle.load`, `torch.load` without `weights_only`, model-path injection |
| `SECRETS` | `secrets.yaml` | Secret patterns: API keys, tokens, credentials in code |
| `PERF` | `perf.yaml` | Performance anti-patterns: list building in loops, `try/except` in loops |
| `LLM` | `llm.yaml` | LLM/agent security: prompt injection (LLM01), insecure output handling (LLM02), PII in prompts (LLM06) |

---

## Aggregated list constants

### `ALL_SECURITY`



```python
ALL_SECURITY: List[Tuple[str, Path, str]] = [
    ("Semgrep:Core",    CORE,    "security"),
    ("Semgrep:Web",     WEB,     "security"),
    ("Semgrep:Crypto",  CRYPTO,  "security"),
    ("Semgrep:ML",      ML,      "security"),
    ("Semgrep:Secrets", SECRETS, "security"),
]
```


Iterated in `scan_repo()` when `run_security=True`. Each `(label, path, category)` tuple produces one `semgrep_pack()` call.
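The fan-out described above can be sketched as follows. This is a minimal illustration, not the code in `core/scanner.py`: the `semgrep_pack()` body is a hypothetical stub, its signature and return shape are assumed, `run_packs()` is an invented helper name, and the use of a thread pool is an assumption about how the parallel task list is executed.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List, Tuple

RulePack = Tuple[str, Path, str]

# Stand-in for rules.ALL_SECURITY, trimmed to two entries for the demo.
ALL_SECURITY: List[RulePack] = [
    ("Semgrep:Core", Path("core.yaml"), "security"),
    ("Semgrep:Web", Path("web.yaml"), "security"),
]


def semgrep_pack(label: str, path: Path, category: str) -> List[Dict[str, str]]:
    """Hypothetical stub; the real function would invoke Semgrep with `path`."""
    return [{"tool": label, "rule_file": path.name, "category": category}]


def run_packs(packs: List[RulePack]) -> List[Dict[str, str]]:
    """Submit one semgrep_pack() call per tuple and flatten the results."""
    findings: List[Dict[str, str]] = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(semgrep_pack, *pack) for pack in packs]
        # Collect in submission order so output is deterministic.
        for future in futures:
            findings.extend(future.result())
    return findings


findings = run_packs(ALL_SECURITY)
```

Because each tuple unpacks directly into the call's arguments, adding an entry to the list adds a task without any change to the loop.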

### `ALL_PERFORMANCE`



```python
ALL_PERFORMANCE: List[Tuple[str, Path, str]] = [
    ("Semgrep:Perf", PERF, "performance"),
]
```


Iterated when `run_performance=True`.

### `ALL_LLM`



```python
ALL_LLM: List[Tuple[str, Path, str]] = [
    ("Semgrep:LLM", LLM, "security"),
]
```


Iterated when `run_llm=True`.
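Taken together, the three flags decide which lists contribute to the task list. The sketch below shows one way that selection could look. `build_tasks()` is a hypothetical helper, not a function documented here, and the lists are local stand-ins so the snippet runs on its own.

```python
from pathlib import Path
from typing import List, Tuple

RulePack = Tuple[str, Path, str]

# Stand-ins for the lists exported by rules/__init__.py (ALL_SECURITY trimmed).
ALL_SECURITY: List[RulePack] = [("Semgrep:Core", Path("core.yaml"), "security")]
ALL_PERFORMANCE: List[RulePack] = [("Semgrep:Perf", Path("perf.yaml"), "performance")]
ALL_LLM: List[RulePack] = [("Semgrep:LLM", Path("llm.yaml"), "security")]


def build_tasks(*, run_security: bool, run_performance: bool, run_llm: bool) -> List[RulePack]:
    """Concatenate the rule-pack lists whose flag is enabled (hypothetical helper)."""
    tasks: List[RulePack] = []
    if run_security:
        tasks.extend(ALL_SECURITY)
    if run_performance:
        tasks.extend(ALL_PERFORMANCE)
    if run_llm:
        tasks.extend(ALL_LLM)
    return tasks


tasks = build_tasks(run_security=False, run_performance=True, run_llm=True)
print([label for label, _path, _category in tasks])  # ['Semgrep:Perf', 'Semgrep:LLM']
```

Note that the LLM pack is tagged `"security"`, so its findings share a category with the `ALL_SECURITY` packs even though a separate flag enables it.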

---

## Semgrep YAML rule format

Each `.yaml` file follows the [Semgrep rule schema](https://semgrep.dev/docs/writing-rules/rule-syntax/). The `metadata` block controls how findings are categorized:

```yaml
rules:
  - id: my-rule-id
    patterns:
      - pattern: |
          dangerous_call($X, ...)
    message: |
      Dangerous call detected. $X may be user-controlled.
    severity: ERROR           # ERROR | WARNING | INFO
    languages: [python]
    metadata:
      owasp:
        - A03:2021-Injection
      confidence: confirmed   # confirmed | likely | possible
      category: security
```

**`metadata` fields used by autoscan:**

| Field | Usage |
|-------|-------|
| `owasp` | Stored in `finding["owasp"]`; drives SARIF help URIs and HTML badges |
| `confidence` | Stored in `finding["confidence"]` |
| `category` | Used as `finding["category"]` (`"security"` or `"performance"`) |
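To illustrate how these fields might flow into a finding, the sketch below maps one entry from Semgrep's JSON output (`check_id`, `extra.message`, `extra.severity`, `extra.metadata`) to a dictionary. The function name `to_finding()`, the `rule_id`, `message`, and `severity` keys, and the fallback to the rule pack's category are all assumptions for illustration; only the `owasp`, `confidence`, and `category` keys come from the table above.

```python
from typing import Any, Dict


def to_finding(result: Dict[str, Any], default_category: str) -> Dict[str, Any]:
    """Map one Semgrep JSON result to a finding dict (hypothetical shape)."""
    extra = result.get("extra", {})
    metadata = extra.get("metadata", {})
    return {
        "rule_id": result.get("check_id"),
        "message": extra.get("message", "").strip(),
        "severity": extra.get("severity"),
        "owasp": metadata.get("owasp", []),
        "confidence": metadata.get("confidence"),
        # Assumed fallback: use the pack's category when the rule omits one.
        "category": metadata.get("category", default_category),
    }


# A trimmed result in the shape Semgrep emits with --json.
sample = {
    "check_id": "my-rule-id",
    "path": "app.py",
    "extra": {
        "message": "Dangerous call detected. $X may be user-controlled.\n",
        "severity": "ERROR",
        "metadata": {"owasp": ["A03:2021-Injection"], "confidence": "confirmed"},
    },
}

finding = to_finding(sample, default_category="security")
```

Here the sample rule has no `category` in its metadata, so the finding takes the value passed in by the caller.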

---

## Adding a new rule pack

1. Create `myrules.yaml` in the project root.
2. Add a constant and list entry in `rules/__init__.py`:

```python
MYRULES = _ROOT / "myrules.yaml"

ALL_SECURITY = [
    ...
    ("Semgrep:MyRules", MYRULES, "security"),
]
```

`scan_repo()` automatically picks it up; no changes to `core/scanner.py` needed.

See [How to Extend](../how-to-extend.md#adding-a-new-semgrep-rule-pack) for the full walkthrough.
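After registering a pack, a quick sanity check can catch a constant that points at a file that was never created. The helper below is a hypothetical sketch, not part of the project; it runs against a temporary directory standing in for the project root.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

RulePack = Tuple[str, Path, str]


def missing_packs(packs: List[RulePack]) -> List[str]:
    """Return the labels of entries whose YAML file is not on disk."""
    return [label for label, path, _category in packs if not path.is_file()]


# Demo: a temporary directory stands in for the project root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "core.yaml").write_text("rules: []\n")
    packs = [
        ("Semgrep:Core", root / "core.yaml", "security"),
        ("Semgrep:MyRules", root / "myrules.yaml", "security"),  # not created yet
    ]
    missing = missing_packs(packs)
```

In a real checkout the same helper could be pointed at `ALL_SECURITY + ALL_PERFORMANCE + ALL_LLM`, for example from a unit test.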