	# API Reference — `rules/`

	## `rules/__init__.py`

	Semgrep rule pack registry. The individual constants are `Path` objects pointing to YAML files in the project root. `scan_repo()` in `core/scanner.py` consumes the `ALL_*` lists to build its parallel task list.

	---

	## Individual path constants

	```python
	from rules import CORE, WEB, CRYPTO, ML, SECRETS, PERF, LLM
	```

	| Constant | File | Description |
	|----------|------|-------------|
	| `CORE` | `core.yaml` | Core Python security: subprocess injection, eval, pickle deserialization, unsafe YAML loading |
	| `WEB` | `web.yaml` | Web security: XSS, SSRF, open redirect, path traversal |
	| `CRYPTO` | `crypto.yaml` | Cryptographic failures: weak ciphers, hardcoded keys, insecure RNG |
	| `ML` | `ml.yaml` | ML-specific: unsafe `pickle.load`, `torch.load` without `weights_only`, model-path injection |
	| `SECRETS` | `secrets.yaml` | Secret patterns: API keys, tokens, credentials in code |
	| `PERF` | `perf.yaml` | Performance anti-patterns: list building in loops, `try/except` in loops |
	| `LLM` | `llm.yaml` | LLM/agent security: prompt injection (LLM01), insecure output handling (LLM02), PII in prompts (LLM06) |
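
As a rough illustration of how constants like these can be derived from a single root directory, the sketch below maps each name in the table to a `Path` and reports which rule files are missing. `PACK_FILES`, `resolve_packs()`, and `missing_packs()` are hypothetical helpers written for this example; they are not part of `rules/__init__.py`.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mirror of the table above: constant name -> YAML file name.
PACK_FILES: Dict[str, str] = {
    "CORE": "core.yaml",
    "WEB": "web.yaml",
    "CRYPTO": "crypto.yaml",
    "ML": "ml.yaml",
    "SECRETS": "secrets.yaml",
    "PERF": "perf.yaml",
    "LLM": "llm.yaml",
}


def resolve_packs(root: Path) -> Dict[str, Path]:
    """Map each constant name to a Path under the given project root."""
    return {name: root / filename for name, filename in PACK_FILES.items()}


def missing_packs(root: Path) -> List[str]:
    """Return the sorted constant names whose YAML file is absent under root."""
    return sorted(
        name for name, path in resolve_packs(root).items() if not path.is_file()
    )
```

Calling `missing_packs(Path("."))` from the project root would return an empty list when all seven files are present.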

	---

	## Aggregated list constants

	### `ALL_SECURITY`

	```python
	ALL_SECURITY: List[Tuple[str, Path, str]] = [
	    ("Semgrep:Core", CORE, "security"),
	    ("Semgrep:Web", WEB, "security"),
	    ("Semgrep:Crypto", CRYPTO, "security"),
	    ("Semgrep:ML", ML, "security"),
	    ("Semgrep:Secrets", SECRETS, "security"),
	]
	```

	Iterated in `scan_repo()` when `run_security=True`. Each `(label, path, category)` tuple produces one `semgrep_pack()` call.

	### `ALL_PERFORMANCE`

	```python
	ALL_PERFORMANCE: List[Tuple[str, Path, str]] = [
	    ("Semgrep:Perf", PERF, "performance"),
	]
	```

	Iterated when `run_performance=True`.

	### `ALL_LLM`

	```python
	ALL_LLM: List[Tuple[str, Path, str]] = [
	    ("Semgrep:LLM", LLM, "security"),
	]
	```

	Iterated when `run_llm=True`.
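
The three flags and lists above suggest a simple selection-then-dispatch pattern. The following is a minimal sketch of that pattern, not the actual `scan_repo()` code: `build_tasks()`, `run_tasks()`, and `fake_semgrep_pack()` are hypothetical names, the lists are shortened stand-ins, and the real `semgrep_pack()` signature and return type are assumed rather than documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Pack = Tuple[str, Path, str]  # (label, path, category)

# Shortened stand-ins for the lists exported by rules/__init__.py.
ALL_SECURITY: List[Pack] = [
    ("Semgrep:Core", Path("core.yaml"), "security"),
    ("Semgrep:Web", Path("web.yaml"), "security"),
]
ALL_PERFORMANCE: List[Pack] = [("Semgrep:Perf", Path("perf.yaml"), "performance")]
ALL_LLM: List[Pack] = [("Semgrep:LLM", Path("llm.yaml"), "security")]


def build_tasks(
    run_security: bool = True,
    run_performance: bool = True,
    run_llm: bool = True,
) -> List[Pack]:
    """Collect the pack tuples selected by the three flags."""
    tasks: List[Pack] = []
    if run_security:
        tasks.extend(ALL_SECURITY)
    if run_performance:
        tasks.extend(ALL_PERFORMANCE)
    if run_llm:
        tasks.extend(ALL_LLM)
    return tasks


def fake_semgrep_pack(label: str, path: Path, category: str) -> List[Dict[str, str]]:
    """Placeholder for the real semgrep_pack(); returns one dummy finding."""
    return [{"tool": label, "rule_file": str(path), "category": category}]


def run_tasks(
    tasks: List[Pack],
    runner: Callable[[str, Path, str], List[Dict[str, str]]] = fake_semgrep_pack,
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Run one runner call per task in a thread pool and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(runner, *task) for task in tasks]
        # Collect in submission order so output is deterministic.
        return [finding for future in futures for finding in future.result()]
```

Because each pack is an independent tuple, adding an entry to one of the lists adds exactly one more task without touching the dispatch code.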

	---

	## Semgrep YAML rule format

	Each `.yaml` file follows the [Semgrep rule schema](https://semgrep.dev/docs/writing-rules/rule-syntax/). The `metadata` block controls how findings are categorized:

	```yaml
	rules:
	  - id: my-rule-id
	    patterns:
	      - pattern: |
	          dangerous_call($X, ...)
	    message: |
	      Dangerous call detected. $X may be user-controlled.
	    severity: ERROR  # ERROR | WARNING | INFO
	    languages: [python]
	    metadata:
	      owasp:
	        - A03:2021-Injection
	      confidence: confirmed  # confirmed | likely | possible
	      category: security
	```

	`metadata` fields used by autoscan:

	| Field | Usage |
	|-------|-------|
	| `owasp` | Stored in `finding["owasp"]`; drives SARIF help URIs and HTML badges |
	| `confidence` | Stored in `finding["confidence"]` |
	| `category` | Used as `finding["category"]` (`"security"` or `"performance"`) |

	---

	## Adding a new rule pack

	1. Create `myrules.yaml` in the project root.
	2. Add a constant and list entry in `rules/__init__.py`:

	```python
	MYRULES = _ROOT / "myrules.yaml"

	ALL_SECURITY = [
	    ...
	    ("Semgrep:MyRules", MYRULES, "security"),
	]
	```

	`scan_repo()` picks it up automatically; no changes to `core/scanner.py` are needed.
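
After editing a list, a quick sanity check can catch common slips such as a duplicated label or a mistyped category. The sketch below is one possible check, not something autoscan ships: `check_entries()` and `ALLOWED_CATEGORIES` are hypothetical, and the allowed categories are taken from the `category` values listed in this document.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

# Assumption: the only categories in use are the two named in this document.
ALLOWED_CATEGORIES = {"security", "performance"}


def check_entries(entries: Sequence[Tuple[str, Path, str]]) -> List[str]:
    """Return a list of human-readable problems; empty means the list looks fine."""
    problems: List[str] = []
    seen_labels = set()
    for label, path, category in entries:
        if label in seen_labels:
            problems.append(f"duplicate label: {label}")
        seen_labels.add(label)
        if Path(path).suffix not in {".yaml", ".yml"}:
            problems.append(f"{label}: {path} is not a YAML file")
        if category not in ALLOWED_CATEGORIES:
            problems.append(f"{label}: unknown category {category!r}")
    return problems
```

For example, `check_entries([("Semgrep:MyRules", Path("myrules.yaml"), "security")])` returns an empty list.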

	See [How to Extend](../how-to-extend.md#adding-a-new-semgrep-rule-pack) for the full walkthrough.