Source: autoscan/docs/api/rules.md (Chris4K, initial commit v5.0.0, 5248e3b)

API Reference — `rules/`

`rules/__init__.py`

Semgrep rule pack registry. All constants are `Path` objects pointing to YAML files in the project root. The `ALL_*` lists are consumed by `scan_repo()` in `core/scanner.py` to build the parallel task list.

Individual path constants

from rules import CORE, WEB, CRYPTO, ML, SECRETS, PERF, LLM

| Constant | File | Description |
| --- | --- | --- |
| `CORE` | `core.yaml` | Core Python security: subprocess injection, eval, pickle deserialization, unsafe YAML loading |
| `WEB` | `web.yaml` | Web security: XSS, SSRF, open redirect, path traversal |
| `CRYPTO` | `crypto.yaml` | Cryptographic failures: weak ciphers, hardcoded keys, insecure RNG |
| `ML` | `ml.yaml` | ML-specific: unsafe `pickle.load`, `torch.load` without `weights_only`, model-path injection |
| `SECRETS` | `secrets.yaml` | Secret patterns: API keys, tokens, credentials in code |
| `PERF` | `perf.yaml` | Performance anti-patterns: list building in loops, `try/except` in loops |
| `LLM` | `llm.yaml` | LLM/agent security: prompt injection (LLM01), insecure output handling (LLM02), PII in prompts (LLM06) |
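
The source of `rules/__init__.py` is not reproduced on this page, so the following is only a sketch of how path constants like these might be built and checked. The file names come from the table above; the `PACK_FILES` mapping and the `pack_paths` and `missing_packs` helpers are hypothetical names introduced for illustration, not part of autoscan.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the table above; the mapping itself is illustrative.
PACK_FILES: Dict[str, str] = {
    "CORE": "core.yaml",
    "WEB": "web.yaml",
    "CRYPTO": "crypto.yaml",
    "ML": "ml.yaml",
    "SECRETS": "secrets.yaml",
    "PERF": "perf.yaml",
    "LLM": "llm.yaml",
}


def pack_paths(root: Path) -> Dict[str, Path]:
    """Map each constant name to a Path under the given project root."""
    return {name: root / filename for name, filename in PACK_FILES.items()}


def missing_packs(root: Path) -> List[str]:
    """Return the names of packs whose YAML file is absent under root."""
    return [name for name, path in pack_paths(root).items() if not path.is_file()]
```

A check like `missing_packs(Path.cwd())` at startup would surface a misplaced rule file before any scan is launched.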

Aggregated list constants

`ALL_SECURITY`

ALL_SECURITY: List[Tuple[str, Path, str]] = [
    ("Semgrep:Core",    CORE,    "security"),
    ("Semgrep:Web",     WEB,     "security"),
    ("Semgrep:Crypto",  CRYPTO,  "security"),
    ("Semgrep:ML",      ML,      "security"),
    ("Semgrep:Secrets", SECRETS, "security"),
]

Iterated in `scan_repo()` when `run_security=True`. Each `(label, path, category)` tuple produces one `semgrep_pack()` call.

`ALL_PERFORMANCE`

ALL_PERFORMANCE: List[Tuple[str, Path, str]] = [
    ("Semgrep:Perf", PERF, "performance"),
]

Iterated in `scan_repo()` when `run_performance=True`.

`ALL_LLM`

ALL_LLM: List[Tuple[str, Path, str]] = [
    ("Semgrep:LLM", LLM, "security"),
]

Iterated in `scan_repo()` when `run_llm=True`.
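
To make the role of the three lists concrete, here is a minimal sketch of how a task list could be assembled from the `run_*` flags and dispatched in parallel. It is not the actual `scan_repo()` code: the lists are shortened stand-ins, `build_tasks`, `run_packs` and `fake_semgrep_pack` are hypothetical names, and the real signature of `semgrep_pack()` is not documented here, so a stub takes its place.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Pack = Tuple[str, Path, str]  # (label, path, category)

# Shortened stand-ins for the lists exported by rules/__init__.py.
ALL_SECURITY: List[Pack] = [
    ("Semgrep:Core", Path("core.yaml"), "security"),
    ("Semgrep:Web", Path("web.yaml"), "security"),
]
ALL_PERFORMANCE: List[Pack] = [("Semgrep:Perf", Path("perf.yaml"), "performance")]
ALL_LLM: List[Pack] = [("Semgrep:LLM", Path("llm.yaml"), "security")]


def build_tasks(run_security: bool = True,
                run_performance: bool = True,
                run_llm: bool = True) -> List[Pack]:
    """Concatenate the enabled pack lists into one task list."""
    tasks: List[Pack] = []
    if run_security:
        tasks.extend(ALL_SECURITY)
    if run_performance:
        tasks.extend(ALL_PERFORMANCE)
    if run_llm:
        tasks.extend(ALL_LLM)
    return tasks


def run_packs(tasks: List[Pack],
              runner: Callable[[str, Path, str], List[Dict]],
              max_workers: int = 4) -> List[Dict]:
    """Call runner once per pack in a thread pool and flatten the results."""
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_pack = pool.map(lambda task: runner(*task), tasks)
        return [finding for findings in per_pack for finding in findings]


def fake_semgrep_pack(label: str, path: Path, category: str) -> List[Dict]:
    """Stub standing in for semgrep_pack(); returns one dummy finding."""
    return [{"tool": label, "category": category, "rules": path.name}]


findings = run_packs(build_tasks(run_llm=False), fake_semgrep_pack)
```

Threads fit this shape of work because each pack run would mostly wait on an external Semgrep process; `pool.map` also keeps results in task order.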

Semgrep YAML rule format

Each `.yaml` file follows the Semgrep rule schema. The `metadata` block controls how findings are categorized:

rules:
  - id: my-rule-id
    patterns:
      - pattern: |
          dangerous_call($X, ...)
    message: |
      Dangerous call detected. $X may be user-controlled.
    severity: ERROR           # ERROR | WARNING | INFO
    languages: [python]
    metadata:
      owasp:
        - A03:2021-Injection
      confidence: confirmed   # confirmed | likely | possible
      category: security

`metadata` fields used by autoscan:

| Field | Usage |
| --- | --- |
| `owasp` | Stored in `finding["owasp"]`; drives SARIF help URIs and HTML badges |
| `confidence` | Stored in `finding["confidence"]` |
| `category` | Used as `finding["category"]` (`"security"` or `"performance"`) |
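
As an illustration of the mapping in the table, the sketch below turns one entry of Semgrep's JSON `results` array into a finding dictionary. Only the `owasp`, `confidence` and `category` keys are documented above; the other keys, the `to_finding` name, and the fallback to a default category are assumptions, not autoscan's actual implementation.

```python
from typing import Any, Dict


def to_finding(result: Dict[str, Any],
               default_category: str = "security") -> Dict[str, Any]:
    """Convert one Semgrep JSON result into a flat finding dict."""
    extra = result.get("extra", {})
    meta = extra.get("metadata", {})

    # Rules may give owasp as a single string or as a list; normalize to a list.
    owasp = meta.get("owasp", [])
    if isinstance(owasp, str):
        owasp = [owasp]

    return {
        # Keys below this comment are illustrative, not documented by autoscan.
        "rule_id": result.get("check_id"),
        "path": result.get("path"),
        "line": result.get("start", {}).get("line"),
        "severity": extra.get("severity"),
        "message": extra.get("message", "").strip(),
        # Keys documented in the table above.
        "owasp": list(owasp),
        "confidence": meta.get("confidence"),
        # Assumption: fall back to a default when the rule omits category.
        "category": meta.get("category", default_category),
    }


sample = {
    "check_id": "my-rule-id",
    "path": "app.py",
    "start": {"line": 12, "col": 5},
    "extra": {
        "message": "Dangerous call detected. $X may be user-controlled.\n",
        "severity": "ERROR",
        "metadata": {
            "owasp": ["A03:2021-Injection"],
            "confidence": "confirmed",
            "category": "security",
        },
    },
}
finding = to_finding(sample)
```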

Adding a new rule pack

1. Create `myrules.yaml` in the project root.
2. Add a constant and list entry in `rules/__init__.py`:

MYRULES = _ROOT / "myrules.yaml"

ALL_SECURITY = [
    ...
    ("Semgrep:MyRules", MYRULES, "security"),
]

`scan_repo()` picks up the new entry automatically; no changes to `core/scanner.py` are needed.
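
Before registering a new pack, it can help to sanity-check the entry. The following is a hedged sketch of such a check; `check_entries` and `ALLOWED_CATEGORIES` are hypothetical names, and the allowed categories are taken from the `category` values documented on this page.

```python
from pathlib import Path
from typing import List, Set, Tuple

Pack = Tuple[str, Path, str]  # (label, path, category)

# Category values documented for finding["category"] on this page.
ALLOWED_CATEGORIES: Set[str] = {"security", "performance"}


def check_entries(entries: List[Pack]) -> List[str]:
    """Return human-readable problems found in a list of pack entries."""
    problems: List[str] = []
    seen_labels: Set[str] = set()
    for label, path, category in entries:
        if label in seen_labels:
            problems.append(f"{label}: duplicate label")
        seen_labels.add(label)
        if category not in ALLOWED_CATEGORIES:
            problems.append(f"{label}: unknown category {category!r}")
        if path.suffix not in {".yaml", ".yml"}:
            problems.append(f"{label}: not a YAML file: {path.name}")
        elif not path.is_file():
            problems.append(f"{label}: file not found: {path}")
    return problems
```

Calling `check_entries(ALL_SECURITY)` after editing the list would report a typo in the file name or category before any scan runs.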

See How to Extend for the full walkthrough.
