| # API Reference — `core/` | |
| ## `core/__init__.py` — public exports | |
| ```python | |
| from core import ( | |
| # helpers.py | |
| run, jload, write_tmp, relpath, have_binary, | |
| # models.py | |
| make_finding, dedup_findings, sort_findings, | |
| # hf.py | |
| hf_space_to_git, list_user_spaces, comment_on_space, | |
| # bootstrap.py | |
| bootstrap_binaries, | |
| # baseline.py | |
| make_fingerprint, save_baseline, load_baseline, | |
| filter_by_baseline, parse_ignore_file, apply_ignore_rules, | |
| ) | |
| ``` | |
| --- | |
| ## `core/scanner.py` | |
| ### `scan_repo()` | |
| ```python | |
| def scan_repo( | |
| repo_url: str, | |
| hf_token: Optional[str] = None, | |
| deep_history: bool = False, | |
| run_security: bool = True, | |
| run_performance: bool = True, | |
| run_llm: bool = True, | |
| max_workers: int = 8, | |
| progress_cb: Optional[Callable[[float, str], None]] = None, | |
| ) -> Tuple[List[dict], List[str]] | |
| ``` | |
| Clone or copy the target, run all enabled scanners in parallel, return `(findings, log)`. | |
| **Parameters:** | |
| | Parameter | Type | Default | Description | | |
| |-----------|------|---------|-------------| | |
| | `repo_url` | `str` | — | HTTPS URL (HF Space or git repo) or local directory path | | |
| | `hf_token` | `Optional[str]` | `None` | HF Bearer token for private/gated repos | | |
| | `deep_history` | `bool` | `False` | If `True`, run a full `git clone` (without `--depth 1`) and include the gitleaks scanner | | |
| | `run_security` | `bool` | `True` | Enable security scanners (Semgrep, bandit, pip-audit, …) | | |
| | `run_performance` | `bool` | `True` | Enable performance scanners (Semgrep:Perf, ruff) | | |
| | `run_llm` | `bool` | `True` | Enable LLM/agent scanners (Semgrep:LLM, agent-audit) | | |
| | `max_workers` | `int` | `8` | Maximum thread-pool workers | | |
| | `progress_cb` | `Optional[Callable[[float, str], None]]` | `None` | Called with `(fraction: float, description: str)` as each scanner completes | | |
| **Returns:** `(findings, log)` where `findings` is a deduplicated, sorted `List[dict]` and `log` is a `List[str]` of scanner messages (first entry is the summary `"OK (N unique findings)"`). | |
| **Error handling:** Never raises. Returns `([], [error_message])` on clone failure or invalid target. | |
| **Example:** | |
| ```python | |
| from core.scanner import scan_repo | |
| findings, log = scan_repo( | |
| "https://huggingface.co/spaces/owner/myspace", | |
| run_performance=False, | |
| progress_cb=lambda f, d: print(f"{f:.0%} {d}"), | |
| ) | |
| print(log[0]) # "OK (23 unique findings)" | |
| ``` | |
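Because `scan_repo()` never raises, callers have to inspect `log[0]` to tell success from failure. The helper below is a minimal sketch of that check. `parse_summary` is a hypothetical name, not part of `core`, and it assumes the summary line has exactly the `"OK (N unique findings)"` shape described above.

```python
import re
from typing import List, Optional

# Assumption: the summary entry looks exactly like "OK (23 unique findings)".
_SUMMARY_RE = re.compile(r"^OK \((\d+) unique findings?\)$")


def parse_summary(log: List[str]) -> Optional[int]:
    """Return N from an "OK (N unique findings)" first entry, else None."""
    if not log:
        return None
    match = _SUMMARY_RE.match(log[0])
    return int(match.group(1)) if match else None


# The second log entry is an invented scanner message, used only for illustration.
count = parse_summary(["OK (23 unique findings)", "bandit: done"])
failed = parse_summary(["clone failed"])
```

A `None` result here would mean the first log entry is an error message rather than a summary.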
| --- | |
| ## `core/baseline.py` | |
| ### `make_fingerprint(finding)` | |
| ```python | |
| def make_fingerprint(finding: dict) -> str | |
| ``` | |
| Return a 16-hex-char deterministic fingerprint: | |
| ``` | |
| sha256( tool:rule:file:line:message )[:16] | |
| ``` | |
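The formula above can be sketched with `hashlib`. This is an illustration, not the module's code: it assumes the five fields are read from dict keys of the same names, joined with `:`, and encoded as UTF-8. `fingerprint_sketch` is a hypothetical name.

```python
import hashlib


def fingerprint_sketch(finding: dict) -> str:
    """Hash the five identifying fields and keep the first 16 hex characters."""
    # Assumption: these key names and the ':' separator mirror the formula above.
    keys = ("tool", "rule", "file", "line", "message")
    raw = ":".join(str(finding.get(k, "")) for k in keys)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


example = {
    "tool": "bandit",
    "rule": "B101",
    "file": "app.py",
    "line": 12,
    "message": "assert used",
}
fp = fingerprint_sketch(example)
```

Since `line` is part of the hash input, a finding that moves to a different line would get a new fingerprint under this scheme.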
| ### `save_baseline(findings, path)` | |
| ```python | |
| def save_baseline(findings: List[dict], path: Union[str, Path]) -> None | |
| ``` | |
| Persist fingerprints to a JSON file at `path`. Overwrites if it exists. The JSON format: | |
| ```json | |
| { | |
| "created": "2025-01-01T12:00:00Z", | |
| "scanner_version": "4.0.0", | |
| "fingerprints": ["abc123...", ...] | |
| } | |
| ``` | |
| ### `load_baseline(path)` | |
| ```python | |
| def load_baseline(path: Union[str, Path]) -> Set[str] | |
| ``` | |
| Return the set of fingerprint strings from a saved baseline JSON file. | |
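A save/load round trip over the JSON format shown earlier might look like the sketch below. It takes fingerprints directly rather than findings, and the sorting, indentation, and timestamp formatting are assumptions. The `_sketch` function names are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_baseline_sketch(fingerprints, path, scanner_version="4.0.0"):
    """Write the documented three-key JSON layout, overwriting any existing file."""
    payload = {
        "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "scanner_version": scanner_version,
        # Assumption: fingerprints are stored de-duplicated and sorted.
        "fingerprints": sorted(set(fingerprints)),
    }
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_baseline_sketch(path):
    """Read the file back and return only the fingerprint set."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return set(data.get("fingerprints", []))


with tempfile.TemporaryDirectory() as tmp:
    baseline_path = Path(tmp) / "baseline.json"
    save_baseline_sketch(["abc123", "def456", "abc123"], baseline_path)
    loaded = load_baseline_sketch(baseline_path)
```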
| ### `filter_by_baseline(findings, baseline)` | |
| ```python | |
| def filter_by_baseline( | |
| findings: List[dict], | |
| baseline: Set[str], | |
| ) -> Tuple[List[dict], List[dict]] | |
| ``` | |
| Return `(kept, suppressed)` — findings whose fingerprints are **not** in `baseline` vs those that are. | |
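The split into `(kept, suppressed)` amounts to a single pass over the findings. The sketch below takes the fingerprint function as a parameter so that it stays self-contained. `partition_by_baseline` and `toy_fp` are hypothetical names, and `toy_fp` is a stand-in, not the real fingerprint.

```python
from typing import Callable, List, Set, Tuple


def partition_by_baseline(
    findings: List[dict],
    baseline: Set[str],
    fingerprint: Callable[[dict], str],
) -> Tuple[List[dict], List[dict]]:
    """Split findings into (kept, suppressed), preserving input order."""
    kept, suppressed = [], []
    for finding in findings:
        target = suppressed if fingerprint(finding) in baseline else kept
        target.append(finding)
    return kept, suppressed


def toy_fp(finding: dict) -> str:
    # Stand-in fingerprint for the example only.
    return f"{finding['rule']}@{finding['file']}"


findings = [
    {"rule": "B101", "file": "a.py"},
    {"rule": "B608", "file": "b.py"},
]
kept, suppressed = partition_by_baseline(findings, {"B101@a.py"}, toy_fp)
```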
| ### `parse_ignore_file(path)` | |
| ```python | |
| def parse_ignore_file(path: Union[str, Path]) -> List[IgnoreRule] | |
| ``` | |
| Parse a `.hfscanignore` file and return a list of `IgnoreRule` dataclass instances. | |
| **`.hfscanignore` syntax:** | |
| ``` | |
| # comment | |
| tests/ # suppress all findings under tests/ | |
| * rule:B101 # suppress rule everywhere | |
| src/legacy/ severity:INFO # suppress INFO severity under path | |
| src/gen/ rule:B608 # suppress rule under path | |
| ``` | |
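One way to read this syntax is sketched below. It is not the module's parser. It assumes that `#` always starts a comment, that the first token is a path prefix or `*`, and that the remaining tokens are `rule:` or `severity:` filters. The dataclass mirrors `IgnoreRule` with defaults added for brevity, and `parse_ignore_text` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IgnoreRule:
    path_prefix: str = ""  # "" = applies everywhere
    rule_id: str = ""      # "" = no rule filter
    severity: str = ""     # "" = no severity filter


def parse_ignore_text(text: str) -> List[IgnoreRule]:
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        tokens = line.split()
        # Assumption: "*" is stored as an empty prefix.
        rule = IgnoreRule(path_prefix="" if tokens[0] == "*" else tokens[0])
        for token in tokens[1:]:
            key, _, value = token.partition(":")
            if key == "rule":
                rule.rule_id = value
            elif key == "severity":
                rule.severity = value
        rules.append(rule)
    return rules


sample = """\
# comment
tests/                      # suppress all findings under tests/
* rule:B101                 # suppress rule everywhere
src/legacy/ severity:INFO   # suppress INFO severity under path
src/gen/ rule:B608          # suppress rule under path
"""
rules = parse_ignore_text(sample)
```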
| ### `apply_ignore_rules(findings, rules)` | |
| ```python | |
| def apply_ignore_rules( | |
| findings: List[dict], | |
| rules: List[IgnoreRule], | |
| ) -> Tuple[List[dict], int] | |
| ``` | |
| Return `(kept_findings, ignored_count)`. Rules are evaluated in order; first match wins. | |
| ### `IgnoreRule` dataclass | |
| ```python | |
| @dataclass | |
| class IgnoreRule: | |
| path_prefix: str # "" = wildcard (applies everywhere) | |
| rule_id: str # "" = no rule filter | |
| severity: str # "" = no severity filter | |
| ``` | |
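Matching findings against these rules could work as in the sketch below. It assumes a rule matches when every non-empty field matches, and that findings expose `file`, `rule`, and `severity` keys. `rule_matches` and `apply_rules_sketch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IgnoreRule:
    path_prefix: str
    rule_id: str
    severity: str


def rule_matches(rule: IgnoreRule, finding: dict) -> bool:
    """Empty fields act as wildcards; every non-empty field must match."""
    if rule.path_prefix and not finding["file"].startswith(rule.path_prefix):
        return False
    if rule.rule_id and finding["rule"] != rule.rule_id:
        return False
    if rule.severity and finding["severity"] != rule.severity:
        return False
    return True


def apply_rules_sketch(
    findings: List[dict], rules: List[IgnoreRule]
) -> Tuple[List[dict], int]:
    # any() stops at the first matching rule, i.e. first match wins.
    kept = [f for f in findings if not any(rule_matches(r, f) for r in rules)]
    return kept, len(findings) - len(kept)


rules = [
    IgnoreRule("tests/", "", ""),
    IgnoreRule("", "B101", ""),
    IgnoreRule("src/legacy/", "", "INFO"),
]
findings = [
    {"file": "tests/test_a.py", "rule": "B608", "severity": "ERROR"},
    {"file": "src/app.py", "rule": "B101", "severity": "INFO"},
    {"file": "src/legacy/x.py", "rule": "B608", "severity": "INFO"},
    {"file": "src/legacy/x.py", "rule": "B608", "severity": "ERROR"},
    {"file": "src/app.py", "rule": "B608", "severity": "WARNING"},
]
kept, ignored_count = apply_rules_sketch(findings, rules)
```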
| --- | |
| ## `core/models.py` | |
| ### `make_finding()` | |
| ```python | |
| def make_finding( | |
| tool: str, | |
| rule: str, | |
| severity: str, | |
| file: str, | |
| line: int, | |
| message: str, | |
| owasp: Union[str, List[str]], | |
| category: str = "security", | |
| confidence: str = None, | |
| remediation: str = None, | |
| ) -> dict | |
| ``` | |
| Build a normalized finding dict. All scanner runners call this to ensure uniform output shape. | |
| - `confidence`: if `None`, looked up from `TOOL_DEFAULT_CONFIDENCE`; falls back to `"possible"`. | |
| - `remediation`: if `None`, looked up from `report.remediation.REMEDIATION` by `rule`; falls back to `""`. | |
| - `owasp`: `str` is automatically wrapped in a list. | |
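The three defaulting rules above can be sketched as follows. The lookup tables here are invented stand-ins with made-up entries, the output key names are assumed to match the parameter names, and `make_finding_sketch` is a hypothetical name.

```python
from typing import List, Optional, Union

# Hypothetical stand-ins for the real lookup tables; the entries are invented.
TOOL_DEFAULT_CONFIDENCE = {"bandit": "likely"}
REMEDIATION = {"B101": "Replace assert with an explicit check."}


def make_finding_sketch(
    tool: str,
    rule: str,
    severity: str,
    file: str,
    line: int,
    message: str,
    owasp: Union[str, List[str]],
    category: str = "security",
    confidence: Optional[str] = None,
    remediation: Optional[str] = None,
) -> dict:
    if confidence is None:
        confidence = TOOL_DEFAULT_CONFIDENCE.get(tool, "possible")
    if remediation is None:
        remediation = REMEDIATION.get(rule, "")
    return {
        "tool": tool,
        "rule": rule,
        "severity": severity,
        "file": file,
        "line": line,
        "message": message,
        # A bare string is wrapped so consumers always see a list.
        "owasp": [owasp] if isinstance(owasp, str) else list(owasp),
        "category": category,
        "confidence": confidence,
        "remediation": remediation,
    }


# The OWASP labels below are placeholders for the example.
f = make_finding_sketch("bandit", "B101", "INFO", "app.py", 3, "assert used", "A04")
g = make_finding_sketch("unknown-tool", "X1", "WARNING", "b.py", 1, "msg", ["A01", "A03"])
```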
| ### `sort_findings(findings)` | |
| ```python | |
| def sort_findings(findings: List[dict]) -> List[dict] | |
| ``` | |
| Sort by `severity` (ERROR first, then WARNING, then INFO) → `confidence` (confirmed first, then likely, then possible) → `file` → `line`, all ascending. | |
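A multi-key sort like this maps naturally onto a tuple key. In the sketch below, the `"INFO": 2` entry and the fallback rank of `99` for unknown values are assumptions, and `sort_findings_sketch` is a hypothetical name.

```python
from typing import List

# Assumption: INFO ranks after WARNING; only ERROR/HIGH/WARNING are documented.
SEVERITY_RANK = {"ERROR": 0, "HIGH": 0, "WARNING": 1, "INFO": 2}
CONFIDENCE_RANK = {"confirmed": 0, "likely": 1, "possible": 2}


def sort_findings_sketch(findings: List[dict]) -> List[dict]:
    def key(f: dict):
        return (
            SEVERITY_RANK.get(f["severity"], 99),      # unknown values sort last
            CONFIDENCE_RANK.get(f["confidence"], 99),
            f["file"],
            f["line"],
        )

    return sorted(findings, key=key)


findings = [
    {"severity": "INFO", "confidence": "confirmed", "file": "a.py", "line": 1},
    {"severity": "ERROR", "confidence": "possible", "file": "b.py", "line": 9},
    {"severity": "ERROR", "confidence": "confirmed", "file": "b.py", "line": 20},
    {"severity": "ERROR", "confidence": "confirmed", "file": "b.py", "line": 3},
    {"severity": "WARNING", "confidence": "likely", "file": "a.py", "line": 5},
]
ordered = sort_findings_sketch(findings)
```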
| ### `dedup_findings(findings)` | |
| ```python | |
| def dedup_findings(findings: List[dict]) -> List[dict] | |
| ``` | |
| Remove duplicates keyed on `(tool, file, line, message)`. Preserves first occurrence order. | |
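Order-preserving deduplication on that key can be sketched with a `seen` set. `dedup_sketch` is a hypothetical name, and the findings are assumed to carry all four key fields.

```python
from typing import List


def dedup_sketch(findings: List[dict]) -> List[dict]:
    """Keep the first finding for each (tool, file, line, message) key."""
    seen = set()
    unique = []
    for f in findings:
        key = (f["tool"], f["file"], f["line"], f["message"])
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique


findings = [
    {"tool": "bandit", "file": "a.py", "line": 1, "message": "m", "rule": "B101"},
    {"tool": "ruff", "file": "a.py", "line": 1, "message": "m", "rule": "S101"},
    {"tool": "bandit", "file": "a.py", "line": 1, "message": "m", "rule": "B101-dup"},
]
unique = dedup_sketch(findings)
```

Note that `rule` is not part of the key, so two findings from the same tool that differ only in rule would collapse into one under this scheme.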
| ### Constants | |
| ```python | |
| SEVERITY_RANK: dict # {"ERROR": 0, "HIGH": 0, "WARNING": 1, ...} | |
| CONFIDENCE_RANK: dict # {"confirmed": 0, "likely": 1, "possible": 2} | |
| TOOL_DEFAULT_CONFIDENCE: dict # per-tool default confidence levels | |
| FORBIDDEN_FILES: list # file names that are always flagged | |
| ``` | |
| --- | |
| ## `core/helpers.py` | |
| ### `run(cmd, cwd=None, timeout=300)` | |
| ```python | |
| def run(cmd: List[str], cwd: str = None, timeout: int = 300) -> Tuple[str, int] | |
| ``` | |
| Run a subprocess. Returns `(stdout_stripped, returncode)`. Never raises. | |
| | Return code | Meaning | | |
| |-----------|---------| | |
| | (process exit code) | The process ran to completion; its own exit code is passed through | | |
| | `124` | Timed out (`subprocess.TimeoutExpired`) | | |
| | `127` | Binary not found (`FileNotFoundError`) | | |
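The mapping in the table can be sketched around `subprocess.run`. This version handles only the two documented failure cases and assumes stdout is returned as an empty string when either occurs. `run_sketch` is a hypothetical name.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_sketch(
    cmd: List[str], cwd: Optional[str] = None, timeout: int = 300
) -> Tuple[str, int]:
    try:
        proc = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
        return proc.stdout.strip(), proc.returncode
    except subprocess.TimeoutExpired:
        return "", 124  # same code the coreutils `timeout` command uses
    except FileNotFoundError:
        return "", 127  # shell convention for "command not found"


out, code = run_sketch([sys.executable, "-c", "print('hello')"])
# The binary name below is deliberately nonexistent.
missing_out, missing_code = run_sketch(["hypothetical-binary-that-does-not-exist"])
```

A process that itself exits with `124` or `127` would be indistinguishable from these synthetic codes, so callers may want to treat both values as failures either way.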
| ### `jload(txt)` | |
| ```python | |
| def jload(txt: str) -> Optional[Any] | |
| ``` | |
| Parse JSON from a string. Returns `None` for empty strings or parse errors. | |
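A tolerant JSON loader of this kind is typically a thin wrapper around `json.loads`. The sketch below assumes that whitespace-only input counts as empty. `jload_sketch` is a hypothetical name.

```python
import json
from typing import Any, Optional


def jload_sketch(txt: str) -> Optional[Any]:
    """Return the parsed value, or None for empty or invalid input."""
    if not txt or not txt.strip():
        return None
    try:
        return json.loads(txt)
    except (json.JSONDecodeError, TypeError):
        return None


parsed = jload_sketch('{"results": [1, 2]}')
empty = jload_sketch("")
broken = jload_sketch("not json")
```

With this design a literal JSON `null` also comes back as `None`, so a caller cannot tell it apart from a parse failure.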
| ### `write_tmp(content, suffix=".yaml")` | |
| ```python | |
| def write_tmp(content: str, suffix: str = ".yaml") -> str | |
| ``` | |
| Write `content` to a temp file and return its absolute path. | |
| ### `relpath(base, p)` | |
| ```python | |
| def relpath(base: str, p: str) -> str | |
| ``` | |
| Return `p` relative to `base`. If `p` is not under `base`, return `str(p)` unchanged. | |
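The fallback behaviour can be sketched with `pathlib`, where `relative_to` raises `ValueError` for paths outside `base`. `relpath_sketch` is a hypothetical name and the paths are illustrative.

```python
from pathlib import Path


def relpath_sketch(base: str, p: str) -> str:
    try:
        return str(Path(p).relative_to(base))
    except ValueError:
        # p is not under base: hand it back unchanged.
        return str(p)


inside = relpath_sketch("/tmp/repo", "/tmp/repo/src/app.py")
outside = relpath_sketch("/tmp/repo", "/etc/hosts")
```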
| ### `have_binary(name)` | |
| ```python | |
| def have_binary(name: str) -> bool | |
| ``` | |
| Return `True` if `name` is on `PATH` (via `shutil.which`). | |
| --- | |
| ## `core/hf.py` | |
| ### `hf_space_to_git(url, token=None)` | |
| ```python | |
| def hf_space_to_git(url: str, token: str = None) -> Optional[str] | |
| ``` | |
| Convert `https://huggingface.co/spaces/<ns>/<name>` to a git-cloneable URL. Returns `None` for non-HF URLs. If `token` is provided, it is embedded in the URL as HTTP basic-auth credentials (`USER:<token>@`). | |
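The conversion can be sketched with a regular expression. This version assumes the clone URL keeps the same host and path as the Space URL, and that anything other than a bare `spaces/<ns>/<name>` URL is rejected. `hf_space_to_git_sketch` is a hypothetical name and `hf_example` is a placeholder token.

```python
import re
from typing import Optional

# Assumption: only bare Space URLs (optionally with a trailing slash) qualify.
_SPACE_RE = re.compile(r"^https://huggingface\.co/spaces/([^/\s]+)/([^/\s]+)/?$")


def hf_space_to_git_sketch(url: str, token: Optional[str] = None) -> Optional[str]:
    match = _SPACE_RE.match(url.strip())
    if match is None:
        return None
    namespace, name = match.groups()
    auth = f"USER:{token}@" if token else ""
    return f"https://{auth}huggingface.co/spaces/{namespace}/{name}"


public_url = hf_space_to_git_sketch("https://huggingface.co/spaces/owner/myspace")
private_url = hf_space_to_git_sketch(
    "https://huggingface.co/spaces/owner/myspace/", token="hf_example"
)
other = hf_space_to_git_sketch("https://github.com/owner/repo")
```

Because the token ends up inside the URL, it is worth keeping such URLs out of logs and error messages.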
| ### `list_user_spaces(username, hf_token=None, limit=500)` | |
| ```python | |
| def list_user_spaces( | |
| username: str, | |
| hf_token: str = None, | |
| limit: int = 500, | |
| ) -> Tuple[List[str], str] | |
| ``` | |
| Return `(space_urls, status_message)`. Queries `https://huggingface.co/api/spaces?author=<username>`. Returns `([], error_message)` on HTTP error or network failure. | |
| --- | |
| ## `core/bootstrap.py` | |
| ### `bootstrap_binaries()` | |
| ```python | |
| def bootstrap_binaries() -> dict | |
| ``` | |
| Download the `gitleaks` and `hadolint` binaries for the current platform if they are not already on `PATH`. Returns a dict with the keys `"gitleaks"` and `"hadolint"`; each value is one of `"ok"`, `"already installed"`, or `"error: ..."`. | |
| Binaries are placed in: | |
| - **Windows**: `<venv>\Scripts\` (next to `python.exe`, so `shutil.which` finds them) | |
| - **macOS/Linux**: `~/.local/bin/` | |
| **Versions:** `GITLEAKS_VERSION = "8.18.4"`, `HADOLINT_VERSION = "2.12.0"` (defined as module constants). | |
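The per-platform install location described above could be selected as in the sketch below. It assumes that "next to `python.exe`" means the directory of `sys.executable`. `install_dir_sketch` is a hypothetical name.

```python
import sys
from pathlib import Path


def install_dir_sketch(platform: str = sys.platform) -> Path:
    """Pick the directory where downloaded binaries would be placed."""
    if platform.startswith("win"):
        # In a venv on Windows this is <venv>\Scripts\.
        return Path(sys.executable).parent
    return Path.home() / ".local" / "bin"


linux_dir = install_dir_sketch("linux")
mac_dir = install_dir_sketch("darwin")
windows_dir = install_dir_sketch("win32")
```

On macOS/Linux, `~/.local/bin/` is not guaranteed to be on `PATH`, so a `have_binary()` check after bootstrapping may still return `False` until it is added.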