# STEM BIO-AI Architecture

This document describes the implemented repository structure and runtime boundaries in `v1.8.0`.

## Purpose

STEM BIO-AI scans a local repository and classifies its observable evidence surface.

It does not execute target training runs, call an LLM in the default scoring path, or assert clinical safety.

## Operating Model

The default path is local and deterministic:

1. read repository files from a local clone
2. extract README, docs, code, CI, and package metadata signals
3. score those signals across Stage 1, Stage 2R, Stage 3, and Stage 4 lanes
4. apply code-integrity penalties and policy caps
5. emit traceable JSON, Markdown, HTML, and PDF artifacts

Optional advisory flows exist as a separate trust boundary and are documented in:

- [ADVISORY_RUNTIME.md](ADVISORY_RUNTIME.md)
- [ADVISORY_SECRET_HANDLING.md](ADVISORY_SECRET_HANDLING.md)
- [API_CONTRACT.md](API_CONTRACT.md)

## High-Level Flow

```text
Target repository
  -> scanner.py
  -> detector_surface.py / detectors.py
  -> detector_ast.py / detector_bio.py / detector_contract.py / detector_stage4.py
  -> evidence.py + policy_intent.py + calibration_profile.py
  -> render.py / render_html.py
  -> JSON / Markdown / HTML / PDF evidence packet
```

## Core Modules

| Module | Responsibility |
|---|---|
| `stem_ai/cli.py` | CLI entry points for `scan`, `gate`, `policy`, and advisory commands |
| `stem_ai/scanner.py` | Repository walk, signal orchestration, score assembly |
| `stem_ai/detectors.py` | Shared detector entry surfaces and signal collection glue |
| `stem_ai/detector_surface.py` | README/docs/package/CI evidence extraction |
| `stem_ai/detector_ast.py` | AST-based code integrity and contract checks |
| `stem_ai/detector_bio.py` | Bio-adjacent deterministic diagnostics and parser-guard lanes |
| `stem_ai/detector_contract.py` | Contract and mismatch detection across advertised and implemented surfaces |
| `stem_ai/detector_stage4.py` | Replication and reproducibility evidence lane |
| `stem_ai/evidence.py` | Trace object construction and proof ledger materialization |
| `stem_ai/render.py` / `stem_ai/render_html.py` | Markdown, PDF, and HTML report generation |
| `stem_ai/calibration_profile.py` | Versioned policy profile loading and preview/simulation support |
| `stem_ai/policy_intent.py` | Governed profile-derivation and policy-intent handling |
| `stem_ai/advisory_*` | Provider-neutral advisory packet/export/validation boundary |

## Score Construction

The canonical score path is documented in [SCORING_RATIONALE.md](SCORING_RATIONALE.md). At a high level:

- Stage 1 measures README evidence and hype/responsibility signals
- Stage 2R measures repo-local consistency across docs, package metadata, CI, and tests
- Stage 3 measures code/bio responsibility and integrity-adjacent evidence
- Stage 4 reports replication evidence as a separate lane
- C1-C6 penalties and caps apply bounded deductions or floors

The architecture preserves a strict rule: preview policy simulations must not silently rewrite the authoritative deterministic score.

## Trust Boundaries

### Default deterministic boundary

- local repository only
- no required network access
- no LLM in the scoring loop
- evidence must point to a concrete file, line, pattern, or artifact

### Advisory boundary

- entered only through explicit advisory commands
- packet export, provider call intent, and response validation are separated
- secret handling and claim filters are governed independently of the deterministic score path

### Output boundary

- reports are review aids
- tiers classify observable evidence posture
- reports are not clinical validation, regulatory approval, or deployment certification

## Failure And Fallback Behavior

- missing optional extras reduce output capabilities rather than rewrite the score semantics
- PDF generation depends on the `pdf` extra; JSON/Markdown paths remain available without it
- advisory provider execution is intentionally separate from deterministic scoring
- policy simulation can preview alternate postures without mutating the canonical scoring profile

## Verification Surface

The repository exposes these concrete verification paths:

```bash
pip install -e ".[pdf]"
python -m py_compile stem_ai/cli.py stem_ai/scanner.py stem_ai/render.py stem_ai/app.py
stem --help
python -m stem_ai --help
python -m pytest -q
python -m build
```

## Related Documents

- [README.md](../README.md)
- [CLI_REFERENCE.md](CLI_REFERENCE.md)
- [SCORING_RATIONALE.md](SCORING_RATIONALE.md)
- [DETERMINISTIC_DIAGNOSTICS.md](DETERMINISTIC_DIAGNOSTICS.md)
- [API_CONTRACT.md](API_CONTRACT.md)
- [ADVISORY_RUNTIME.md](ADVISORY_RUNTIME.md)