Spaces:
Running
Running
| # STEM BIO-AI Public API Contract | |
| Version: 1.8.0 | |
| Status: **Stable** | |
| Supersedes: historical v1.5 draft contract | |
| --- | |
| ## Compatibility Policy | |
| STEM BIO-AI follows an additive-only compatibility guarantee for all items marked **Locked**: | |
| - **Locked**: Field names, types, and semantics will not change without a major version bump. | |
| - **Additive**: New fields may appear in future minor versions. Consumers must ignore unknown fields. | |
| - **Internal**: Unlisted fields are not part of the public contract and may change at any time. | |
| A major version bump (e.g., 1.x → 2.0) is required to: | |
| - Rename or remove any Locked field | |
| - Change the scoring formula, tier boundaries, or weight distribution | |
| - Change `schema_version` | |
| Minor version bumps (e.g., 1.5.x → 1.5.y) may: | |
| - Add new Locked fields (additive) | |
| - Add new items to `evidence_ledger`, `stage_1_rubric`, or `stage_3_rubric` | |
| - Add new detectors or rubric items without changing existing item keys | |
| --- | |
| ## CLI Entry Point (Locked) | |
| ```bash | |
| stem <repo> # shortcut for `stem scan <repo>` | |
| stem scan <repo> [OPTIONS] # primary scan workflow | |
| stem gate <repo> --min-tier T2 # CI/CD gate workflow | |
| stem advisory validate <repo> # offline advisory validation | |
| stem advisory packet <repo> # provider-neutral packet export | |
| stem advisory call <repo> # explicit provider-call boundary | |
| stem advisory check-response <repo> --response FILE | |
| stem policy list # list named calibration profiles | |
| stem policy explain <name> # inspect one calibration profile | |
| # Compatibility entry points | |
| python -m stem_ai <repo> | |
| python -m stem_ai.cli <repo> | |
| stem audit <repo> [OPTIONS] | |
| ``` | |
| Shared stable options: | |
| ```text | |
| --level 1|2|3 | |
| --format json|md|pdf|all | |
| --out DIR / --output DIR | |
| --explain | |
| --summary full|compact|off | |
| --version | |
| ``` | |
| The workflow-oriented commands are stable. `stem <repo>` and `stem audit <repo>` | |
| remain backward-compatible entry points. Advisory packet/response semantics are | |
| stable at the protocol level; individual provider integrations are additive. | |
| --- | |
| ## JSON Result: Top-Level Fields | |
| All fields below are present in every `audit_repository()` result. | |
| ### Identity and Metadata (Locked) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `schema_version` | string | `"stem-ai-local-cli-result-v1.6"` — bumped on breaking change | | |
| | `stem_ai_version` | string | Package version (e.g. `"1.8.0"`) | | |
| | `generated_at_local` | string | ISO 8601 date of scan | | |
| | `execution_mode` | string | Always `"LOCAL_ANALYSIS"` for the CLI | | |
| | `method` | string | Human-readable method description | | |
| ### Audit Freshness (Additive) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `audit_freshness.review_after_days` | integer | Suggested review window before the audit should be treated as stale | | |
| | `audit_freshness.freshness_basis` | string | Rule used to choose the review window, e.g. `clinical_adjacent_short_cycle` | | |
| | `audit_freshness.expires_on` | string | ISO 8601 date after which the audit should be reviewed again | | |
| | `audit_freshness.expired` | boolean | Whether the audit is past its review window on generation date | | |
| | `audit_freshness.anchored_commit` | string\|null | Commit SHA used as the current audit anchor | | |
| | `audit_freshness.hashes_available_for` | array | Key files with surfaced SHA-256 hashes | | |
| | `audit_freshness.change_triggered_reaudit_supported` | boolean | Whether commit/hash anchors exist for change-triggered re-audit checks | | |
| | `audit_freshness.change_triggered_reaudit_recommended_now` | boolean | True when missing anchors make immediate re-audit caution appropriate | | |
| | `audit_freshness.change_triggered_reaudit_reasons` | array | Machine-readable reasons such as `git_commit_unavailable` | | |
| | `audit_freshness.change_triggers` | array | Canonical trigger classes that should force a re-audit on change | | |
| ### Calibration Profile (Implemented Mirror-Only Surface) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `calibration_profile.policy_schema_version` | string | Calibration profile schema version (currently `"1"`) | | |
| | `calibration_profile.policy_version` | string | Versioned policy identifier independent from package version | | |
| | `calibration_profile.tool_version_introduced` | string | First tool version that introduced this policy shape | | |
| | `calibration_profile.tool_version_last_validated` | string | Last tool version whose runtime constants were checked against this profile | | |
| | `calibration_profile.profile_name` | string | Active profile label selected by CLI `--policy` | | |
| | `calibration_profile.profile_status` | string | Profile lifecycle status (`authoritative_release`, `experimental`, etc.) | | |
| | `calibration_profile.profile_read_mode` | string | `"mirror_only"` in `1.8.0`; later `"authoritative"` when scan scoring reads policy values directly | | |
| | `calibration_profile.policy_sha256` | string | Canonical SHA256 surfaced by the runtime artifact; profile files may carry `null` before authoritative read-through | | |
| In `1.8.0`, `scan --policy <name>` still keeps authoritative scan scoring on the deterministic runtime-constant path. Policy selection changes surfaced metadata only; governed score-delta preview belongs to `stem policy simulate`. | |
| ### Target (Locked) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `target.name` | string | `owner/repo` from git remote, or directory name | | |
| | `target.local_path` | string | Absolute local path scanned | | |
| | `target.remote` | string\|null | `git remote get-url origin` output | | |
| | `target.branch` | string\|null | Current branch | | |
| | `target.commit` | string\|null | HEAD commit SHA | | |
| | `target.file_count` | integer | Total files found (excluding skip dirs) | | |
| ### Classification (Locked) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `classification.clinical_adjacent` | boolean | True if any CA term detected | | |
| | `classification.ca_severity` | string | `"CA-DIRECT"` / `"CA-INDIRECT"` / `"none"` | | |
| | `classification.ca_taxonomy_version` | string | Active CA taxonomy version label (e.g. `"ca-taxonomy-v1"`) | | |
| | `classification.ca_taxonomy_source` | string | Runtime authority for CA trigger logic | | |
| | `classification.t0_hard_floor` | boolean | True if T0 hard floor triggered | | |
| | `classification.score_cap` | integer\|null | Score ceiling applied (39 or 69), or null | | |
| | `classification.has_explicit_clinical_boundary` | boolean | Disclaimer detected | | |
| ### Score (Locked) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `score.stage_1_readme_intent` | integer | Stage 1 score (0–100) | | |
| | `score.stage_2_cross_platform` | string | `"not_applicable_in_LOCAL_ANALYSIS"` | | |
| | `score.stage_2_repo_local_consistency` | integer | Stage 2R score (0–100) | | |
| | `score.stage_2_lane` | string | `"STAGE_2R_REPO_LOCAL_CONSISTENCY"` | | |
| | `score.stage_3_code_bio` | integer | Stage 3 score (0–100) | | |
| | `score.weights` | object | `{"stage_1": 0.4, "stage_2": 0.2, "stage_3": 0.4}` | | |
| | `score.risk_penalty` | integer | C1 credential penalty (0 or 10) | | |
| | `score.raw_score_before_floor` | integer | Score before cap applied | | |
| | `score.final_score` | integer | Final clamped score (0–100) | | |
| | `score.formal_tier` | string | `"T0 Rejected"` / `"T1 Quarantine"` / `"T2 Caution"` / `"T3 Supervised"` / `"T4 Candidate"` | | |
| | `score.use_scope` | string | Human-readable scope description for the tier | | |
| ### Rubrics (Additive — new items may appear) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `stage_1_rubric` | object | Per-item Stage 1 scores (baseline, S1_domain_*, H*, R*) | | |
| | `stage_2r_rubric` | object | Per-item Stage 2R scores (R2R_*, R2R_D*) | | |
| | `stage_3_rubric` | object | Per-item Stage 3 scores (T1, T2, T3, B1, B2, B3, raw_total) | | |
| | `stage_4_rubric` | object | Stage 4 replication rubric (S4_*, raw_total) | | |
| Rubric item keys within each object are stable once published. New keys may be added. | |
| The `score`, `max`, and `evidence` sub-fields are stable for all existing keys. | |
| Published rubric items may also carry additive `detector_id` and `decision_basis` | |
| sub-fields to surface the detector trace and human-readable decision rationale. | |
| ### Replication Lane (Locked) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `replication_score` | integer | Stage 4 raw score (0–100) | | |
| | `replication_tier` | string | `"R0"` / `"R1"` / `"R2"` / `"R3"` / `"R4"` | | |
| Stage 4 does not affect `score.final_score` or `score.formal_tier`. | |
| ### Code Integrity (Locked) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `code_integrity.C1_hardcoded_credentials` | object | `{status: "PASS"/"FAIL", evidence: [...]}` | | |
| | `code_integrity.C2_dependency_pinning` | object | `{status: "PASS"/"WARN", evidence: [...]}` | | |
| | `code_integrity.C3_dead_or_deprecated_patient_adjacent_paths` | object | `{status: "PASS"/"WARN", evidence: [...]}` | | |
| | `code_integrity.C4_exception_handling_clinical_adjacent_paths` | object | `{status: "PASS"/"WARN", evidence: [...]}` | | |
| | `code_integrity.C5_compliance_boundary_integrity` | object | `{status: "PASS"/"WARN", evidence: [...]}` | | |
| | `code_integrity.C6_mock_auth_or_fail_open_boundary` | object | `{status: "PASS"/"WARN", evidence: [...]}` | | |
| ### Code Contract (Locked) | |
| `code_contract` is the Layer 2 AST contract-detector summary. It is additive at | |
| the detector-family level, but the published `CC1`/`CC2`/`CC3` keys below are | |
| stable once released. | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `code_contract.CC1_clinical_zero_default` | object | `{count: integer, status: "PASS"/"WARN"}` — public confidence / threshold parameters defaulted to `0.0` | | |
| | `code_contract.CC2_api_contract` | object | `{count: integer, status: "PASS"/"WARN"}` — README-declared names cross-checked against package `__all__` exports | | |
| | `code_contract.CC3_shallow_validator` | object | `{count: integer, status: "PASS"/"WARN"}` — `validate_*` / `check_*` functions using only `len()` without regex structure checks | | |
| ### Evidence and Diagnostics (Locked structure, Additive content) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `evidence_ledger` | array | List of `EvidenceFinding` records (see below) | | |
| | `detector_summary` | object | `{total_findings, by_status, by_detector}` | | |
| | `ast_signal_summary` | object | AST analysis counts and coverage ratios | | |
| | `reasoning_model` | object | Diagnostic layer (observation-only, does not affect score) | | |
| | `regulatory_basis` | object | Registry-driven regulatory basis note metadata and source IDs | | |
| | `stage_traceability` | object | Per-stage traceability notes keyed by `stage_1`, `stage_2r`, `stage_3`, `stage_4`, `bio_diagnostics` | | |
| | `regulatory_traceability` | object | Flattened traceability summary layer with additive `items` list | | |
| | `measurement_basis` | object | Per-stage description of detection method | | |
| | `airi_risk_coverage` | object | AIRI registry/bundle/mapping provenance plus covered-risk and known-gap summaries | | |
| | `notable_positive_evidence` | array | Human-readable positive signals | | |
| | `notable_risks` | array | Human-readable risk signals | | |
| | `file_hashes_sha256` | object | SHA-256 hashes of key files (README, manifests) | | |
| ### AIRI Risk Trigger Layer (Additive) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `airi_risk_coverage.airi_version` | string | Upstream AIRI version label surfaced from the local registry snapshot | | |
| | `airi_risk_coverage.airi_source` | string | Human-readable upstream source / license string | | |
| | `airi_risk_coverage.airi_registry_version` | string | Local full-registry version | | |
| | `airi_risk_coverage.airi_bundle_version` | string | Local runtime-bundle version | | |
| | `airi_risk_coverage.airi_mapping_version` | string | Local detector-mapping registry version | | |
| | `airi_risk_coverage.airi_bundle_scope` | string | Runtime bundle scope label | | |
| | `airi_risk_coverage.airi_upstream_snapshot_date` | string | Snapshot date for the local upstream import | | |
| | `airi_risk_coverage.airi_upstream_license` | string | Upstream license label | | |
| | `airi_risk_coverage.airi_attribution_note` | string | Artifact-level attribution statement | | |
| | `airi_risk_coverage.total_risks_in_registry` | integer | Count of risk rows in the full local registry | | |
| | `airi_risk_coverage.total_risks_in_bundle` | integer | Count of risk rows in the curated runtime bundle | | |
| | `airi_risk_coverage.total_risks_in_detector_scope` | integer | Count of risk IDs referenced by the active detector mapping | | |
| | `airi_risk_coverage.detectors_triggered` | array | Triggered detector IDs used for this coverage result | | |
| | `airi_risk_coverage.covered_risks` | array | Covered AIRI risks with detector references | | |
| | `airi_risk_coverage.covered_risks[*].mapping_details` | array | Additive reasoning objects `{detector_id, mapping_justification, trigger_reason}` for each matched detector-to-risk link | | |
| | `airi_risk_coverage.covered_count` | integer | Count of covered risks | | |
| | `airi_risk_coverage.coverage_rate` | number | Covered risks / total risks in detector scope | | |
| | `airi_risk_coverage.known_gaps` | array | Combined known-gap list from the local mapping registry | | |
| | `airi_risk_coverage.known_gaps_in_bundle` | array | Known gaps that are inside the current runtime bundle | | |
| | `airi_risk_coverage.known_gaps_outside_bundle` | array | Known gaps tracked against the full registry but not included in the runtime bundle | | |
| ### Regulatory Traceability Layer (Additive) | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `regulatory_basis.registry_version` | string | Registry identifier, currently `"stem-ai-regulatory-basis-registry-v1"` | | |
| | `regulatory_basis.as_of` | string | Human-readable freshness label used in report note | | |
| | `regulatory_basis.review_required` | boolean | True when the basis registry should be reviewed for staleness or draft-only dependencies | | |
| | `regulatory_basis.review_reasons` | array | Machine-readable reason codes such as `registry_as_of_stale` or `required_source_missing` | | |
| | `regulatory_basis.source_ids` | array | Source IDs loaded from the registry | | |
| | `regulatory_basis.note` | object | Small report note `{title, body_line_1, body_line_2}` | | |
| | `stage_traceability.*` | array | Per-stage traceability note records; each record is additive | | |
| | `regulatory_traceability.version` | string | Currently `"stem-ai-reg-trace-v1.6"` | | |
| | `regulatory_traceability.summary` | string | Human-readable synthesis paragraph | | |
| | `regulatory_traceability.items` | array | Flattened traceability note records across stages | | |
| Each traceability note record contains: | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `stage` | string | `stage_1`, `stage_2r`, `stage_3`, `stage_4`, or `bio_diagnostics` | | |
| | `requirement_id` | string | Requirement family key such as `EU_AI_ACT_ARTICLE_12` | | |
| | `mapping_confidence` | string | `strong`, `moderate`, `weak_moderate`, `weak`, or `not_assessed` | | |
| | `evidence_strength` | string | Strength of observed repository evidence | | |
| | `status` | string | `aligned`, `partially_aligned`, `signal_only`, `not_detected`, or `not_assessed` | | |
| | `not_assessed` | array | Explicit out-of-scope or unavailable factors | | |
| | `finding_refs` | array | Finding IDs or stable rubric/evidence references | | |
| | `source_ids` | array | Source IDs from the regulatory basis registry | | |
| | `note` | string | Human-readable bounded interpretation | | |
| ### EvidenceFinding Record (Locked) | |
| Each item in `evidence_ledger` has the following fields: | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `finding_id` | string | `"detector:path:line:occurrence"` using POSIX-style `/` separators — no backslashes | | |
| | `detector` | string | Detector name (e.g., `"S1_readme_bio_terms"`) | | |
| | `pattern_id` | string | Pattern version identifier | | |
| | `status` | string | `"detected"` / `"not_detected"` / `"absent"` / `"not_applicable"` / `"manual_review_required"` / `"error"` | | |
| | `evidence_status` | string | Additive evidence-state label such as `confirmed_present`, `confirmed_missing`, or `not_found_in_reviewed_sources` | | |
| | `confidence` | string | Additive confidence label: `high`, `medium`, or `low` | | |
| | `severity` | string | `"info"` / `"warn"` | | |
| | `file` | string | Relative path from repo root, or `"."` for repo-level | | |
| | `line` | integer | Line number (0 if not applicable) | | |
| | `snippet` | string | Source line text | | |
| | `match_type` | string | `"regex"` / `"ast"` / `"file_presence"` / `"dependency"` / `"aggregate"` / `"limit"` / `"metadata"` | | |
| | `explanation` | string | Human-readable explanation | | |
| | `metadata` | object\|null | Optional additional evidence detail | | |
| --- | |
| ## Advisory Protocol (Locked — Non-Negotiable) | |
| These rules govern all advisory packet exports and provider response validations. | |
| They cannot be relaxed by configuration or provider agreement. | |
| 1. AI/provider output **cannot** override `score.final_score` or `score.formal_tier`. | |
| 2. Every material advisory item **must** cite exact `finding_id` strings from `allowed_finding_ids`. | |
| 3. Providers **must** copy citation strings verbatim; the validator does not repair citations. | |
| 4. Raw repository source text is **not** included in provider packets. | |
| 5. Provider responses containing clinical safety, efficacy, regulatory, deployment, or | |
| medical-advice claims are **rejected** by the validator. | |
| 6. `allowed_finding_ids` is capped at 40 entries per packet. | |
| ### Advisory CLI Workflows (Locked) | |
| | Command / Flag | Behavior | | |
| |----------------|----------| | |
| | `stem advisory validate <repo>` | Offline contract validation without API call. | | |
| | `stem advisory packet <repo>` | Export provider-neutral advisory input packet. | | |
| | `stem advisory call <repo>` | Enter explicit provider-call mode. The network boundary is opt-in and reported separately from deterministic scanning. | | |
| | `stem advisory check-response <repo> --response FILE` | Validate a provider-produced JSON response. | | |
| | `stem scan <repo> --advisory ...` | Legacy inline compatibility path. | | |
| | `stem scan <repo> --advisory-response FILE` | Legacy provider-response compatibility path. | | |
| ### Advisory Packet Fields (Additive) | |
| Provider packet exports from `stem advisory packet` now include the following additive fields: | |
| | Field | Type | Description | | |
| |-------|------|-------------| | |
| | `provider_request` | object | Secret-free provider handoff metadata, request schema, and argument-validation status | | |
| | `provider_request.request_schema_version` | string | `"stem-ai-provider-request-v1.4"` | | |
| | `provider_request.request_schema` | object | Exported provider request shape for downstream validators | | |
| | `provider_request.args_validation` | object | `{status, error_count}` summary for normalized provider request arguments | | |
| | `provider_request.base_url_validation` | object | Deterministic endpoint-policy result for the selected provider/base URL pair | | |
| | `provider_request.secret_policy` | object | Exported secret-handling policy summary for downstream runners | | |
| | `provider_request.env_contract` | object | Allowed provider-specific and shared environment variable names | | |
| | `provider_request.api_key_env_var` | string\|null | Secret-free name of the env var expected by the selected provider | | |
| | `provider_request.secret_source` | string\|null | `"provider_env"` / `"generic_env"` / `"missing"` / `"not_required"` | | |
| | `provider_request.network_mode` | string\|null | `"offline"` / `"remote_https"` / `"local_server"` / `"in_process"` | | |
| | `ai_advisory.provider_call` | object | Explicit provider-call runtime envelope: network intent, logging policy, child-env allowlist summary, and redaction policy | | |
| | `contract_schemas` | object | Advisory input/output contract schema export bundle | | |
| | `contract_schemas.schema_version` | string | `"stem-ai-advisory-contracts-v1.4"` | | |
| | `packet_contract` | object | Deterministic packet self-check result | | |
| | `packet_contract.status` | string | `"valid"` / `"invalid"` | | |
| | `packet_contract.errors` | array | Contract-level packet errors such as allowlist mismatch or raw snippet leakage | | |
| ### Advisory Secret Boundary (Locked Policy, Additive Fields) | |
| - Provider API keys must come from environment variables or an external secret store. | |
| - Provider API keys must never appear in CLI arguments, Markdown reports, JSON audit artifacts, or PDFs. | |
| - `provider_request` is secret-free by construction; it reports `api_key_present` and `api_key_env_var`, never the secret value. | |
| - Cloud providers require `https` endpoints; plain `http` is limited to `localhost`, `127.0.0.1`, or `::1`. | |
| - Base URLs containing embedded credentials are rejected by argument validation. | |
| - Explicit `stem advisory call` is the only runtime path allowed to cross into provider-call intent. Packet export and response validation remain separate trust boundaries. | |
| --- | |
| ## Python API (Stable for Local Automation) | |
| These functions are stable at the signature level. Return value fields may receive | |
| additive extensions in minor versions. | |
| ```python | |
| from stem_ai.scanner import audit_repository | |
| result = audit_repository(target: Path, advisory: str = "none", advisory_response_path: Path | None = None) -> dict | |
| from stem_ai.advisory_contract import ( | |
| advisory_contract_schemas, | |
| build_provider_advisory_input, | |
| validate_advisory_input_packet, | |
| validate_advisory_output, | |
| ) | |
| from stem_ai.advisory_providers import provider_request_schema, validate_provider_request_args | |
| from stem_ai.advisory_response import validate_advisory_response_file | |
| from stem_ai.provider_benchmark import packet_stats_record, response_validation_record, packet_summary | |
| ``` | |
| --- | |
| ## Tier Definitions (Locked) | |
| | Tier | Score Range | Use Scope | | |
| |------|------------|-----------| | |
| | T0 Rejected | 0–39 | Do not rely on without independent expert validation | | |
| | T1 Quarantine | 40–54 | Exploratory review only; no patient-adjacent use | | |
| | T2 Caution | 55–69 | Research reference and supervised non-clinical technical review only | | |
| | T3 Supervised | 70–84 | Supervised institutional review candidate | | |
| | T4 Candidate | 85–100 | Strong evidence posture; clinical deployment still requires independent validation | | |
| Tier boundaries are conventional thresholds anchored to the scoring baseline. | |
| See `docs/SCORING_RATIONALE.md` for the derivation and calibration gap disclosures. | |
| --- | |
| ## What This Contract Does Not Cover | |
| - **Clinical validation**: A tier is an evidence classification, not a clinical safety rating. | |
| - **Regulatory compliance**: STEM BIO-AI output is not a regulatory submission or audit. | |
| - **Runtime behavior**: The contract covers the local CLI scan output; it does not cover | |
| the runtime behavior of the scanned repository. | |
| - **LLM-mode audits**: The full spec (LLM-native runtime with Stage 2 cross-platform | |
| verification) operates under a separate execution contract not covered here. | |