V8 cipher-agnostic byte-amplification detector — initial release (2026-05-13)

Files changed (6)

LICENSE +191 -0
README.md +200 -0
inference_example.py +136 -0
model.joblib +3 -0
predict.py +164 -0
release-cert.json +162 -0

LICENSE ADDED

	@@ -0,0 +1,191 @@

+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   Copyright 2026 NullRabbit Labs Ltd
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+   implied. See the License for the specific language governing
+   permissions and limitations under the License.

README.md ADDED

	@@ -0,0 +1,200 @@

+---
+license: apache-2.0
+language:
+  - en
+tags:
+  - cybersecurity
+  - blockchain
+  - network-security
+  - validator-security
+  - anomaly-detection
+  - byte-amplification
+  - sui
+  - solana
+  - scikit-learn
+library_name: scikit-learn
+pretty_name: V8 cipher-agnostic byte-amplification detector
+---
+# V8 cipher-agnostic byte-amplification detector
+## What it is
+V8 is a reference detector trained against `corpus_v1.0`–`corpus_v1.5` under NullRabbit's pre-registration discipline. The model is one demonstrable outcome of that methodology; **the methodology is the contribution.**
+The methodology is the subject of the substrate paper (in preparation): an iterative leak-surface peeling pattern applied across multiple training cycles, with each cycle pre-registered, audited on close, and retracted in writing when a leak fires. V8 is the cycle that landed `cipher-agnostic-v2`, a manifest of seven byte-amplification features that capture the attack mechanism without relying on any chain-protocol-specific signal. Cross-chain transfer follows from that property: V8 trained on Sui detects the Solana byte-amplification primitive (`SOL_F10_multi_get_accounts_amp`) at the wire because the wire shape is the same. That transfer is class-specific (see Load-bearing limitations).
+The model itself is a histogram gradient-boosting classifier wrapped in isotonic calibration (`CalibratedClassifierCV(HistGradientBoostingClassifier, method='isotonic', cv=5)`) over seven features; calibration makes its scores usable for operating-point selection. It scores one bundle at a time and is not a packet-level streaming detector.
+V8 is published as the data-layer artefact of NullRabbit Labs' research on **autonomous defence for decentralised networks**. The governance layer is published separately (see Related).
+## Architecture
+- Estimator: `CalibratedClassifierCV(HistGradientBoostingClassifier, method='isotonic', cv=5)`.
+- Manifest: `cipher-agnostic-v2` (7 features). See `feature_names` in the joblib payload.
+- Training corpus: 1,972 bundles drawn from `corpus_v1.0`–`corpus_v1.5` (897 attack + 1,075 benign).
+- Fidelity filter: `lab` + `lab-tls-fronted`.
+- Features version: `v1.1`.
+- Seed: 42.
+## Features
+The `cipher-agnostic-v2` manifest names seven features computed from two bundle modalities:
+| Feature | Source modality | Semantics |
+|---|---|---|
+| `resp.req_bytes_max` | `responses.parquet` | Maximum request size (bytes) observed in the response time-series |
+| `resp.resp_bytes_max` | `responses.parquet` | Maximum response size (bytes) observed |
+| `resp.amp_ratio_max` | `responses.parquet` | Maximum per-request response:request byte ratio |
+| `resp.amp_ratio_mean` | `responses.parquet` | Mean response:request byte ratio |
+| `resp.amp_ratio_median` | `responses.parquet` | Median response:request byte ratio |
+| `pcap.unique_dst_ports` | `packets.pcap` | Distinct destination TCP ports observed (capped at 5) |
+| `pcap.unique_src_ports` | `packets.pcap` | Distinct source TCP ports observed (capped at 5) |
+**Cipher-agnostic** means the features are computable on encrypted wire traffic from packet sizes, timing, and cardinality; no cleartext payload bytes are required. That property is what lets the pcap-derived cardinality features and the parquet-derived response features work together both at training time, on cleartext lab captures, and at inference time, on TLS-fronted production traffic.
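The five `resp.*` semantics in the table can be illustrated with a short sketch. It assumes the response time-series is available as `(request_bytes, response_bytes)` pairs, that the per-request ratio is response bytes divided by request bytes, and that zero-byte requests are skipped for the ratio statistics; the real extractors may window or filter differently. `resp_features` is a hypothetical helper, and it needs non-empty input, which is exactly the case the empty-bundle limitation below is about.

```python
import statistics
from typing import Dict, Iterable, Tuple


def resp_features(pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Compute the five resp.* features from (request_bytes, response_bytes) pairs.

    Assumptions: ratio = response / request, zero-byte requests are skipped
    for the ratio statistics, and at least one usable pair is present.
    """
    pairs = list(pairs)
    ratios = [resp / req for req, resp in pairs if req > 0]
    return {
        "resp.req_bytes_max": float(max(req for req, _ in pairs)),
        "resp.resp_bytes_max": float(max(resp for _, resp in pairs)),
        "resp.amp_ratio_max": max(ratios),
        "resp.amp_ratio_mean": statistics.fmean(ratios),
        "resp.amp_ratio_median": statistics.median(ratios),
    }


# Three hypothetical multi-get calls: small requests, large responses.
print(resp_features([(200, 20_000), (250, 30_000), (180, 9_000)]))
```

None of these values depends on payload contents, only on byte counts, which is why they survive TLS fronting.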
+## Training data
+The training corpus is **proprietary**. NullRabbit's archive spans `corpus_v1.0`–`corpus_v1.10` (and later releases); V8 was trained on the v1.0–v1.5 subset restricted to `fidelity_class ∈ {lab, lab-tls-fronted}`.
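Reproducing the training subset on your own corpus means filtering by corpus version and fidelity class. The sketch below assumes each bundle's metadata exposes hypothetical `corpus_version` and `fidelity_class` keys (only the latter is named on the card). It also shows why version tags should be compared numerically: as plain strings, `corpus_v1.10` sorts before `corpus_v1.5`.

```python
from typing import Iterable, List, Tuple

TRAINING_FIDELITY = {"lab", "lab-tls-fronted"}


def parse_corpus_version(tag: str) -> Tuple[int, int]:
    """'corpus_v1.10' -> (1, 10), so versions compare numerically."""
    major, minor = tag.rsplit("v", 1)[1].split(".")
    return int(major), int(minor)


def select_training_bundles(bundles: Iterable[dict],
                            first: str = "corpus_v1.0",
                            last: str = "corpus_v1.5") -> List[dict]:
    """Keep bundles from corpus versions first..last at a training fidelity class."""
    lo, hi = parse_corpus_version(first), parse_corpus_version(last)
    return [b for b in bundles
            if lo <= parse_corpus_version(b["corpus_version"]) <= hi
            and b["fidelity_class"] in TRAINING_FIDELITY]


# String comparison gets v1.10 vs v1.5 wrong; tuple comparison does not.
print("corpus_v1.10" <= "corpus_v1.5")                                              # True
print(parse_corpus_version("corpus_v1.10") <= parse_corpus_version("corpus_v1.5"))  # False

bundles = [
    {"id": "a", "corpus_version": "corpus_v1.3", "fidelity_class": "lab"},
    {"id": "b", "corpus_version": "corpus_v1.10", "fidelity_class": "lab"},
    {"id": "c", "corpus_version": "corpus_v1.4", "fidelity_class": "production"},  # hypothetical class
]
print([b["id"] for b in select_training_bundles(bundles)])  # ['a']
```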
+A curated, public sample of the corpus is available on Hugging Face as **[NullRabbit/nr-bundles-public](https://huggingface.co/datasets/NullRabbit/nr-bundles-public)** — 31 bundles spanning seven vulnerability families across Sui and Solana, CC-BY-4.0. The bundle format is open and specified at **[`nr-bundle-spec`](https://github.com/NullRabbitLabs/nr-bundle-spec)** (MIT). External researchers building their own corpus against the spec can reproduce the methodology, retrain V8-class detectors on their own data, and compare against this reference model.
+## Intended use
+- **Reference detector for byte-amplification attacks** on validator-infrastructure JSON-RPC endpoints. Trained on Sui `sui_F10_multi_get_objects_amp` and adjacent primitives; transfers cross-chain to Solana `SOL_F10_multi_get_accounts_amp` at 100% recall in the published cross-chain leave-one-primitive-out evaluation.
+- **Methodology demonstration**: V8 is the worked example of how a pre-registered, audit-disciplined training cycle produces a detector whose limitations are characterised honestly. The card's Load-bearing limitations section is the methodology demonstration; the model is the artefact that supports it.
+- **Reproducibility anchor**: train a parallel detector against your own bundle corpus and compare. The seven-feature manifest is the contract.
+## Load-bearing limitations
+This section is the most important part of the card. Each limitation is anchored in pre-registered evidence and surfaced because it would otherwise become a deployment-time surprise.
+### Phase 1 close-gate scope
+V8 has cleared the Phase 1 close gate on **`sui_F10_multi_get_objects_amp` at `lab-tls-fronted` fidelity**. That gate tests extractor numerical equivalence: the production extractor (IBSR `collect-payload` mode at post-term loopback vantage) and the offline reference extractor must agree on all seven features within `PHASE_1_TOLERANCE`. The model-side close gate (`PHASE_1_SCORE_CLASS_MATCH`, per Decision D-025), which verifies that prediction-class equivalence holds across configuration shifts that move features into and out of the model's training distribution, is **still in flight** as of this card's date. The numerical-equivalence layer is unblocked; the model-side gate, on which any deployment claim rests, is not.
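The two gates can be pictured as two different checks. This is a sketch under stated assumptions, not the gate implementation: the value and form of `PHASE_1_TOLERANCE` are not given, so relative and absolute tolerances are parameters here, and "score-class match" is modelled as two scores falling on the same side of a hypothetical 0.5 threshold. Both function names are hypothetical.

```python
import math
from typing import Dict, List


def tolerance_failures(production: Dict[str, float], offline: Dict[str, float],
                       rel_tol: float, abs_tol: float = 0.0) -> List[str]:
    """Names of features where the two extractors disagree beyond tolerance."""
    return [name for name, ref in offline.items()
            if not math.isclose(production[name], ref, rel_tol=rel_tol, abs_tol=abs_tol)]


def score_class_match(score_a: float, score_b: float, threshold: float = 0.5) -> bool:
    """True if both scores land on the same side of the decision threshold."""
    return (score_a >= threshold) == (score_b >= threshold)


prod = {"resp.amp_ratio_max": 120.0, "pcap.unique_dst_ports": 5.0}
offl = {"resp.amp_ratio_max": 120.0000001, "pcap.unique_dst_ports": 5.0}
print(tolerance_failures(prod, offl, rel_tol=1e-6))  # []
print(score_class_match(0.91, 0.12))                 # False
```

The distinction matters: features can pass the first check on every bundle while the model still flips class once the configuration leaves its training distribution, which is what the second gate is meant to catch.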
+### Cardinality envelope
+V8's `pcap.unique_*_ports` features are extracted with a ceiling of 5. The cap makes the IBSR and offline extractors agree at five or more distinct source/destination TCP ports per direction, because both then report 5. **Below five distinct ports**, the two extractors diverge by +1 because of IBSR's broader observation coverage (TC-layer control-packet observation plus warmup-window timing). Score interpretation below the envelope is regime-conditional; the close-gate clearance is band-bounded to cardinality ≥5.
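A small sketch of why the cap aligns the extractors only at or above five ports. It models the IBSR vantage as seeing exactly one extra distinct port that the offline extractor misses; that single-extra-port model and the helper names are illustrative assumptions.

```python
CAP = 5  # ceiling applied to the pcap.unique_*_ports features


def capped_unique_ports(ports, cap: int = CAP) -> int:
    """Distinct TCP ports observed, clipped at `cap`."""
    return min(len(set(ports)), cap)


def extractor_pair(offline_ports, extra_port: int):
    """(offline, ibsr) counts when IBSR also sees one extra distinct port."""
    offline = capped_unique_ports(offline_ports)
    ibsr = capped_unique_ports(list(offline_ports) + [extra_port])
    return offline, ibsr


print(extractor_pair([40001, 40002, 40003], 40004))                # (3, 4): diverge by +1
print(extractor_pair([40001, 40002, 40003, 40004, 40005], 40006))  # (5, 5): agree
```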
+### Saturation envelope
+The IBSR extractor's BPF ringbuf saturates at **~80 MB/sec of sustained payload** (~3,400 RPCs/sec for F10-class amplification with the default 16 MiB ringbuf). Above this rate, feature values under-count along the axes the model is most sensitive to (the response-byte-distribution features). Deployed beyond this saturation envelope, V8 will show score regressions that look like detection failures but are actually extraction failures.
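A rough pre-deployment check against the saturation envelope, as a sketch. It assumes "MB" means 10^6 bytes (the card does not say) and that you can estimate mean payload bytes per RPC for your workload; `within_saturation_envelope` is a hypothetical helper. The card's own two figures imply roughly 23.5 kB of payload per F10-class RPC.

```python
# Saturation figure quoted on the card; "MB" is assumed to mean 10**6 bytes.
RINGBUF_SATURATION_BYTES_PER_SEC = 80e6


def within_saturation_envelope(rpcs_per_sec: float, mean_payload_bytes_per_rpc: float,
                               limit: float = RINGBUF_SATURATION_BYTES_PER_SEC) -> bool:
    """True if the sustained payload rate stays below the ringbuf saturation point."""
    return rpcs_per_sec * mean_payload_bytes_per_rpc < limit


# Payload per RPC implied by the card's figures (~80 MB/s at ~3,400 RPC/s).
implied_bytes_per_rpc = RINGBUF_SATURATION_BYTES_PER_SEC / 3_400
print(round(implied_bytes_per_rpc))                              # 23529
print(within_saturation_envelope(1_000, implied_bytes_per_rpc))  # True
print(within_saturation_envelope(5_000, implied_bytes_per_rpc))  # False
```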
+### Out-of-training-distribution attack-shape mis-scoring
+V8's training distribution covered F10 reproducer configurations at `--ids-per-request 5/10/25 --workers 1/2/8 --delay-ms 0`. The model has been observed to score **"benign"** on attack-shape configurations **outside** that distribution — specifically on the `--ids-per-request 1 --workers 1 --delay-ms 500` low-volume regime captured for the Phase 1 close-gate paired bundles. This is the gap that Decision D-025's `PHASE_1_SCORE_CLASS_MATCH` gate exists to close. **V8 is not a universal F10 detector. It is an F10 detector inside its training distribution.**
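One conservative way to act on this limitation is to flag reproducer configurations that fall outside the listed grid before trusting a "benign" score. The sketch treats the grid as the full cross product of the listed flag values, which is an assumption (the card lists values per flag, not which combinations were run), and treats any unlisted value, including ones between grid points, as out of distribution. `in_training_distribution` is a hypothetical helper.

```python
# Reproducer grid covered by V8's training data, as listed on the card.
TRAINING_GRID = {
    "ids_per_request": {5, 10, 25},
    "workers": {1, 2, 8},
    "delay_ms": {0},
}


def in_training_distribution(ids_per_request: int, workers: int, delay_ms: int) -> bool:
    """True only if every reproducer setting is one V8 saw during training."""
    config = {"ids_per_request": ids_per_request, "workers": workers, "delay_ms": delay_ms}
    return all(config[k] in allowed for k, allowed in TRAINING_GRID.items())


print(in_training_distribution(10, 2, 0))   # True
print(in_training_distribution(1, 1, 500))  # False: the low-volume close-gate regime
```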
+### Cross-chain transfer is class-specific
+V8 transfers cleanly cross-chain to **`SOL_F10_multi_get_accounts_amp`** at 100% recall in the published cross-chain leave-one-primitive-out evaluation. **It does not transfer to other Solana classes.** The parallel V14 (`compute_amp` family) and V11 (`rate_limiter_bypass` family) binary detectors, which target those other classes, achieve **0% recall on SOL_F14** and **0% recall on SOL_P07** respectively when trained Sui-only and evaluated on Solana. Joint training (the multi-class softmax architecture detailed in companion research) is the architecturally correct fix for those classes; no feature surgery on V8 will produce a model that detects SOL_F14 or SOL_P07.
+### Binary detector — family-specific, not universal
+V8 is a binary detector trained on the **byte-amplification family only** (Sui F10, Solana F10). Attacks from other vulnerability families — reconnaissance (`nmap_slow`), service_misconfig (`ssh_pwauth`, `grafana_anon`), auth_bypass (`admin_rpc_probe`), rate_limiter_bypass (`simulate_compute_flood`) — produce wire shapes V8 does not recognise as attack-shape. V8 will score them "benign". This is **correct behaviour for a family-specific detector**, not a failure mode. Production deployment must compose V8 with parallel family detectors (V9 recon, V10 auth, V11 app-DoS, V13 misconfig, V14 compute-amp) or use the multi-class softmax model published separately at `NullRabbit/multiclass-folded`.
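Composing V8 with parallel family detectors amounts to an OR over per-family verdicts: a bundle counts as benign only if no family detector fires. The sketch below uses stub scorers because only V8 ships in this repository; the scorer interface, the family labels, and the 0.5 threshold are assumptions, and `flagged_families` is a hypothetical helper.

```python
from typing import Callable, Dict, List

# A family scorer maps one bundle's features to an attack probability.
FamilyScorer = Callable[[dict], float]


def flagged_families(features: dict, scorers: Dict[str, FamilyScorer],
                     threshold: float = 0.5) -> List[str]:
    """Families whose detector fires on this bundle; [] means no detector fired."""
    return sorted(name for name, score in scorers.items() if score(features) >= threshold)


# Stub scorers stand in for real detectors on a reconnaissance-shaped bundle.
scorers = {
    "byte_amplification (V8)": lambda f: 0.03,
    "reconnaissance (V9)": lambda f: 0.88,
}
print(flagged_families({}, scorers))  # ['reconnaissance (V9)']
```

Here V8 correctly scores the bundle benign, and the bundle is still flagged because another family detector fires.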
+### Empty-bundle mis-scoring
+V8 was trained on bundles that observed at least some RPC traffic during the capture window. When `responses.parquet` is **missing or has zero rows** (typical for passive-workload bundles such as `sui_BENIGN_passive_fullnode` and `solana_BENIGN_validator_passive`), the five `resp.*` features collapse to zero. That region of feature space has no training support, so V8's tree ensemble may produce a high attack score on the all-zero vector. The `predict.py` helper shipped with this model (see How to use) applies a scoreability gate that refuses to predict on bundles whose responses are missing or zero-rows; the gate is the recommended mitigation.
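A minimal sketch of such a gate, for callers building their own pipeline. It is not the shipped `predict.py` logic: it assumes `responses.parquet` sits at the bundle root, and `scoreability_check` is a hypothetical name. Row counting reads only Parquet footer metadata via pyarrow, imported lazily; the row counter is injectable so the check can be exercised without real files.

```python
from pathlib import Path
from typing import Callable, Optional


def _parquet_num_rows(path: Path) -> int:
    # Reads only the Parquet footer metadata; requires pyarrow.
    import pyarrow.parquet as pq
    return pq.ParquetFile(path).metadata.num_rows


def scoreability_check(bundle_dir,
                       count_rows: Callable[[Path], int] = _parquet_num_rows) -> Optional[str]:
    """Return a refusal reason, or None if the bundle is scoreable.

    Assumption: responses.parquet lives directly under the bundle directory.
    """
    responses = Path(bundle_dir) / "responses.parquet"
    if not responses.exists():
        return "responses.parquet missing"
    if count_rows(responses) == 0:
        return "responses.parquet has zero rows"
    return None
```

A non-None reason maps naturally to an explicit "unscoreable" verdict instead of a call to `predict_proba` on the all-zero vector.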
+### Disclosure context
+The training corpus includes bundles for primitives at varying disclosure states. `SOL_F10_multi_get_accounts_amp` is publicly disclosed per [NR-2026-001](https://nullrabbit.ai). Other primitives represent methodology-class findings or are referenced in coordinated-disclosure channels with respective ecosystems. Disclosure-status information travels with the bundles in `nr-bundles-public`; this model card is the inference-layer cross-reference.
+## Evaluation
+- **Training-set decision agreement**: 100% (all 1,972 bundles).
+- **Phase 1 close-gate clearance**: 7/7 features pass numerical-equivalence between production extractor and offline extractor on held-out + multi-window + low-cardinality + paired bundle sub-experiments (band-bounded as documented above).
+- **Cross-chain leave-one-primitive-out**: 100% recall on `SOL_F10_multi_get_accounts_amp` zero-shot from Sui training.
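The leave-one-primitive-out protocol can be sketched as a split generator plus a recall computation. This is an illustration, not the published evaluation code: bundle records with hypothetical `primitive_id` and `chain` keys are assumed, and the zero-shot cross-chain setting is modelled by restricting the training side to Sui bundles.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def leave_one_primitive_out(
    bundles: Iterable[dict],
    train_filter: Callable[[dict], bool] = lambda b: True,
) -> Iterator[Tuple[str, List[dict], List[dict]]]:
    """Yield (held_out_primitive, train, test), holding out each primitive once.

    `train_filter` restricts the training side, e.g. to Sui-only bundles.
    """
    by_primitive: Dict[str, List[dict]] = defaultdict(list)
    for b in bundles:
        by_primitive[b["primitive_id"]].append(b)
    for held_out in sorted(by_primitive):
        train = [b for p, group in by_primitive.items() if p != held_out
                 for b in group if train_filter(b)]
        yield held_out, train, by_primitive[held_out]


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of true attacks (label 1) that were predicted as attacks."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives) if positives else float("nan")


bundles = [
    {"primitive_id": "sui_F10_multi_get_objects_amp", "chain": "sui"},
    {"primitive_id": "sui_BENIGN_reproducer_pipeline", "chain": "sui"},
    {"primitive_id": "SOL_F10_multi_get_accounts_amp", "chain": "solana"},
]
for held_out, train, test in leave_one_primitive_out(bundles, lambda b: b["chain"] == "sui"):
    print(held_out, len(train), len(test))
```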
+Full evaluation evidence and the audit trail live in the substrate paper and in the `nr-substrate` working repo's `docs/PHASE-1-CLOSE-GATE-CLEARED-2026-05-06.md` and companion artefacts. The substrate paper is in preparation.
+## How to use
+### Recommended path: `predict.py` (scoreability-gated)
+The repository ships with `predict.py` — a thin scoreability-gated inference helper that wraps the raw estimator with two production-side gates:
+- **Scoreability gate**: refuses to score bundles where `responses.parquet` is missing or zero-rows. V8's training distribution doesn't cover all-zero feature vectors (see "Empty-bundle mis-scoring" in Load-bearing limitations above), so the gate returns an explicit `verdict: "unscoreable"` instead of a spurious attack score on passive-workload bundles.
+- **Feature-coverage gate**: emits a `feature_coverage` flag (`"full"` when the raw `packets.pcap` is present, `"resp_only"` when it isn't) so callers can downweight or ignore predictions where the two cardinality features defaulted to 0.
+```python
+from huggingface_hub import hf_hub_download
+from predict import load_v8, score_bundle
+model_path = hf_hub_download(
+    repo_id="NullRabbit/v8-cipher-agnostic", filename="model.joblib"
+)
+payload = load_v8(model_path)
+record = score_bundle("/path/to/some/bundle_dir", payload)
+if record["verdict"] == "unscoreable":
+    print(f"refused: {record['reason']}")
+else:
+    print(f"V8 score: {record['v8_score']:.4f} ({record['verdict']}, "
+          f"coverage={record['feature_coverage']})")
+```
+`predict.py` depends on the bundle-spec reference parser:
+```bash
+pip install git+https://github.com/NullRabbitLabs/nr-bundle-spec.git
+```
+For a full worked example that loads a bundle from `nr-bundles-public` via the spec parser, applies the scoreability gate, and renders verdicts on attack + benign + passive-benign samples, see [`inference_example.py`](inference_example.py).
+### Bypassing the gate
+Callers with their own pre-filtering pipeline (or who explicitly want the raw model output) can load the estimator directly:
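+```python
+import joblib
+import numpy as np
+from huggingface_hub import hf_hub_download
+model_path = hf_hub_download(
+    repo_id="NullRabbit/v8-cipher-agnostic", filename="model.joblib"
+)
+payload = joblib.load(model_path)
+model = payload["model"]                  # CalibratedClassifierCV
+feature_names = payload["feature_names"]  # 7-feature contract, in column order
+def raw_scores(rows):
+    """Ungated scores for feature dicts (one per bundle, keyed by feature name)."""
+    X = np.array([[r[name] for name in feature_names] for r in rows], dtype=float)
+    return model.predict_proba(X)[:, 1]  # column 1 = attack-class probability
+# scores = raw_scores([my_bundle_features])  # X has shape (n_samples, 7)
+```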
+**This path is the responsibility of the caller.** If you feed an all-zero feature vector to `model.predict_proba`, V8 will return ~0.9977, which is spurious. The scoreability gate exists for exactly that case. See the Load-bearing limitations section.
+## Methodology
+NullRabbit's training cycles follow pre-registration discipline. Each cycle has a design document committed before the trainer runs. Audits run on close against sanity floors, per-feature ablation trails, and falsification holdouts. Where an audit fires, training halts, the design is re-registered, and the prior version is retracted in writing.
+The **iterative leak-surface peeling pattern** is the methodology contribution: detection of a training-time leak (a feature whose discriminative signal turns out to come from a labelling artefact or capture-pipeline asymmetry rather than from the attack mechanism) triggers a corpus delta + re-train + re-audit, with each cycle narrowing the leak surface. V8 is the cycle that landed when the methodology's leak-surface was small enough that the manifest generalised across chains; the cycles before it (V1–V7) closed specific leaks named in the substrate paper's leak-surface appendix.
+The corpus format and family taxonomy are open at `nr-bundle-spec`. The methodology is open (in preparation as the substrate paper). The specific corpus contents beyond `nr-bundles-public` are proprietary.
+## Related
+- **Bundle format spec**: [`nr-bundle-spec`](https://github.com/NullRabbitLabs/nr-bundle-spec) (MIT)
+- **Reference public bundles**: [NullRabbit/nr-bundles-public](https://huggingface.co/datasets/NullRabbit/nr-bundles-public) (CC-BY-4.0)
+- **Earned-autonomy paper** (governance layer for autonomous defence for decentralised networks): [Zenodo DOI 10.5281/zenodo.18406828](https://doi.org/10.5281/zenodo.18406828)
+- **Substrate paper** (data-layer methodology, in preparation)
+- **NullRabbit Labs**: [huggingface.co/NullRabbit](https://huggingface.co/NullRabbit)
+- **Website**: [nullrabbit.ai](https://nullrabbit.ai)
+## Citation
+```bibtex
+@misc{nullrabbit_v8_cipher_agnostic_2026,
+  author       = {NullRabbit},
+  title        = {V8 cipher-agnostic byte-amplification detector},
+  year         = {2026},
+  month        = may,
+  version      = {1},
+  publisher    = {Hugging Face},
+  url          = {https://huggingface.co/NullRabbit/v8-cipher-agnostic},
+  note         = {Reference binary detector for byte-amplification attacks on validator-infrastructure JSON-RPC endpoints. Trained on the bundle v1 corpus specified at nr-bundle-spec v0.1.0; curated public sample at NullRabbit/nr-bundles-public.},
+}
+```
+## Contact
+Research enquiries: simon@nullrabbit.ai
+Spec compliance or format questions — open an issue at [`nr-bundle-spec`](https://github.com/NullRabbitLabs/nr-bundle-spec).

inference_example.py ADDED

	@@ -0,0 +1,136 @@

+#!/usr/bin/env python3
+# SPDX-License-Identifier: Apache-2.0
+"""V8 cipher-agnostic byte-amplification detector — end-to-end inference example.
+Three-artefact collaboration. This script:
+1. Downloads a bundle from the public NullRabbit/nr-bundles-public dataset
+   on Hugging Face.
+2. Downloads the V8 model and the scoreability-gated inference helper
+   (``predict.py``) from this repository.
+3. Loads the bundle manifest via the bundle-spec reference parser
+   (NullRabbitLabs/nr-bundle-spec, MIT).
+4. Calls ``predict.score_bundle()`` to apply the scoreability gate and
+   produce a verdict.
+A worked demonstration of the **spec → corpus → model** path: bundles on
+disk are conformant with an open spec; the spec's reference parser loads
+them; the scoreability-gated inference helper produces verdicts.
+Dependencies::
+    pip install huggingface_hub pyarrow scikit-learn joblib numpy
+    pip install git+https://github.com/NullRabbitLabs/nr-bundle-spec.git
+Usage::
+    python inference_example.py
+Three bundles are scored: a known-attack (sui_F10_multi_get_objects_amp),
+a known-benign with traffic (sui_BENIGN_reproducer_pipeline), and a
+known-benign without traffic (sui_BENIGN_passive_fullnode) — the third
+demonstrates the scoreability gate refusing to predict on empty bundles.
+"""
+from __future__ import annotations
+import importlib.util
+import sys
+from pathlib import Path
+from huggingface_hub import hf_hub_download, snapshot_download
+# ─── Constants ──────────────────────────────────────────────────────
+V8_MODEL_REPO = "NullRabbit/v8-cipher-agnostic"
+DATASET_REPO = "NullRabbit/nr-bundles-public"
+# Three sample bundles: attack, scoreable benign, unscoreable benign.
+SAMPLES = [
+    ("crp_19d438471fec4229", "sui_F10_multi_get_objects_amp", "attack"),
+    ("crp_8b85da89c4e34d4c", "sui_BENIGN_reproducer_pipeline", "benign"),
+    ("crp_0598afb4d5e44fb9", "sui_BENIGN_passive_fullnode", "benign (passive)"),
+]
+def _load_module(name: str, path: str):
+    """Import the Python file at ``path`` as a module registered under ``name``."""
+    spec = importlib.util.spec_from_file_location(name, path)
+    if spec is None or spec.loader is None:
+        raise ImportError(f"cannot load {name!r} from {path}")
+    module = importlib.util.module_from_spec(spec)
+    sys.modules[name] = module
+    spec.loader.exec_module(module)
+    return module
+def main() -> int:
+    print("=== V8 cipher-agnostic byte-amplification detector ===")
+    print(f"  model repo:   {V8_MODEL_REPO}")
+    print(f"  dataset repo: {DATASET_REPO}")
+    print()
+    # Pull the V8 model + predict.py (the scoreability-gated helper).
+    model_path = hf_hub_download(repo_id=V8_MODEL_REPO, filename="model.joblib")
+    predict_path = hf_hub_download(repo_id=V8_MODEL_REPO, filename="predict.py")
+    # Load the helper as a module + load V8 via the helper.
+    predict = _load_module("v8_predict", predict_path)
+    payload = predict.load_v8(model_path)
+    print(f"V8 loaded: {type(payload['model']).__name__}, "
+          f"{len(payload['feature_names'])} features, "
+          f"manifest={payload['manifest_name']!r}")
+    print()
+    # Pull the three sample bundles.
+    dataset_root = Path(snapshot_download(
+        repo_id=DATASET_REPO, repo_type="dataset",
+        allow_patterns=[f"{cid}/*" for cid, _, _ in SAMPLES],
+    ))
+    # Score each via the gated helper.
+    for corpus_id, primitive_id_expected, label in SAMPLES:
+        bundle_dir = dataset_root / corpus_id
+        record = predict.score_bundle(bundle_dir, payload)
+        print(f"--- {corpus_id} ({primitive_id_expected}) ---")
+        print(f"  ground_truth label: {label}")
+        print(f"  verdict:            {record['verdict']}")
+        if record["verdict"] == "unscoreable":
+            print(f"  reason:             {record['reason']}")
+            print(f"  n_responses_rows:   {record.get('n_responses_rows', 0)}")
+        else:
+            print(f"  V8 score:           {record['v8_score']:.4f}")
+            print(f"  feature_coverage:   {record['feature_coverage']}")
+            print(f"  n_responses_rows:   {record['n_responses_rows']}")
+            print("  features:")
+            for k, v in record["features"].items():
+                print(f"    {k:<28s} {v:>12.4f}")
+        print()
+    print("=" * 72)
+    print("Notes on V8 deployment")
+    print("=" * 72)
+    print("""
+- predict.score_bundle() is the recommended consumption surface. The
+  scoreability gate refuses to predict on bundles where responses.parquet
+  is missing or zero-rows. Callers who want raw model output without the
+  gate should load model.joblib directly via joblib.load.
+- feature_coverage=resp_only means raw packets.pcap is absent (as in the
+  public nr-bundles-public bundles). V8's two cardinality features default
+  to 0, which under-scores attacks relative to the model's training
+  expectation. For full-coverage inference, produce your own bundles per
+  nr-bundle-spec with raw pcap retained.
+- V8 is a binary detector for the byte-amplification family. Attacks from
+  other vulnerability families (reconnaissance, service_misconfig,
+  auth_bypass, rate_limiter_bypass with simulateTransaction shape) will
+  score "benign" — this is correct behaviour, not a failure. Use the
+  multi-class softmax model NullRabbit/multiclass-folded for unified
+  attack-family detection.
+""".strip())
+    print("=" * 72)
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())

model.joblib ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4bf2fc158fbc0bf22dbfc4317c11e252c0e0c4706472a6da647077031d7d27b7
+size 3063106
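`model.joblib` is committed as a Git LFS pointer, so the repository stores only the three lines above and fetches the binary separately. `joblib.load` unpickles its input, so it makes sense to confirm that a downloaded copy matches the pointer's `oid` and `size` before loading it. The following is a minimal standard-library sketch. `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helper names, and the local path in the usage block is illustrative.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_against_pointer(blob_path: str | Path, pointer_text: str) -> bool:
    """Return True if the file's byte size and SHA-256 match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    path = Path(blob_path)
    if path.stat().st_size != int(fields["size"]):
        return False  # cheap size check before hashing
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so the whole file never sits in memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4bf2fc158fbc0bf22dbfc4317c11e252c0e0c4706472a6da647077031d7d27b7
size 3063106
"""

if __name__ == "__main__":
    candidate = Path("model.joblib")  # illustrative path to a downloaded copy
    if candidate.is_file():
        print("match" if verify_against_pointer(candidate, POINTER) else "MISMATCH")
```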

predict.py ADDED

	@@ -0,0 +1,164 @@

+#!/usr/bin/env python3
+# SPDX-License-Identifier: Apache-2.0
+"""V8 cipher-agnostic byte-amplification detector — scoreability-gated inference.
+This is the **recommended consumption surface** for V8. It wraps the raw
+``CalibratedClassifierCV`` estimator with two production-side gates:
+  1. **Scoreability gate**: refuses to score bundles where
+     ``responses.parquet`` is missing or has zero rows. Such bundles yield
+     all-zero feature vectors (typical of passive-workload bundles, where
+     the validator listens without serving RPC), which V8's training
+     distribution doesn't cover; the underlying estimator produces
+     spuriously high attack scores on them. The gate returns an explicit
+     "unscoreable" verdict instead.
+  2. **Feature-coverage gate**: notes when the raw ``packets.pcap`` is
+     absent (as in the public ``nr-bundles-public`` bundles) and emits
+     a coverage flag with the score so callers can downweight or
+     ignore the prediction. V8's two cardinality features are left at
+     0 (this module does not parse pcaps), which under-scores attacks
+     relative to the model's training expectation.
+Callers who want raw model output without these gates should load
+``model.joblib`` directly via ``joblib.load`` — see the "Bypassing the
+gate" section of the model card.
+Usage::
+    from predict import score_bundle, load_v8
+    payload = load_v8("/path/to/model.joblib")  # or from hf_hub_download
+    record = score_bundle("/path/to/some/bundle_dir", payload)
+    if record["verdict"] == "unscoreable":
+        print(f"refused: {record['reason']}")
+    else:
+        print(f"V8 score: {record['v8_score']:.4f} ({record['verdict']})")
+"""
+from __future__ import annotations
+from pathlib import Path
+from typing import Any
+import joblib
+import numpy as np
+import pyarrow.parquet as pq
+# nr-bundle-spec — the reference parser. Pip-install via
+#   pip install git+https://github.com/NullRabbitLabs/nr-bundle-spec.git
+from bundle_spec import BundleManifest
+V8_FEATURES = [
+    "pcap.unique_dst_ports",
+    "pcap.unique_src_ports",
+    "resp.amp_ratio_max",
+    "resp.amp_ratio_mean",
+    "resp.amp_ratio_median",
+    "resp.req_bytes_max",
+    "resp.resp_bytes_max",
+]
+def load_v8(model_path: str | Path) -> dict[str, Any]:
+    """Load the V8 lineage-dict payload from a joblib file."""
+    return joblib.load(model_path)
+def _extract_features(bundle_dir: Path) -> tuple[dict[str, float], int, bool]:
+    """Extract V8 features + diagnostic flags from a bundle.
+    Returns (features, n_responses_rows, has_packets_pcap).
+    """
+    features = {name: 0.0 for name in V8_FEATURES}
+    responses_path = bundle_dir / "responses.parquet"
+    n_resp_rows = 0
+    if responses_path.is_file():
+        table = pq.read_table(responses_path)
+        n_resp_rows = table.num_rows
+        if n_resp_rows > 0:
+            req = table.column("request_size_bytes").to_numpy()
+            resp = table.column("response_size_bytes").to_numpy()
+            features["resp.req_bytes_max"] = float(req.max())
+            features["resp.resp_bytes_max"] = float(resp.max())
+            with np.errstate(divide="ignore", invalid="ignore"):
+                ratios = np.where(req > 0, resp / req, 0.0)
+            features["resp.amp_ratio_max"] = float(ratios.max())
+            features["resp.amp_ratio_mean"] = float(ratios.mean())
+            features["resp.amp_ratio_median"] = float(np.median(ratios))
+    has_packets_pcap = (bundle_dir / "packets.pcap").is_file()
+    # This helper does not parse pcaps, so the two pcap.unique_*_ports
+    # features stay at 0.0 even when packets.pcap is present (score_bundle
+    # still reports feature_coverage="full" in that case). Callers who
+    # retain raw pcap must compute the cardinality features themselves.
+    return features, n_resp_rows, has_packets_pcap
+def score_bundle(
+    bundle_dir: str | Path, payload: dict[str, Any]
+) -> dict[str, Any]:
+    """Score a bundle through V8, with the scoreability gate applied.
+    Returns a record with:
+      - ``verdict``: one of ``"attack"``, ``"benign"``, ``"unscoreable"``.
+      - ``v8_score``: P(attack) in [0, 1], or ``None`` if unscoreable.
+      - ``reason``: human-readable explanation when unscoreable.
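+      - ``feature_coverage``: ``"full"`` (raw pcap present), ``"resp_only"``
+        (raw pcap absent), or ``"none"`` (zero responses rows; unscoreable).
+      - ``corpus_id``, ``primitive_id``, ``ground_truth``: from manifest
+        (``ground_truth`` only on scored records).
+      - ``features``: the 7 feature values as scored (zeros where absent).
+      - ``n_responses_rows``: number of rows in responses.parquet.
+    A bundle without ``manifest.json`` yields only ``verdict``, ``reason``
+    and ``v8_score``.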
+    """
+    bundle_dir = Path(bundle_dir)
+    manifest_path = bundle_dir / "manifest.json"
+    if not manifest_path.is_file():
+        return {
+            "verdict": "unscoreable",
+            "reason": f"manifest.json not found at {manifest_path}",
+            "v8_score": None,
+        }
+    manifest = BundleManifest.model_validate_json(manifest_path.read_text())
+    features, n_resp_rows, has_packets_pcap = _extract_features(bundle_dir)
+    # Scoreability gate
+    if n_resp_rows == 0:
+        return {
+            "verdict": "unscoreable",
+            "reason": (
+                "responses.parquet is missing or zero-rows; V8 cannot score "
+                "bundles with no observed RPC traffic. Use a non-amplification-"
+                "family detector for passive-workload bundles, or compose with "
+                "the multi-class softmax model NullRabbit/multiclass-folded."
+            ),
+            "v8_score": None,
+            "corpus_id": manifest.corpus_id,
+            "primitive_id": manifest.primitive_id,
+            "n_responses_rows": 0,
+            "feature_coverage": "none",
+        }
+    # Score
+    X = np.array([[features[name] for name in V8_FEATURES]])
+    score = float(payload["model"].predict_proba(X)[0, 1])
+    verdict = "attack" if score >= 0.5 else "benign"
+    coverage = "full" if has_packets_pcap else "resp_only"
+    return {
+        "verdict": verdict,
+        "v8_score": score,
+        "reason": None,
+        "corpus_id": manifest.corpus_id,
+        "primitive_id": manifest.primitive_id,
+        "ground_truth": (
+            manifest.ground_truth_label.value
+            if hasattr(manifest.ground_truth_label, "value")
+            else str(manifest.ground_truth_label)
+        ),
+        "features": features,
+        "n_responses_rows": n_resp_rows,
+        "feature_coverage": coverage,
+    }
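`_extract_features` leaves `pcap.unique_dst_ports` and `pcap.unique_src_ports` at 0.0, and its comment leaves cardinality extraction to callers who keep raw pcap. Below is a minimal standard-library sketch of one way to compute them, under stated assumptions. It handles classic libpcap files only (not pcapng), with an Ethernet link type and untagged IPv4. It counts every TCP/UDP packet in the capture, and treats a "cardinality" as the number of distinct port values across the whole file. The release does not say how the training pipeline defined these features (per flow, per direction, or restricted to the validator's address), so values from this sketch may not match the training distribution. `unique_port_counts` is a hypothetical name.

```python
from __future__ import annotations

import struct
from pathlib import Path

_LINKTYPE_ETHERNET = 1
_ETHERTYPE_IPV4 = 0x0800
_L4_PROTOCOLS = {6, 17}  # TCP, UDP


def unique_port_counts(pcap_path: str | Path) -> tuple[int, int]:
    """Hypothetical helper: (distinct dst ports, distinct src ports) in a capture.

    Assumptions: classic libpcap file (not pcapng), Ethernet link type,
    untagged IPv4 frames; every TCP/UDP packet in the file is counted.
    """
    data = Path(pcap_path).read_bytes()  # whole file in memory, for simplicity
    if len(data) < 24:
        raise ValueError("file too short for a pcap global header")
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (usec or nsec timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic libpcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != _LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    dst_ports: set[int] = set()
    src_ports: set[int] = set()
    offset = 24
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length.
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Skip non-IPv4 frames (ARP, IPv6, VLAN-tagged) and truncated ones.
        if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != _ETHERTYPE_IPV4:
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        frag_offset = struct.unpack("!H", ip[6:8])[0] & 0x1FFF
        if ihl < 20 or ip[9] not in _L4_PROTOCOLS or frag_offset or len(ip) < ihl + 4:
            continue  # non-TCP/UDP, non-first fragment, or truncated L4 header
        src, dst = struct.unpack("!HH", ip[ihl:ihl + 4])
        src_ports.add(src)
        dst_ports.add(dst)
    return len(dst_ports), len(src_ports)
```

`score_bundle` takes no feature overrides. A caller who computes these counts would therefore need to build the feature vector and call `payload["model"].predict_proba` directly, re-applying the zero-rows check themselves.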

release-cert.json ADDED

	@@ -0,0 +1,162 @@

+{
+  "audited_at": "2026-05-13T13:53:36Z",
+  "model_repo": "NullRabbit/v8-cipher-agnostic",
+  "checks": [
+    {
+      "check": "joblib_loads",
+      "ok": true
+    },
+    {
+      "check": "lineage_dict_shape",
+      "ok": true
+    },
+    {
+      "check": "predict_py_exists",
+      "ok": true
+    },
+    {
+      "check": "predict_py_compiles",
+      "ok": true
+    },
+    {
+      "check": "inference_example_exists",
+      "ok": true
+    },
+    {
+      "check": "inference_example_compiles",
+      "ok": true
+    },
+    {
+      "check": "apache_2_0_license_text_present",
+      "ok": true
+    },
+    {
+      "check": "readme_cites_cipher_agnostic_v2",
+      "ok": true
+    },
+    {
+      "check": "readme_cites_calibrated_classifier",
+      "ok": true
+    },
+    {
+      "check": "readme_cites_1972_bundles",
+      "ok": true
+    },
+    {
+      "check": "readme_cites_897_attack",
+      "ok": true
+    },
+    {
+      "check": "readme_cites_1075_benign",
+      "ok": true
+    },
+    {
+      "check": "readme_cites_seed_42",
+      "ok": true
+    },
+    {
+      "check": "readme_cites_v1.1_features_version",
+      "ok": true
+    },
+    {
+      "check": "readme_anchor_autonomous_defence",
+      "ok": true
+    },
+    {
+      "check": "readme_anchor_cipher_agnostic",
+      "ok": true
+    },
+    {
+      "check": "readme_anchor_byte_amplification",
+      "ok": true
+    },
+    {
+      "check": "readme_anchor_iterative_leak_surface_peeling",
+      "ok": true
+    },
+    {
+      "check": "readme_substrate_paper_in_preparation",
+      "ok": true
+    },
+    {
+      "check": "readme_zenodo_doi_present",
+      "ok": true
+    },
+    {
+      "check": "readme_cross_links_nr_bundles_public",
+      "ok": true
+    },
+    {
+      "check": "readme_cross_links_nr_bundle_spec",
+      "ok": true
+    },
+    {
+      "check": "readme_has_load_bearing_limitations_section",
+      "ok": true
+    },
+    {
+      "check": "readme_has_cardinality_envelope",
+      "ok": true
+    },
+    {
+      "check": "readme_has_saturation_envelope",
+      "ok": true
+    },
+    {
+      "check": "readme_has_cross_chain_class_specific",
+      "ok": true
+    },
+    {
+      "check": "readme_has_phase_1_close_gate",
+      "ok": true
+    },
+    {
+      "check": "readme_has_family_specific_clarification",
+      "ok": true
+    },
+    {
+      "check": "readme_has_empty_bundle_mis_scoring",
+      "ok": true
+    },
+    {
+      "check": "readme_apache_2_0_license",
+      "ok": true
+    },
+    {
+      "check": "readme_recommends_predict_py",
+      "ok": true
+    },
+    {
+      "check": "readme_has_bypassing_the_gate_section",
+      "ok": true
+    },
+    {
+      "check": "gate_attack_bundle_scores_attack",
+      "ok": true,
+      "verdict": "attack",
+      "v8_score": 0.9976744186046511,
+      "primitive_id": "sui_F10_multi_get_objects_amp"
+    },
+    {
+      "check": "gate_scoreable_benign_scores_benign",
+      "ok": true,
+      "verdict": "benign",
+      "v8_score": 0.0,
+      "primitive_id": "sui_BENIGN_reproducer_pipeline"
+    },
+    {
+      "check": "gate_zero_rows_responses_returns_unscoreable",
+      "ok": true,
+      "verdict": "unscoreable",
+      "reason": "responses.parquet is missing or zero-rows; V8 cannot score bundles with no observed RPC traffic. Use a non-amplification-family detector for passive-workload bundles, or compose with the multi-class softmax model NullRabbit/multiclass-folded."
+    },
+    {
+      "check": "gate_missing_responses_returns_unscoreable",
+      "ok": true,
+      "verdict": "unscoreable"
+    }
+  ],
+  "n_checks": 36,
+  "n_ok": 36,
+  "release_ok": true
+}
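`release-cert.json` records a list of named checks plus three summary fields (`n_checks`, `n_ok`, `release_ok`). A consumer pinning this release may want to re-derive those counters rather than trust `release_ok` alone. The minimal sketch below uses only the field names visible in the file. It assumes `release_ok` means "every check has `ok: true`", an inference from this file rather than a documented contract. `cert_is_consistent` and `failed_checks` are hypothetical helpers. They test only the internal consistency of the JSON, not whether the audited checks were meaningful.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def cert_is_consistent(cert: dict[str, Any]) -> bool:
    """Re-derive a release cert's summary counters from its checks list."""
    checks = cert["checks"]
    names = [c["check"] for c in checks]
    n_ok = sum(1 for c in checks if c.get("ok") is True)
    return (
        len(set(names)) == len(names)  # no duplicated check names
        and cert["n_checks"] == len(checks)
        and cert["n_ok"] == n_ok
        # Assumption: release_ok is true exactly when every check passed.
        and cert["release_ok"] == (n_ok == len(checks))
    )


def failed_checks(cert: dict[str, Any]) -> list[str]:
    """Names of checks whose ok flag is not exactly true."""
    return [c["check"] for c in cert["checks"] if c.get("ok") is not True]


if __name__ == "__main__":
    path = Path("release-cert.json")  # illustrative path to a downloaded copy
    if path.is_file():
        cert = json.loads(path.read_text())
        print("consistent:", cert_is_consistent(cert), "failed:", failed_checks(cert))
```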