# Fastly Compute@Edge Deployment

This directory contains artifacts for deploying `bert-tiny-injection-detector` on [Fastly Compute@Edge](https://www.fastly.com/products/edge-compute) using [`tract-onnx`](https://github.com/sonos/tract) in a Rust WASM service.

## Files

| File | Description |
|---|---|
| `calibrated_thresholds.json` | Calibrated block and review thresholds for the injection model |

## calibrated_thresholds.json

```json
{
  "injection": {
    "T_block_at_1pct_FPR": 0.9403,
    "T_review_lower_at_2pct_FPR": 0.8692
  }
}
```

`T_block` (`T_block_at_1pct_FPR`) is calibrated at a 1% false-positive rate. `T_review` (`T_review_lower_at_2pct_FPR`) is the lower bound of the review band, calibrated at a 2% false-positive rate.

| Band | Score range | Decision |
|---|---|---|
| Below `T_review` | score < 0.8692 | Allow |
| `T_review` to `T_block` | 0.8692 ≤ score < 0.9403 | Review |
| At or above `T_block` | score ≥ 0.9403 | Block |

## ONNX requirements for tract-onnx

- Use `onnx/opset11/model.int8.onnx` (or `model.fp32.onnx` for debugging).
- **Opset 11 is required.** In opset 13 and later, `Unsqueeze` takes its axes as an input tensor rather than an attribute; in this export those axes are dynamic, and `tract` cannot resolve them statically. The opset-11 graph contains only 2 `Unsqueeze` nodes, both with static axes.
- Input tensors must be `int64` with shape `[1, 128]`.
- Apply `head_tail` truncation before inference to inputs longer than 128 tokens.

## Memory and latency

Measured on Fastly Compute@Edge in production (service v11, built with `opt-level=3`, Wizer pre-initialization, and `simd128`):

| Metric | Value |
|---|---|
| Median inference | ~69 ms |
| Median total service elapsed | ~70 ms |
| p95 total service elapsed | ~85 ms |
| Memory footprint | Within the 128 MB budget |

Median inference time exceeds Fastly's nominal 50 ms CPU budget by about 1.4× (~69 ms vs. 50 ms). This is attributed to WASM overhead: INT8 SIMD paths are not accelerated in the sandbox. The service is nonetheless functional at this latency. Wizer pre-initialization eliminates the lazy-static initialization cost (~163 ms in earlier versions), so the remaining time is almost entirely BERT inference.
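## Example: input preparation and threshold decisions (sketch)

The threshold bands and the `[1, 128]` `int64` input requirement can be combined into a small pre- and post-processing layer around the model. What follows is a minimal, std-only Rust sketch under stated assumptions, not the service's actual code. `Decision`, `Thresholds`, `head_tail_truncate`, and `pad_to_seq_len` are hypothetical names. The 64 + 64 head/tail split is an assumption, because the exact `head_tail` split is not specified here. Padding with id `0` and building an attention mask assume a standard BERT uncased vocabulary and a model that takes `input_ids` and `attention_mask`; check the ONNX graph's actual input names before wiring this to `tract-onnx`.

```rust
/// Outcome of comparing a model score against the calibrated thresholds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Review,
    Block,
}

/// Hypothetical container for the "injection" entry of calibrated_thresholds.json.
pub struct Thresholds {
    pub t_block: f32,        // T_block_at_1pct_FPR
    pub t_review_lower: f32, // T_review_lower_at_2pct_FPR
}

impl Thresholds {
    /// Values copied from calibrated_thresholds.json.
    pub const INJECTION: Thresholds = Thresholds {
        t_block: 0.9403,
        t_review_lower: 0.8692,
    };

    /// score >= T_block -> Block; T_review <= score < T_block -> Review; otherwise Allow.
    pub fn decide(&self, score: f32) -> Decision {
        if score >= self.t_block {
            Decision::Block
        } else if score >= self.t_review_lower {
            Decision::Review
        } else {
            Decision::Allow
        }
    }
}

/// Fixed sequence length required by the opset-11 model.
pub const SEQ_LEN: usize = 128;

/// Assumed head_tail truncation: if `ids` is longer than SEQ_LEN, keep the first
/// `head` ids and the last `SEQ_LEN - head` ids; shorter inputs are returned unchanged.
pub fn head_tail_truncate(ids: &[i64], head: usize) -> Vec<i64> {
    if ids.len() <= SEQ_LEN {
        return ids.to_vec();
    }
    let head = head.min(SEQ_LEN);
    let tail = SEQ_LEN - head;
    let mut out = Vec::with_capacity(SEQ_LEN);
    out.extend_from_slice(&ids[..head]);
    out.extend_from_slice(&ids[ids.len() - tail..]);
    out
}

/// Pads (or cuts) `ids` to exactly SEQ_LEN using `pad_id`, and returns a matching
/// attention mask with 1 for real tokens and 0 for padding.
pub fn pad_to_seq_len(ids: &[i64], pad_id: i64) -> (Vec<i64>, Vec<i64>) {
    let mut input_ids = ids.to_vec();
    let mut mask = vec![1i64; ids.len()];
    input_ids.resize(SEQ_LEN, pad_id);
    mask.resize(SEQ_LEN, 0);
    (input_ids, mask)
}
```

Both vectors returned by `pad_to_seq_len` have exactly 128 elements, so each can be reshaped into a `[1, 128]` `int64` tensor for the opset-11 model, and the resulting score can be passed to `Thresholds::INJECTION.decide`. Keeping both the head and the tail of an over-long input preserves a leading `[CLS]` and a trailing `[SEP]` if the tokenizer has already added them.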