# Fastly Compute@Edge Deployment

This directory contains artifacts for deploying `bert-tiny-injection-detector` on
[Fastly Compute@Edge](https://www.fastly.com/products/edge-compute) using
[`tract-onnx`](https://github.com/sonos/tract) in a Rust WASM service.

## Files

| File | Description |
|---|---|
| `calibrated_thresholds.json` | Calibrated block and review thresholds for the injection model |
## calibrated_thresholds.json

```json
{
  "injection": {
    "T_block_at_1pct_FPR": 0.9403,
    "T_review_lower_at_2pct_FPR": 0.8692
  }
}
```

| Threshold | Score range | Decision |
|---|---|---|
| Below `T_review` | score < 0.8692 | Allow |
| Review band | 0.8692 ≤ score < 0.9403 | Review |
| At or above `T_block` | score ≥ 0.9403 | Block |
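
The three-way mapping above is simple to encode in the service's Rust code. A minimal sketch follows; the constant names, `Decision` enum, and `decide` function are illustrative and not taken from the actual service source, and the threshold values are inlined from `calibrated_thresholds.json` rather than parsed at runtime:

```rust
// Thresholds mirrored from calibrated_thresholds.json
// (hypothetical constant names, values from the calibration file).
const T_BLOCK: f32 = 0.9403; // block at 1% FPR operating point
const T_REVIEW: f32 = 0.8692; // review lower bound at 2% FPR

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    Review,
    Block,
}

/// Map a model score to an action using the calibrated thresholds.
/// Boundaries are inclusive on the upper side, matching the table:
/// score >= T_BLOCK blocks, T_REVIEW <= score < T_BLOCK reviews.
fn decide(score: f32) -> Decision {
    if score >= T_BLOCK {
        Decision::Block
    } else if score >= T_REVIEW {
        Decision::Review
    } else {
        Decision::Allow
    }
}

fn main() {
    // One score from each band.
    assert_eq!(decide(0.50), Decision::Allow);
    assert_eq!(decide(0.90), Decision::Review);
    assert_eq!(decide(0.99), Decision::Block);
    println!("ok");
}
```

Note that the boundary scores themselves (exactly 0.8692 or 0.9403) fall into the stricter band, consistent with the `≤`/`≥` signs in the table.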

## ONNX requirements for tract-onnx

- Use `onnx/opset11/model.int8.onnx` (or `model.fp32.onnx` for debugging).
- **Opset 11 is required.** Opset ≥ 13 uses dynamic `Unsqueeze` axes that `tract` cannot
  resolve statically; the opset-11 graph has only two static `Unsqueeze` nodes.
- Input tensors must be `int64` of shape `[1, 128]`.
- Apply `head_tail` truncation before inference for inputs longer than 128 tokens.
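
The `head_tail` step can be sketched as a pure function over token IDs. This is an assumed implementation, not the service's actual code: it keeps the first `head` tokens and the last `max_len - head` tokens, and the 64/64 split is an assumed configuration (the source above only fixes `max_len = 128`):

```rust
/// Head-tail truncation: if `ids` exceeds `max_len`, keep the first
/// `head` tokens and the last `max_len - head` tokens, dropping the
/// middle. The 64/64 split used below is an assumption; any handling
/// of special tokens ([CLS]/[SEP]) is omitted for brevity.
fn head_tail_truncate(ids: &[i64], max_len: usize, head: usize) -> Vec<i64> {
    if ids.len() <= max_len {
        return ids.to_vec();
    }
    let tail = max_len - head;
    let mut out = Vec::with_capacity(max_len);
    out.extend_from_slice(&ids[..head]);                // head of the input
    out.extend_from_slice(&ids[ids.len() - tail..]);    // tail of the input
    out
}

fn main() {
    // 200 synthetic token IDs: 0, 1, ..., 199.
    let ids: Vec<i64> = (0..200).collect();
    let truncated = head_tail_truncate(&ids, 128, 64);
    assert_eq!(truncated.len(), 128);
    assert_eq!(truncated[0], 0);     // first head token preserved
    assert_eq!(truncated[63], 63);   // last head token preserved
    assert_eq!(truncated[64], 136);  // tail starts at 200 - 64 = 136
    assert_eq!(truncated[127], 199); // final token preserved
    println!("ok");
}
```

The output length is always exactly `max_len` for long inputs, satisfying the fixed `[1, 128]` input shape requirement above.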

## Memory and latency

Measured on Fastly Compute@Edge (production, service v11: `opt-level = 3`, Wizer pre-initialization, `simd128`):

| Metric | Value |
|---|---|
| Median inference | ~69 ms |
| Median total service elapsed | ~70 ms |
| p95 total service elapsed | ~85 ms |
| Memory footprint | < 128 MB budget |

Inference exceeds the nominal 50 ms Fastly CPU budget by roughly 1.4×. This is WASM
overhead: INT8 SIMD paths are not accelerated inside the sandbox. The service remains
functional at this latency. Wizer pre-initialization eliminates the lazy-static
initialization cost (~163 ms in earlier versions); the remaining time is pure BERT
inference.