
# Fastly Compute@Edge Deployment

This directory contains artifacts for deploying bert-tiny-injection-detector on Fastly Compute@Edge using tract-onnx in a Rust WASM service.

## Files

| File | Description |
|------|-------------|
| `calibrated_thresholds.json` | Calibrated block and review thresholds for the injection model |

### calibrated_thresholds.json

```json
{
  "injection": {
    "T_block_at_1pct_FPR": 0.9403,
    "T_review_lower_at_2pct_FPR": 0.8692
  }
}
```
| Threshold | Score range | Decision |
|-----------|-------------|----------|
| Below T_review | score < 0.8692 | Allow |
| Review band | 0.8692 ≤ score < 0.9403 | Review |
| At or above T_block | score ≥ 0.9403 | Block |
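The three-way decision above maps directly to a comparison against the two calibrated thresholds. A minimal sketch in Rust (the `Decision` enum and `decide` function are illustrative names, not part of the deployed service):

```rust
// Calibrated thresholds from calibrated_thresholds.json.
const T_BLOCK: f32 = 0.9403; // block threshold at 1% FPR
const T_REVIEW: f32 = 0.8692; // review-band lower bound at 2% FPR

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    Review,
    Block,
}

/// Map a model score to a decision. Boundaries are inclusive on the
/// upper thresholds, matching the table: score >= T_BLOCK blocks,
/// T_REVIEW <= score < T_BLOCK goes to review, else allow.
fn decide(score: f32) -> Decision {
    if score >= T_BLOCK {
        Decision::Block
    } else if score >= T_REVIEW {
        Decision::Review
    } else {
        Decision::Allow
    }
}

fn main() {
    assert_eq!(decide(0.50), Decision::Allow);
    assert_eq!(decide(0.90), Decision::Review);
    assert_eq!(decide(0.95), Decision::Block);
}
```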

## ONNX requirements for tract-onnx

- Use `onnx/opset11/model.int8.onnx` (or `model.fp32.onnx` for debugging)
- Opset 11 is required. Opset ≥ 13 uses dynamic Unsqueeze axes that tract cannot resolve statically. The opset-11 graph has only 2 static Unsqueeze nodes.
- Input tensors must be int64 of shape `[1, 128]`
- Apply head_tail truncation before inference for inputs longer than 128 tokens
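Head_tail truncation keeps tokens from both ends of a long input, since injection payloads often sit at the start or end of a prompt. A sketch, assuming an even 64/64 head/tail split; the exact split used by the deployed service is not specified here and is an assumption:

```rust
/// head_tail truncation: if the token sequence exceeds `max_len`,
/// keep the first `head` tokens and the last `max_len - head` tokens.
/// Token ids are i64 to match the model's int64 [1, 128] input.
fn head_tail_truncate(ids: &[i64], max_len: usize, head: usize) -> Vec<i64> {
    debug_assert!(head < max_len);
    if ids.len() <= max_len {
        return ids.to_vec();
    }
    let tail = max_len - head;
    let mut out = Vec::with_capacity(max_len);
    out.extend_from_slice(&ids[..head]); // head of the sequence
    out.extend_from_slice(&ids[ids.len() - tail..]); // tail of the sequence
    out
}

fn main() {
    let ids: Vec<i64> = (0..200).collect();
    let truncated = head_tail_truncate(&ids, 128, 64);
    assert_eq!(truncated.len(), 128);
    // First 64 tokens come from the head, last 64 from the tail.
    assert_eq!(truncated[0], 0);
    assert_eq!(truncated[127], 199);
}
```

Inputs of 128 tokens or fewer pass through unchanged (padding to the fixed `[1, 128]` shape is handled separately).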

## Memory and latency

Measured on Fastly Compute@Edge (production, service v11: opt-level=3, Wizer pre-init, simd128):

| Metric | Value |
|--------|-------|
| Median inference | ~69 ms |
| Median total service elapsed | ~70 ms |
| p95 total service elapsed | ~85 ms |
| Memory footprint | under the 128 MB budget |

Median inference time exceeds the nominal 50 ms Fastly CPU budget by roughly 1.4×. This is WASM overhead: INT8 SIMD paths are not accelerated in the sandbox. The service remains functional at this latency. Wizer pre-initialization eliminates the lazy-static initialization cost (163 ms in earlier versions); the remaining time is pure BERT inference.