# Fastly Compute@Edge Deployment
This directory contains artifacts for deploying `bert-tiny-injection-detector` on
Fastly Compute@Edge using `tract-onnx` in a Rust WASM service.
## Files
| File | Description |
|---|---|
| `calibrated_thresholds.json` | Calibrated block and review thresholds for the injection model |
### `calibrated_thresholds.json`

```json
{
  "injection": {
    "T_block_at_1pct_FPR": 0.9403,
    "T_review_lower_at_2pct_FPR": 0.8692
  }
}
```
| Threshold | Score range | Decision |
|---|---|---|
| Below `T_review` | score < 0.8692 | Allow |
| Review band | 0.8692 ≤ score < 0.9403 | Review |
| At or above `T_block` | score ≥ 0.9403 | Block |
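The mapping above can be sketched as a small decision function. This is an illustrative sketch, not the service's actual code: the `Decision` enum and `decide` function are hypothetical names, and the constants mirror the calibrated values from `calibrated_thresholds.json`.

```rust
// Calibrated thresholds (values from calibrated_thresholds.json).
const T_BLOCK: f32 = 0.9403; // T_block_at_1pct_FPR
const T_REVIEW: f32 = 0.8692; // T_review_lower_at_2pct_FPR

/// Hypothetical decision type; names are illustrative.
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Review,
    Block,
}

/// Map an injection score to a decision per the table above.
fn decide(score: f32) -> Decision {
    if score >= T_BLOCK {
        Decision::Block
    } else if score >= T_REVIEW {
        Decision::Review
    } else {
        Decision::Allow
    }
}
```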
## ONNX requirements for `tract-onnx`
- Use `onnx/opset11/model.int8.onnx` (or `model.fp32.onnx` for debugging)
- Opset 11 is required. Opset ≥ 13 uses dynamic `Unsqueeze` axes that `tract` cannot
  resolve statically. The opset-11 graph has only 2 static `Unsqueeze` nodes.
- Input tensors must be `int64` of shape `[1, 128]`
- Apply `head_tail` truncation before inference for inputs longer than 128 tokens
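The exact head/tail split is not specified here, so the sketch below assumes an even 64/64 split over the 128-token budget. The function name echoes the `head_tail` strategy mentioned above, but the signature and split are assumptions, not the service's actual implementation.

```rust
/// Hypothetical sketch of head_tail truncation: when the token id
/// sequence exceeds the budget, keep the first `head` and the last
/// `tail` tokens and drop the middle. The 64/64 split is an assumed
/// default; the real service may reserve slots for special tokens.
fn head_tail_truncate(ids: &[i64], head: usize, tail: usize) -> Vec<i64> {
    if ids.len() <= head + tail {
        return ids.to_vec(); // short enough: keep everything
    }
    let mut out = Vec::with_capacity(head + tail);
    out.extend_from_slice(&ids[..head]); // head tokens
    out.extend_from_slice(&ids[ids.len() - tail..]); // tail tokens
    out
}
```

The middle of a long input is the part most often dropped because BERT-style detectors tend to rely on the opening instructions and the trailing payload, but that rationale is a general observation, not a claim from this repository.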
## Memory and latency
Measured on Fastly Compute@Edge (production, service v11: `opt-level=3`, Wizer pre-init, `simd128`):
| Metric | Value |
|---|---|
| Median inference | ~69 ms |
| Median total service elapsed | ~70 ms |
| p95 total service elapsed | ~85 ms |
| Memory footprint | < 128 MB budget |
The median inference time is ~1.4× the nominal 50 ms Fastly CPU budget. This is WASM
overhead: INT8 SIMD paths are not accelerated in the sandbox. The service remains
functional at this latency. Wizer pre-initialization eliminates the lazy-static init
cost (163 ms in earlier versions); the remaining time is pure BERT inference.