
# Fastly Compute@Edge Deployment

This directory contains artifacts for deploying bert-tiny-injection-detector on Fastly Compute@Edge using tract-onnx in a Rust WASM service.

## Files

| File | Description |
|------|-------------|
| `calibrated_thresholds.json` | Calibrated block and review thresholds for the injection model |

### calibrated_thresholds.json

```json
{
  "injection": {
    "T_block_at_1pct_FPR": 0.9403,
    "T_review_lower_at_2pct_FPR": 0.8692
  }
}
```
| Threshold | Score range | Decision |
|-----------|-------------|----------|
| Below T_review | score < 0.8692 | Allow |
| Review band | 0.8692 ≤ score < 0.9403 | Review |
| At or above T_block | score ≥ 0.9403 | Block |
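The three-way decision above maps directly to a comparison against the two calibrated thresholds. A minimal sketch in Rust (the `Decision` enum and `decide` function are illustrative names, not part of the deployed service):

```rust
// Calibrated thresholds from calibrated_thresholds.json.
const T_BLOCK: f32 = 0.9403; // block threshold at 1% FPR
const T_REVIEW: f32 = 0.8692; // review-band lower bound at 2% FPR

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    Review,
    Block,
}

/// Map a model score to a decision. Boundaries are inclusive on the
/// upper thresholds, matching the table: score >= T_BLOCK blocks,
/// T_REVIEW <= score < T_BLOCK goes to review, else allow.
fn decide(score: f32) -> Decision {
    if score >= T_BLOCK {
        Decision::Block
    } else if score >= T_REVIEW {
        Decision::Review
    } else {
        Decision::Allow
    }
}

fn main() {
    assert_eq!(decide(0.50), Decision::Allow);
    assert_eq!(decide(0.90), Decision::Review);
    assert_eq!(decide(0.95), Decision::Block);
}
```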

## ONNX requirements for tract-onnx

- Use `onnx/opset11/model.int8.onnx` (or `model.fp32.onnx` for debugging)
- Opset 11 is required. Opset ≥ 13 uses dynamic Unsqueeze axes that tract cannot resolve statically. The opset-11 graph has only 2 static Unsqueeze nodes.
- Input tensors must be int64 of shape `[1, 128]`
- Apply head_tail truncation before inference for inputs longer than 128 tokens
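Head_tail truncation keeps tokens from both ends of a long input, since injection payloads often sit at the start or end of a prompt. A sketch, assuming an even 64/64 head/tail split; the exact split used by the deployed service is not specified here and is an assumption:

```rust
/// head_tail truncation: if the token sequence exceeds `max_len`,
/// keep the first `head` tokens and the last `max_len - head` tokens.
/// Token ids are i64 to match the model's int64 [1, 128] input.
fn head_tail_truncate(ids: &[i64], max_len: usize, head: usize) -> Vec<i64> {
    debug_assert!(head < max_len);
    if ids.len() <= max_len {
        return ids.to_vec();
    }
    let tail = max_len - head;
    let mut out = Vec::with_capacity(max_len);
    out.extend_from_slice(&ids[..head]); // head of the sequence
    out.extend_from_slice(&ids[ids.len() - tail..]); // tail of the sequence
    out
}

fn main() {
    let ids: Vec<i64> = (0..200).collect();
    let truncated = head_tail_truncate(&ids, 128, 64);
    assert_eq!(truncated.len(), 128);
    // First 64 tokens come from the head, last 64 from the tail.
    assert_eq!(truncated[0], 0);
    assert_eq!(truncated[127], 199);
}
```

Inputs of 128 tokens or fewer pass through unchanged (padding to the fixed `[1, 128]` shape is handled separately).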

## Memory and latency

Measured on Fastly Compute@Edge (production, service v11: opt-level=3, Wizer pre-init, simd128):

| Metric | Value |
|--------|-------|
| Median inference | ~69 ms |
| Median total service elapsed | ~70 ms |
| p95 total service elapsed | ~85 ms |
| Memory footprint | under the 128 MB budget |

Median inference time exceeds the nominal 50 ms Fastly CPU budget by roughly 1.4×. This is WASM overhead: INT8 SIMD paths are not accelerated in the sandbox. The service remains functional at this latency. Wizer pre-initialization eliminates the lazy-static initialization cost (163 ms in earlier versions); the remaining time is pure BERT inference.