marklkelly committed
Commit 810a977 · verified · 1 parent: bd4b7b3

Upload README.md with huggingface_hub

Files changed (1): README.md (+2 −2)

README.md CHANGED
@@ -30,7 +30,7 @@ metrics:
 
 A compact binary classifier for detecting prompt injection and instruction override attacks in text inputs. Based on [`prajjwal1/bert-tiny`](https://huggingface.co/prajjwal1/bert-tiny) (~4.4M parameters), trained using knowledge distillation from [`protectai/deberta-v3-small-prompt-injection-v2`](https://huggingface.co/protectai/deberta-v3-small-prompt-injection-v2) plus hard labels.
 
-The model is designed for **edge deployment** on [Fastly Compute@Edge](https://www.fastly.com/products/edge-compute) where Python runtimes are unavailable and inference must fit inside a 128 MB memory envelope. The published ONNX artifacts run directly in a Rust WASM binary via [`tract-onnx`](https://github.com/sonos/tract). See the [blog post](#more-information) for a full write-up of the edge deployment stack.
+The model is designed for **edge deployment** on [Fastly Compute](https://www.fastly.com/products/edge-compute) where Python runtimes are unavailable and inference must fit inside a 128 MB memory envelope. The published ONNX artifacts run directly in a Rust WASM binary via [`tract-onnx`](https://github.com/sonos/tract). See the [blog post](#more-information) for a full write-up of the edge deployment stack.
 
 > **Long input note:** the model uses a custom **head_tail truncation** strategy for inputs longer than 128 tokens. Standard Hugging Face pipeline truncation does not reproduce this. See [Long Input Handling](#long-input-handling) below.
 
@@ -335,7 +335,7 @@ Dataset construction used exact SHA-256 deduplication, text-length filtering (8
 
 This model covers **prompt injection and instruction override** only. A separate jailbreak detection model was trained on `allenai/wildjailbreak`, but is not deployment-ready due to dataset and threshold-calibration issues.
 
-**Production latency on Fastly Compute@Edge:**
+**Production latency on Fastly Compute:**
 
 The Fastly service runs the INT8 ONNX model via `tract-onnx` inside a WASM binary (`wasm32-wasip1`). A structured latency optimisation campaign reduced median elapsed time from 414 ms to 69 ms:
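The head_tail truncation noted in the README's long-input warning can be sketched in Python as follows. This is an illustrative sketch only, not the model's actual preprocessing code: the 50/50 head/tail split and the choice to operate on a raw token-id list are assumptions — the README's Long Input Handling section documents the exact scheme.

```python
def head_tail_truncate(token_ids, max_len=128, head_frac=0.5):
    """Illustrative head_tail truncation: keep the first and last
    portions of an over-length token sequence and drop the middle.

    The 50/50 split (head_frac=0.5) is an assumption for this sketch,
    not the model's documented ratio.
    """
    if len(token_ids) <= max_len:
        return list(token_ids)
    head = int(max_len * head_frac)   # tokens kept from the start
    tail = max_len - head             # tokens kept from the end
    return list(token_ids[:head]) + list(token_ids[-tail:])


# Example: a 200-token input keeps 64 head + 64 tail tokens.
ids = list(range(200))
truncated = head_tail_truncate(ids)
```

This differs from the standard Hugging Face `truncation=True` behaviour, which keeps only the head of the sequence — which is why, as the README warns, pipeline truncation does not reproduce the model's long-input predictions.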