A compact binary classifier for detecting prompt injection and instruction override attacks in text inputs. Based on [`prajjwal1/bert-tiny`](https://huggingface.co/prajjwal1/bert-tiny) (~4.4M parameters), trained using knowledge distillation from [`protectai/deberta-v3-small-prompt-injection-v2`](https://huggingface.co/protectai/deberta-v3-small-prompt-injection-v2) plus hard labels.
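
The "plus hard labels" blend is a standard distillation setup: the student is fit against the teacher's temperature-softened probabilities and the ground-truth labels at the same time. A minimal numpy sketch of such a combined objective; the weighting `alpha` and temperature `T` are illustrative assumptions, not values taken from this card:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      alpha=0.5, T=2.0):
    """Blend of soft-target KL (teacher -> student) and hard-label
    cross-entropy. alpha and T are illustrative, not the card's values."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    # Soft-target term, scaled by T^2 as in the standard KD formulation
    kl = np.mean(np.sum(p_t * (np.log(p_t) - log_p_s), axis=-1)) * T * T
    # Hard-label cross-entropy at temperature 1
    probs = softmax(student_logits)
    ce = -np.mean(np.log(probs[np.arange(len(hard_labels)), hard_labels]))
    return alpha * kl + (1.0 - alpha) * ce
```

When student and teacher logits agree, the KL term vanishes and only the hard-label cross-entropy remains.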

The model is designed for **edge deployment** on [Fastly Compute](https://www.fastly.com/products/edge-compute) where Python runtimes are unavailable and inference must fit inside a 128 MB memory envelope. The published ONNX artifacts run directly in a Rust WASM binary via [`tract-onnx`](https://github.com/sonos/tract). See the [blog post](#more-information) for a full write-up of the edge deployment stack.
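
Before shipping to the edge, the exported artifact can be spot-checked off-device with any ONNX runtime. A minimal sketch, assuming `onnxruntime` and `transformers` are installed; the filename `model-int8.onnx` and the label order are hypothetical, so verify both against the repository files:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def injection_score(text, model_path="model-int8.onnx"):
    """Return an assumed P(injection) for short inputs.

    Uses standard max-length truncation, so it only matches the deployed
    model for inputs up to 128 tokens (see the long-input note).
    """
    import onnxruntime as ort               # deferred imports: heavy deps
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")
    enc = tok(text, truncation=True, max_length=128, return_tensors="np")
    sess = ort.InferenceSession(model_path)
    wanted = {i.name for i in sess.get_inputs()}
    outputs = sess.run(None, {k: v for k, v in enc.items() if k in wanted})
    logits = outputs[0]
    return float(softmax(logits)[0, 1])     # assumes index 1 = injection
```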

> **Long input note:** the model uses a custom **head_tail truncation** strategy for inputs longer than 128 tokens. Standard Hugging Face pipeline truncation does not reproduce this. See [Long Input Handling](#long-input-handling) below.
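
The head_tail idea is straightforward to express over plain token-id lists. A minimal sketch, assuming an even 64/64 split of the 128-token budget and ignoring special tokens; both are simplifications, since the exact split is not specified here:

```python
def head_tail_truncate(token_ids, max_len=128, head=64):
    """Keep the first `head` tokens and the last `max_len - head` tokens.

    The 64/64 split is an illustrative assumption; special tokens
    ([CLS]/[SEP]) are ignored here for clarity.
    """
    if len(token_ids) <= max_len:
        return list(token_ids)
    tail = max_len - head
    return list(token_ids[:head]) + list(token_ids[-tail:])
```

For a 300-token input this keeps positions 0-63 and 236-299, whereas the default pipeline truncation keeps only positions 0-127 and silently drops the tail, which is why it cannot reproduce the deployed behaviour.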

This model covers **prompt injection and instruction override** only. A separate jailbreak detection model was trained on `allenai/wildjailbreak`, but is not deployment-ready due to dataset and threshold-calibration issues.

**Production latency on Fastly Compute:**

The Fastly service runs the INT8 ONNX model via `tract-onnx` inside a WASM binary (`wasm32-wasip1`). A structured latency optimisation campaign reduced median elapsed time from 414 ms to 69 ms:
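
The quoted numbers are medians, which are robust to cold-start outliers that would skew a mean. A minimal harness for reproducing that style of measurement against any callable; the warmup and run counts are arbitrary choices, and `infer` is a stand-in for one request:

```python
import time
from statistics import median

def median_latency_ms(infer, warmup=3, runs=30):
    """Median wall-clock latency of `infer()` in milliseconds."""
    for _ in range(warmup):           # discard cold-start iterations
        infer()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return median(samples)
```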