Fit an order-m Markov model to real human coding sequence, generate synthetic sequence from it,
and measure whether a k-mer detector separates real human from the order-m synthetic. Sweeping
both the detector order k and the adversary order m gives the boundary:

| detector | adversary m=0 | m=1 | m=2 | m=3 | m=4 | m=5 |
|---|---|---|---|---|---|---|
## Result

Each detector collapses to chance exactly when the adversary order reaches k-1: the k=2 detector
breaks at m=1, k=4 at m=3, and k=6 at m=5. This matches the sufficient-statistic account: a
detector reading k-mer counts cannot separate sequence whose order-(k-1) statistics have been
reproduced. The hexamer (k=6) detector this model uses is at chance against an adversary that
matches the order-5 composition of human DNA (AUROC 0.53 at m=5).
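The sufficient-statistic account can be checked at the level of composition alone. The sketch below is illustrative, not the code behind these results: it fits an order-m Markov model by counting next-base occurrences per length-m context, samples a synthetic sequence from it, and compares k-mer frequency tables by total-variation distance instead of training a detector. The function names, the circular treatment of the training sequence, and the toy codon-built "real" sequence are all assumptions. For a source with structure beyond order m, the distance should stay near zero (up to sampling noise) for k up to m+1 and typically rises once k exceeds m+1, which is the composition-level version of the collapse described above.

```python
import random
from collections import Counter, defaultdict


def fit_markov(seq: str, m: int) -> dict:
    """Order-m Markov model: next-base counts for every length-m context in seq.

    Assumption: seq is treated as circular, so every context has at least one
    observed successor and sampling never reaches a dead end.
    """
    wrapped = seq + seq[:m]
    model = defaultdict(Counter)
    for i in range(m, len(wrapped)):
        model[wrapped[i - m:i]][wrapped[i]] += 1
    return dict(model)


def sample_markov(model: dict, m: int, length: int, start: str, rng: random.Random) -> str:
    """Generate `length` bases, seeded with a length-m context `start` taken from the data."""
    out = list(start)
    while len(out) < length:
        context = "".join(out[len(out) - m:])  # last m bases ("" when m == 0)
        bases, weights = zip(*model[context].items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


def kmer_freqs(seq: str, k: int) -> dict:
    """Relative frequency of every overlapping k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def total_variation(p: dict, q: dict) -> float:
    """Total-variation distance between two frequency tables: 0 = identical, 1 = disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for "real" sequence (NOT human DNA): random picks from a few
    # fixed codons, which gives it structure beyond order 1.
    real = "".join(rng.choice(["ATG", "GCC", "TGA", "CAG"]) for _ in range(10_000))
    for m in (0, 1, 2):
        synthetic = sample_markov(fit_markov(real, m), m, len(real), real[:m], rng)
        dists = [total_variation(kmer_freqs(real, k), kmer_freqs(synthetic, k))
                 for k in range(1, 6)]
        print(f"m={m}: " + "  ".join(f"k={k}:{d:.3f}" for k, d in enumerate(dists, start=1)))
```

Run as a script, it prints one row of distances per adversary order m. Total-variation distance is used only as a detector-free proxy here; the experiment itself scores a trained k-mer detector by AUROC.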
## The neural model is not evaded

| 6 | 0.52 | 1.00 |
| 7 | 0.52 | 1.00 |

At adversary order 5 the hexamer detector is at chance (AUROC 0.53) while the model separates the
same sequences at 1.00. At order 7, which reproduces every 8-mer frequency of human DNA, the model
still scores 0.997, because it reads long-range structure (codon-pair grammar, gene organization,
motif context) that no fixed-order composition encodes. Where composition loses discrimination at
high adversary order, the model retains it.
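The scores in this section are AUROCs: 0.5 means the detector ranks real and synthetic sequences no better than a coin flip, and 1.0 means perfect separation. A minimal sketch of the metric, computed as the Mann-Whitney probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as half. The `auroc` helper is hypothetical, and which class counts as positive is an assumption left to the caller.

```python
import random
from collections.abc import Sequence


def auroc(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative; ties count 0.5.

    O(len(positive) * len(negative)): fine for a sketch, slow for large evaluation sets.
    """
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


if __name__ == "__main__":
    rng = random.Random(0)
    # Both classes drawn from the same score distribution: no separation.
    print(auroc([rng.random() for _ in range(500)], [rng.random() for _ in range(500)]))
    # Every positive lies in [1, 2) and every negative in [0, 1): perfect separation.
    print(auroc([1 + rng.random() for _ in range(500)], [rng.random() for _ in range(500)]))  # 1.0
```

The first call should land close to 0.5, the regime the hexamer detector reaches from m=5 upward; the second is exactly 1.0. For real evaluation sets, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity from labels and scores.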
## Implication for biosecurity screening

than an order-1 match), but it never closes the gap, and higher k costs data and invites
overfitting. Detecting an order-(k-1)-matched adversary requires signal that is not in global
composition at all: per-position, context-dependent modeling of the kind a neural sequence model
provides, which is where composition methods stop and a learned model is required.

This boundary is a property of the method, not of any particular trained weights, and it applies
equally to other composition-based detectors.
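To put a number on the data cost of raising k noted above: assuming the detector counts all $4^k$ k-mers over the ACGT alphabet (no reverse-complement merging), its feature table and the parameter count of the order-$(k-1)$ adversary that defeats it (four next-base probabilities summing to one, per length-$(k-1)$ context) grow as

$$
\underbrace{4^{k}}_{\text{detector features}}
\qquad\text{and}\qquad
\underbrace{(4-1)\cdot 4^{k-1} = 3\cdot 4^{k-1}}_{\text{order-}(k-1)\text{ free parameters}}
$$

$$
k=2:\ 16 \text{ and } 12, \qquad k=4:\ 256 \text{ and } 192, \qquad k=6:\ 4096 \text{ and } 3072.
$$

Each step up in k multiplies both counts by four, so the sequence needed to estimate the detector's features reliably grows geometrically.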