Add defensive half: Carbon-8B resists the composition-matched adversary (0.53 -> 1.00 at order 5+)
ADVERSARIAL.md +20 -0
@@ -27,6 +27,26 @@ breaks at m=1, k=4 at m=3, and k=6 at m=5. The staircase is the sufficient-stati
made visible. The hexamer detector this model uses is blind to an adversary who matches the
order-5 composition of human DNA (AUROC 0.53 at m=5).

## The neural model is not evaded

Scoring the same order-m-matched synthetic human sequence with Carbon-8B (zero-shot per-base
likelihood) separates it from real human DNA at every order, exactly where composition fails:

| adversary order m | closed-form k=6 (AUROC) | Carbon-8B (AUROC) |
|---|---|---|
| 2 | 0.95 | 1.00 |
| 3 | 0.77 | 1.00 |
| 4 | 0.68 | 1.00 |
| 5 | 0.53 | 1.00 |
| 6 | 0.52 | 1.00 |
| 7 | 0.52 | 1.00 |

The order-5-matched construct is invisible to the hexamer detector (0.53) and obvious to the model
(1.00). Even an order-7 match, reproducing every 8-mer frequency of human DNA, is caught at 0.997
(1.00 after rounding), because the model reads long-range structure (codon-pair grammar, gene
organization, and motif context) that no fixed-order composition encodes. The model's value here
is precisely adversarial robustness against the evasion that composition cannot resist.
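
To make the adversary concrete, here is a minimal sketch of an order-m composition-matched sampler. It is an illustration under stated assumptions, not the code behind the numbers above: the function names (`fit_order_m`, `sample_matched`, `kmer_set`) are hypothetical, the reference is a random toy string rather than human DNA, and the reference is treated as circular so that every observed context has a successor. By construction, every (m+1)-mer the sampler emits already occurs in the reference, which is why a fixed k=6 detector has nothing to key on once m reaches 5.

```python
import random
from collections import Counter, defaultdict


def fit_order_m(reference: str, m: int) -> dict:
    """Count next-base occurrences for every length-m context in the reference.

    Assumption of this sketch: the reference is treated as circular, so every
    observed context has at least one successor and sampling never stalls.
    """
    wrapped = reference + reference[:m]
    table = defaultdict(Counter)
    for i in range(len(reference)):
        table[wrapped[i:i + m]][wrapped[i + m]] += 1
    return table


def sample_matched(table: dict, m: int, length: int, rng: random.Random) -> str:
    """Sample a sequence whose (m+1)-mers all occur in the fitted reference."""
    context = rng.choice(sorted(table))  # start from a random observed context
    out = list(context)
    while len(out) < length:
        counts = table[context]
        bases = sorted(counts)
        nxt = rng.choices(bases, weights=[counts[b] for b in bases])[0]
        out.append(nxt)
        context = (context + nxt)[1:]  # slide the length-m window by one base
    return "".join(out)[:length]


def kmer_set(seq: str, k: int, circular: bool = False) -> set:
    """Return the set of k-mers in seq, optionally wrapping around the end."""
    s = seq + seq[:k - 1] if circular else seq
    return {s[i:i + k] for i in range(len(s) - k + 1)}


rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(5000))  # toy stand-in, not real DNA
m = 5
synthetic = sample_matched(fit_order_m(reference, m), m, 2000, rng)

# Order 5 fixes hexamer content: no hexamer in the synthetic sequence is novel.
print(kmer_set(synthetic, m + 1) <= kmer_set(reference, m + 1, circular=True))
```

The check only demonstrates containment. Matching of (m+1)-mer frequencies holds approximately for long samples, and nothing in the sampler constrains structure beyond m+1 bases, which is the gap a likelihood model with longer context can exploit.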

## Implication for biosecurity screening

Homology-free, composition-based screening, the family that includes k-mer engineered-DNA