Droid Shield - Downgrade (false-positive triage)

Decides whether a scanner-flagged line should stay blocked or is a clear false positive to downgrade.

This repo contains a PEFT LoRA (r=64, alpha=128) for Qwen/Qwen3.6-35B-A3B, used by Droid Shield 2.0. It targets the base text tower and must be applied on top of the base weights; it is not a standalone model.

The contract

This adapter was trained against an exact prompt and I/O schema. Sending anything else may degrade it silently. All three pieces below are part of the trained-model contract.

1. System message

Send the contents of system-prompt.txt verbatim as the system message. The exact bytes matter.

sha256(system-prompt.txt) = 97e31038f32cc8d15bd411103807f3d73ca4ed64900d33fd54182fcaee5d16c5

Caveat: the shipped system prompt intentionally refers to "Droid Shield" because these adapters were trained for Factory's Droid Shield workflow. Customers repurposing the adapter may want different branding or framing; validate any prompt edits because changing the trained contract can change scores.

2. User message

A single JSON object, serialized as pretty-printed JSON with 2-space indentation and non-ASCII preserved (canonical form: JS JSON.stringify(value, null, 2)), with these keys:

extension: the file extension
lines: a small ordered window of source lines
focus_line: the zero-based index of the candidate line within lines

3. Assistant output

Strict JSON, verdict first, no thinking/reasoning text:

{"verdict": "S", "reason": "short natural-language reason grounded in the input"}

verdict is exactly one of:

S: clear safe false positive; warn the user the detection may be a false positive (downgrade it)
B: likely real credential, should-block secret, or ambiguous detection; keep it blocked

Decoding requirements

Deterministic: greedy / temperature = 0 (do_sample = false). The shipped generation_config.json is set to greedy for this reason.
No thinking: the training targets contain no <think> blocks. The bundled chat template defaults to opening a thinking turn, so you must disable it (with the bundled template, pass enable_thinking=False). The assistant turn must start with no open <think> block so the model emits the verdict JSON directly; otherwise the verdict-first contract and the logprob score break.
Constrain the output to the {verdict, reason} object (a JSON-schema or grammar-constrained decode is strongly recommended).
Score: the calibrated signal is P(B) read from the token logprobs at the verdict position, renormalized over the two verdict tokens: P(B) = exp(logprob_B) / (exp(logprob_S) + exp(logprob_B)). Request the top-2 logprobs at that token. Both tasks are ranked by P(B) = "treat as a real secret".

Operating point

The adapter emits a probability; the decision threshold is not baked in. It is selected downstream and frozen on a held-out split:

Safety gate. Downstream selects the threshold by a cost ratio (cleared false-alarms vs missed secrets), chosen on a held-out split and then frozen.

Loading

This is a standard PEFT LoRA (fw_lora_layout: hf_peft_v1) whose target modules match the base text tower, so it loads with any stack that supports PEFT LoRA on top of Qwen/Qwen3.6-35B-A3B (e.g. transformers + peft, vLLM, TGI, Fireworks). Then drive it with the contract above: system-prompt.txt as the system message, the JSON user message, greedy decoding, thinking off. Two things that bite:

The base is a multimodal Qwen3_5MoeForConditionalGeneration arch, so load the full base and apply the adapter on top; the LoRA only touches the language tower.
The runtime needs an arch new enough to know qwen3_5_moe (for example, transformers >= 4.57).

Provenance & license

Base: Qwen/Qwen3.6-35B-A3B (Fireworks training base qwen3p6-35b-a3b).
Training/eval pipeline lives in the factory monorepo under finetune/.
This adapter inherits the base model's license and is published as a Factory artifact.

Downloads last month: 13

Model tree for factoryai/shield-dg-r64-c15

Base model

Qwen/Qwen3.6-35B-A3B

Adapter

(55)

this model