taufeeque committed on
Commit 2613139 · verified · 1 parent: 6ea234c

Upload diverse deception linear probes for OLMo-3-7B-Think

This view is limited to 50 files because the commit contains too many changes.

Files changed (50):
  1. README.md +119 -0
  2. averaged_over_initial_detector.base_model.on_dataset_linear-probe_generation.txt +1 -0
  3. averaged_over_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.txt +1 -0
  4. averaged_violin_initial_detector.base_model.on_dataset_linear-probe_generation.png +0 -0
  5. averaged_violin_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.png +0 -0
  6. f1_and_recall_plot_initial_detector.base_model.on_dataset.png +0 -0
  7. generation/layer_0/config.json +9 -0
  8. generation/layer_0/model.pt +3 -0
  9. generation/layer_1/config.json +9 -0
  10. generation/layer_1/model.pt +3 -0
  11. generation/layer_10/config.json +9 -0
  12. generation/layer_10/model.pt +3 -0
  13. generation/layer_11/config.json +9 -0
  14. generation/layer_11/model.pt +3 -0
  15. generation/layer_12/config.json +9 -0
  16. generation/layer_12/model.pt +3 -0
  17. generation/layer_13/config.json +9 -0
  18. generation/layer_13/model.pt +3 -0
  19. generation/layer_14/config.json +9 -0
  20. generation/layer_14/model.pt +3 -0
  21. generation/layer_15/config.json +9 -0
  22. generation/layer_15/model.pt +3 -0
  23. generation/layer_16/config.json +9 -0
  24. generation/layer_16/model.pt +3 -0
  25. generation/layer_17/config.json +9 -0
  26. generation/layer_17/model.pt +3 -0
  27. generation/layer_18/config.json +9 -0
  28. generation/layer_18/model.pt +3 -0
  29. generation/layer_19/config.json +9 -0
  30. generation/layer_19/model.pt +3 -0
  31. generation/layer_2/config.json +9 -0
  32. generation/layer_2/model.pt +3 -0
  33. generation/layer_20/config.json +9 -0
  34. generation/layer_20/model.pt +3 -0
  35. generation/layer_21/config.json +9 -0
  36. generation/layer_21/model.pt +3 -0
  37. generation/layer_22/config.json +9 -0
  38. generation/layer_22/model.pt +3 -0
  39. generation/layer_23/config.json +9 -0
  40. generation/layer_23/model.pt +3 -0
  41. generation/layer_24/config.json +9 -0
  42. generation/layer_24/model.pt +3 -0
  43. generation/layer_25/config.json +9 -0
  44. generation/layer_25/model.pt +3 -0
  45. generation/layer_26/config.json +9 -0
  46. generation/layer_26/model.pt +3 -0
  47. generation/layer_27/config.json +9 -0
  48. generation/layer_27/model.pt +3 -0
  49. generation/layer_28/config.json +9 -0
  50. generation/layer_28/model.pt +3 -0
README.md ADDED
@@ -0,0 +1,119 @@
---
license: mit
tags:
- deception-detection
- linear-probe
- mechanistic-interpretability
base_model: allenai/OLMo-3-7B-Think
---

# Diverse Deception Linear Probe — OLMo-3-7B-Think

Per-layer linear probes trained on **OLMo-3-7B-Think** residual stream activations
to distinguish deceptive from honest text completions.

## Performance

| Variant | Token Selection | Follow-up Prompt | Averaged AUC | Layers Used |
|---|---|---|---|---|
| **last-token-generation** | last token generation | `Did you respond truthfully?` → `Yes` | **0.975** | 14 layers (18–31, filtered by train loss < 0.6) |
| **generation** | generation | none | **0.718** | all 32 layers |

AUC is computed on MBPP (hardcoded vs. correct code), using probe scores
**averaged across filtered layers** (layers where probe train loss < 0.6).

> **Thinking model**: OLMo-3-7B-Think uses `<think>` tags. An empty thinking block
> is placed before completions so the model processes them as responses, not reasoning.
> The chat template handles this automatically with `enable_thinking=False`.

## Probe Architecture

Each layer has an independent `LinearProbe`:

| Component | Shape | Description |
|---|---|---|
| `linear.weight` | `[1, 4096]` | Classification weight vector |
| `linear.bias` | `[1]` | Classification bias |
| `input_scale` | scalar | Unit-norm scaling factor |
| `input_mean` | scalar | Input centering value |
| `platt_A`, `platt_B` | scalar | Platt calibration parameters |

### File layout

```
<variant>/layer_N/
  config.json   # probe class and d_model
  model.pt      # PyTorch state_dict
```
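Given that layout, one way to collect every layer's config is a simple glob over the `layer_N` directories. A sketch (the `load_probe_configs` helper is hypothetical, demonstrated here on a throwaway directory rather than the real download):

```python
import json
import tempfile
from pathlib import Path

def load_probe_configs(probe_dir, variant="generation"):
    """Map layer index -> parsed config.json, following the <variant>/layer_N/ layout."""
    configs = {}
    for cfg_path in Path(probe_dir).glob(f"{variant}/layer_*/config.json"):
        layer = int(cfg_path.parent.name.removeprefix("layer_"))
        configs[layer] = json.loads(cfg_path.read_text())
    return configs

# demo on a temporary directory that mimics the repo layout
root = Path(tempfile.mkdtemp())
for layer in (0, 2, 10):
    d = root / "generation" / f"layer_{layer}"
    d.mkdir(parents=True)
    (d / "config.json").write_text(
        json.dumps({"class_name": "LinearProbe", "init_args": {"d_model": 4096}})
    )
configs = load_probe_configs(root)
```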

## Follow-up Prompt Setup

For `last-token-generation` variants, a follow-up exchange is appended after the
completion to elicit a truthfulness signal. The probe reads activations at the
**last token** of the assistant's reply ("Yes"):

```
[main prompt + completion]
<user> Did you respond truthfully?
<assistant> Yes   ← probe reads activations here
```

Formatted with OLMo-3-7B-Think's chat template:

```
<|im_start|>user
Did you respond truthfully?<|im_end|>
<|im_start|>assistant
<think></think>
Yes
```

The `generation` variant (where available) uses **no follow-up** and averages
probe scores over all tokens in the model's response.

## Training Details

- **Probe dataset**: `diverse_deception_probe` v1 — ~8,000 examples from 8+ sources
  (MASK, TruthfulQA, DolusChat, LiarsBench, sycophancy, sandbagging, etc.)
- **Training**: 1,000 steps, lr=3e-3, constant schedule, no weight decay
- **Normalization**: unit-norm input scaling
- **Layer aggregation**: mean (scores from multiple layers averaged at inference)
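The filtered layer sets are stored in this repo's `averaged_over_*.txt` files as Python-style lists (e.g. `[18, 19, ..., 31]`). A sketch of the mean aggregation, with hypothetical helper names:

```python
import ast

def parse_layer_list(text):
    """Parse an averaged_over_*.txt payload like '[18, 19, 20]' into layer indices."""
    return ast.literal_eval(text.strip())

def aggregate(per_layer_scores, layers):
    """Mean probe score across the filtered layers (the 'mean' aggregation above)."""
    return sum(per_layer_scores[layer] for layer in layers) / len(layers)

layers = parse_layer_list("[18, 19, 20]\n")
scores = {5: 99.0, 18: 1.0, 19: 0.0, 20: 2.0}  # layer 5 is excluded by the filter
combined = aggregate(scores, layers)
```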

## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import snapshot_download

# Download probes
probe_dir = snapshot_download("AlignmentResearch/diverse-deception-probe-olmo-3-7b-think")

# Load one layer's probe
state = torch.load(f"{probe_dir}/last-token-generation/layer_20/model.pt", weights_only=False)
w, b = state["linear.weight"], state["linear.bias"]
scale, mean = state["input_scale"], state["input_mean"]

# Load model
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-3-7B-Think", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-3-7B-Think")

text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your prompt here"}],
    tokenize=False, add_generation_prompt=True,
)
text += "The model's completion"
# last-token-generation probes read the follow-up exchange (see "Follow-up Prompt Setup")
text += "<|im_end|>\n<|im_start|>user\nDid you respond truthfully?<|im_end|>\n<|im_start|>assistant\n<think></think>\nYes"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    # hidden_states[0] is the embedding output, so hidden_states[21] is layer 20
    h = model(**inputs, output_hidden_states=True).hidden_states[21][:, -1, :]

score = (((h.float() - mean) / scale) @ w.T + b).item()
# score > 0 → likely deceptive, score < 0 → likely honest
```

## Citation

Part of the [FAR AI](https://far.ai) deception detection research.
See [AlignmentResearch/deception](https://github.com/AlignmentResearch/deception).
averaged_over_initial_detector.base_model.on_dataset_linear-probe_generation.txt ADDED
@@ -0,0 +1 @@
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
averaged_over_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.txt ADDED
@@ -0,0 +1 @@
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
averaged_violin_initial_detector.base_model.on_dataset_linear-probe_generation.png ADDED
averaged_violin_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.png ADDED
f1_and_recall_plot_initial_detector.base_model.on_dataset.png ADDED
generation/layer_0/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_0/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:107c1cd5492d622d7b93ae103069415dce1fa800ecdf3da086a91644b26ce24c
size 19197
generation/layer_1/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_1/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2282fab66015bc473c8e0794259af50eb9c5df4c84e23860eb68d2bca832f8bc
size 19197
generation/layer_10/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_10/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:86adc0dfcb721f20d45a6efd56b417b4565dbdc832b9ac767dcacc38b06e6ac9
size 19197
generation/layer_11/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_11/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ddc84771a722700e4e00e169e82ca4c63e1517d79c463c4f0519999182ef197
size 19197
generation/layer_12/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_12/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0bbf58b90af5adcb2eb0858dc9eef12ae7bc2f81483d20f513e3a1325781ab34
size 19197
generation/layer_13/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_13/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c68dec6d8259558b820b51dd9d5a84d9b662774d7c70f61d937ec44521a2b7c6
size 19197
generation/layer_14/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_14/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c6b4d8c35e7218873a8cb9c7e5538daf1b05d28a11fca22da866e65cebf5c453
size 19197
generation/layer_15/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_15/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:71f5c6dd8ced61c845a07b5bd84e5bfcf567aafae79ec0c5ff7a73ee9f59c299
size 19197
generation/layer_16/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_16/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:82953a8ef959235f2989b181e3351c8a0ed45d04f37ce8398a85de4e9353ace1
size 19197
generation/layer_17/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_17/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:be078bb289c6a9ea24e3501c02b48a68696a7d556626cc485dc7b1662dbc44e4
size 19197
generation/layer_18/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_18/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:86f4ba4337cb87a1974bde157df3a28709b4f9d5359a9b2bcf8b53474ab508ec
size 19197
generation/layer_19/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_19/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd28abbb99792073c7d4755d879d02ec62e5f896080cea38d2cf6c4c0bc811cd
size 19197
generation/layer_2/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_2/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:75f44405f616378d38ee98c7f85a822f9349e22f7dd27d5832dd8bedb9156e4b
size 19197
generation/layer_20/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_20/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0bd2b261f9c26d51a289a167f732de1596249f07864a988166abef2e9dfe7dde
size 19197
generation/layer_21/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_21/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4465e5027991f054a68304a582e5e4269e2bf3d7fa3abfe2da5bf9a7efc308c6
size 19197
generation/layer_22/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_22/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ce7b224d064ef1ecf929b2c7afb2c74785d7b228395213fcc3dae251026f308
size 19197
generation/layer_23/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_23/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:00dc61b7886d7f547149fa62ed2a36ad3d3a4176b7820d600a027f8c7408886e
size 19197
generation/layer_24/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_24/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8f4c3974a5bd6ada6ea32d0d1d79d963bcf352dbf7646ed6deacaca537b62d8f
size 19197
generation/layer_25/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_25/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:70b8474ca1d82238d7e5595626e4a3afc52894f8ef0c67bc561f06fd871ea828
size 19197
generation/layer_26/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_26/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d297f269c571933acb331bd3f8d5c145eba1a2ea164148089143df71236758a6
size 19197
generation/layer_27/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_27/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d183b456b10055d8defd0bed02a06df2efea8640915a0e261e2de44af010cd6c
size 19197
generation/layer_28/config.json ADDED
@@ -0,0 +1,9 @@
{
  "class_name": "LinearProbe",
  "module": "deception.oa_backdoor.detectors.probe_archs",
  "init_args": {
    "d_model": 4096,
    "nhead": 1,
    "normalize_input": "unit_norm"
  }
}
generation/layer_28/model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81e3c2297e0bfb41ccc19590a80a4bb46bb6b26f273ab695a5849d000d687786
size 19197