reuAC committed
Commit 1d9e33d · verified · 1 parent: d5e3ea6

Upload 4 files

Files changed (4):
  1. README.md +10 -0
  2. README_CN.md +10 -0
  3. experiment.py +239 -1
  4. experiment_en.py +229 -0
README.md CHANGED
@@ -18,10 +18,18 @@ pipeline_tag: text-generation
 
 # reFlow
 
+[ [中文](README_CN.md) | English ]
+
 **A Metal Soul In My Hand** — A feature-decoupled Transformer architecture with native interpretability.
 
 reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into a **Recipe Matrix** $W_{recipe} \in \mathbb{R}^{V \times S}$ and a **Signal Basis Matrix** $W_{basis} \in \mathbb{R}^{S \times d}$, forcing the model to maintain a set of continuous, low-redundancy signal bases in latent space. The same factored product $W_{recipe} \times W_{basis}$ serves as both the input embedding and the output projection, forming an end-to-end signal-manifold computation loop without a separate LM head.
 
+## Online Demo
+
+**Try reFlow in your browser:**
+- [HuggingFace Space](https://huggingface.co/spaces/reuAC/reFlow) (Global Access)
+- [ModelScope Studio](https://www.modelscope.cn/studios/recuAC/reFlow) (China Access)
+
 ## Key Results
 
 **Convergence.** At matched depth and scale (36 layers, ~515M parameters), reFlow-1-Big achieves a validation loss within ~1% of GPT-2-New (514M). Three scale points — Small (46.47M), reFlow-1 (463.67M), Big (515.06M) — confirm strict scaling law compliance (val loss: 3.55 → 3.01 → 2.92).
@@ -34,6 +42,8 @@ reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into a **
 - Hard sparsity (Top-64) systematically destroys recipe-space semantic structure (algebra 3/3 → 0/3, silhouette +0.11 → −0.02)
 
 > **Paper**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf) — Theoretical derivation, 12 interpretability experiments, and scaling/ablation analysis.
+>
+> **Pretrained Weights**: [HuggingFace](https://huggingface.co/reuAC/reFlow)
 
 ## Project Structure
 
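The README above describes the factored embedding in prose; the construction can be sketched in a few lines of NumPy. All shapes and names below are illustrative stand-ins, not the released model's actual dimensions or identifiers:

```python
import numpy as np

# Illustrative sizes only: vocab V, signal count S, hidden width d
V, S, d = 1000, 16, 64

rng = np.random.default_rng(0)
W_recipe = rng.standard_normal((V, S))  # Recipe Matrix: one "recipe" row per token
W_basis = rng.standard_normal((S, d))   # Signal Basis Matrix: shared signal bases

E = W_recipe @ W_basis                  # factored embedding, shape (V, d)

ids = np.array([3, 7, 42])
x = E[ids]                              # input embedding lookup
logits = x @ E.T                        # output projection reuses the same product
assert logits.shape == (3, V)           # no separate LM head
```

Since every row of `E` is a linear combination of the `S` basis rows, each token embedding lies in the span of `W_basis`, which is what constrains the model to a shared, low-redundancy set of signal bases as the README describes.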
README_CN.md CHANGED
@@ -1,9 +1,17 @@
 # reFlow
 
+[ 中文 | [English](README.md) ]
+
 **A Metal Soul In My Hand** — 具备原生可解释性的特征解耦 Transformer 架构。
 
 reFlow 将嵌入矩阵 $E \in \mathbb{R}^{V \times d}$ 分解为**配方矩阵** $W_{recipe} \in \mathbb{R}^{V \times S}$ 与**信号基底矩阵** $W_{basis} \in \mathbb{R}^{S \times d}$ 的乘积形式,迫使模型在潜空间中维护一组连续、低冗余的信号基底。同一乘积 $W_{recipe} \times W_{basis}$ 同时用于输入嵌入与输出投影,构成端到端的信号流形计算闭环,无需独立 LM Head。
 
+## 在线演示
+
+**在浏览器中体验 reFlow:**
+- [HuggingFace Space](https://huggingface.co/spaces/reuAC/reFlow)(全球访问)
+- [ModelScope Studio](https://www.modelscope.cn/studios/recuAC/reFlow)(中国境内)
+
 ## 核心结果
 
 **收敛性。** 在对齐深度与参数量(36 层,~515M)的条件下,reFlow-1-Big 的验证损失与 GPT-2-New(514M)差距仅约 1%。三个参数规模点 — Small(46.47M)、reFlow-1(463.67M)、Big(515.06M)— 验证损失分别为 3.55、3.01、2.92,严格遵循缩放定律。
@@ -16,6 +24,8 @@ reFlow 将嵌入矩阵 $E \in \mathbb{R}^{V \times d}$ 分解为**配方矩阵**
 - 硬稀疏约束(Top-64)系统性摧毁配方空间语义结构(代数 3/3 → 0/3,轮廓系数 +0.11 → −0.02)
 
 > **论文**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf) — 理论推导、12 项可解释性实验及缩放/消融分析。
+>
+> **预训练权重**: [HuggingFace](https://huggingface.co/reuAC/reFlow)
 
 ## 项目结构
 
experiment.py CHANGED
@@ -301,6 +301,38 @@ def exp_2_sparsity_profile(model, enc, device, report_dir):
     plt.close()
     print(f" > 图表已保存: {save_path}")
 
+    # === 输出论文绘图所需的数据 ===
+    print("\n" + "="*60)
+    print(" [论文数据导出] 用于 TikZ/PGFPlots 绘图")
+    print("="*60)
+
+    if is_topk:
+        active_per_word_np = active_per_word.cpu().numpy()
+    else:
+        active_per_word_np = active_per_word
+
+    # --- 图1: 每词活跃信号数直方图数据 ---
+    hist_min = int(active_per_word_np.min())
+    hist_max = int(active_per_word_np.max())
+    hist_bins = np.arange(hist_min, hist_max + 2)
+    hist_counts, hist_edges = np.histogram(active_per_word_np, bins=hist_bins)
+    print(f"\n [直方图] 每词活跃信号数分布 (bin_start, count):")
+    print(f" mean={np.mean(active_per_word_np):.1f}, min={hist_min}, max={hist_max}")
+    print(" ---BEGIN_HISTOGRAM_DATA---")
+    for i in range(len(hist_counts)):
+        if hist_counts[i] > 0:
+            print(f" {int(hist_edges[i])} {hist_counts[i]}")
+    print(" ---END_HISTOGRAM_DATA---")
+
+    # --- 图2: 信号利用率数据(按利用率排序) ---
+    sorted_utilization = np.sort(active_per_signal)[::-1]
+    print(f"\n [柱状图] 信号利用率 (按降序排列, signal_rank, n_words):")
+    print(f" mean={np.mean(active_per_signal):.0f}, min={np.min(active_per_signal)}, max={np.max(active_per_signal)}")
+    print(" ---BEGIN_UTILIZATION_DATA---")
+    for i, val in enumerate(sorted_utilization):
+        print(f" {i} {val}")
+    print(" ---END_UTILIZATION_DATA---")
+
 
 def exp_3_basis_geometry(model, enc, device, report_dir):
     print("\n" + "="*60)
@@ -824,7 +856,7 @@ def exp_10_emotion_surgery(model, enc, device, report_dir):
     neg_vec = torch.stack([W_v2s[enc.encode(" " + w)[0]] for w in neg_words]).mean(dim=0)
     steer_vec = pos_vec - neg_vec
 
-    text = "The food was absolutely terrible and the service was"
+    text = "The food was absolutely terrible and the service was "
     n_layers = len(model.transformer.h)
 
     scan_layers = list(range(0, n_layers, max(1, n_layers // 6)))
@@ -1042,6 +1074,211 @@ def exp_12_genetic_hijack(model, enc, device, report_dir):
 
     print(f"\n > 实验完成。对照组与干预组的文本对比即为结果。")
 
+
+def exp_13_task_crystallization_shift(model, enc, device, report_dir):
+    print("\n" + "="*60)
+    print(" [实验 13] 任务类型与结晶边界偏移 (Context-Dependent Crystallization)")
+    print("="*60)
+
+    W_basis = model.transformer.wte.signal_basis.data
+    W_v2s = _get_vocab_signals(model)
+    n_layers = len(model.transformer.h)
+
+    # 严谨的控制变量:短上下文(迅速结晶) vs 长上下文定语(延迟结晶)
+    # 试图将常识强行扭转到一个荒谬的概念上,测量模型在什么层级彻底拒绝扭转
+    task_groups = {
+        "Shallow (Short Context)": [
+            ("The capital of France is", "London"),
+            ("The cat sat on the", "moon"),
+            ("The sky is", "red"),
+            ("Open the door with a", "car")
+        ],
+        "Deep (Long Context / Clauses)": [
+            ("When the geography teacher asked the students, they answered that the capital of France is", "London"),
+            ("After carefully reviewing all the evidence presented in court, the judge decided that the defendant was", "guilty"),
+            ("When you look outside the window at the beautiful nature, the color of the clear sky is", "red"),
+            ("I was locked out of my house yesterday, and to open the locked door, you need a", "car")
+        ],
+        "Code (Structured Logic)": [
+            ("def add(a, b): return a +", "None"),
+            ("x = 1 + 2\ny =", "None"),
+            ("for i in range(10):\n print(", "None"),
+            ("if x > 0:\n result =", "None")
+        ]
+    }
+
+    def continuous_steer(prompt, target_tid, base_tid, alpha, intercept_layer):
+        # 提取方向向量:目标概念 - 原生概念
+        steer_vec = W_v2s[target_tid] - W_v2s[base_tid]
+        ids = torch.tensor(enc.encode(prompt), device=device).unsqueeze(0)
+
+        with torch.no_grad():
+            x = _embed(model, ids)
+            # 如果从第 0 层就开始干预
+            if intercept_layer == 0:
+                x[:, -1, :] += (alpha * steer_vec) @ W_basis
+
+            freqs_cis = model.freqs_cis[:ids.size(1)]
+            for i, block in enumerate(model.transformer.h):
+                x = block(x, freqs_cis)
+                # 关键修复:从 intercept_layer 开始,随后每一层都持续施加概念挟持
+                if intercept_layer is not None and i + 1 >= intercept_layer:
+                    x[:, -1, :] += (alpha * steer_vec) @ W_basis
+
+            x_norm = model.transformer.ln_f(x[0, -1, :])
+            logits = _get_logits_from_hidden(model, x_norm)
+            probs = F.softmax(logits, dim=-1)
+            pred_id = torch.argmax(logits).item()
+            return probs[target_tid].item(), enc.decode([pred_id]).strip(), pred_id
+
+    results = {"Shallow (Short Context)": [], "Deep (Long Context / Clauses)": [], "Code (Structured Logic)": []}
+
+    print(" 开始执行层级连续干预扫描 (Continuous Intervention Sweep)...\n")
+
+    for group_name, tasks in task_groups.items():
+        print(f" [{group_name}]")
+        for prompt, target in tasks:
+            target_clean = target.strip()
+            target_tid = enc.encode(" " + target)[0]
+
+            # 1. 获取自然基线预测
+            _, base_pred, base_tid = continuous_steer(prompt, target_tid, target_tid, 0.0, None)
+            if base_pred == target_clean:
+                print(f" [Skip] '{prompt[:20]}...' 自然预测已是 '{target_clean}'。")
+                continue
+
+            # 2. 寻找浅层 (Layer 0) 能够成功扭转的温和临界 Alpha
+            working_alpha = None
+            for a in np.arange(2.0, 50.0, 2.0):
+                _, pred, _ = continuous_steer(prompt, target_tid, base_tid, a, 0)
+                if pred == target_clean:
+                    working_alpha = a
+                    break
+
+            if working_alpha is None:
+                print(f" [Skip] '{prompt[:20]}...': Alpha在50内无法干预,跳过。")
+                continue
+
+            # 增加 20% 裕量,保证挟持稳定性
+            final_alpha = working_alpha * 1.2
+
+            # 3. 逐层推迟注入时间点,寻找结晶边界
+            layer_probs = []
+            c_layer = n_layers
+
+            for L in range(n_layers):
+                p_target, pred, _ = continuous_steer(prompt, target_tid, base_tid, final_alpha, L)
+                layer_probs.append(p_target)
+
+                # 如果从第 L 层开始持续按着方向盘,模型依然跑偏,说明第 L 层时语义已彻底结晶
+                if pred != target_clean and c_layer == n_layers:
+                    c_layer = L
+
+            results[group_name].append({
+                'prompt': prompt,
+                'target': target_clean,
+                'alpha': final_alpha,
+                'base_pred': base_pred,
+                'c_layer': c_layer,
+                'layer_probs': layer_probs
+            })
+
+            short_prompt = prompt[:35] + "..." if len(prompt) > 35 else prompt
+            print(f" - '{short_prompt}' (原预测: '{base_pred}')")
+            print(f" -> 持续注入 '{target_clean}' (α={final_alpha:.1f}) | 结晶失效边界: \033[96mLayer {c_layer}\033[0m")
+        print()
+
+    # ================= 绘制图表 =================
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6), gridspec_kw={'width_ratios': [2, 1]})
+
+    layers_x = np.arange(0, n_layers)
+    colors = {"Shallow (Short Context)": "#2ecc71", "Deep (Long Context / Clauses)": "#9b59b6", "Code (Structured Logic)": "#e67e22"}
+
+    c_layers_shallow = []
+    c_layers_deep = []
+    c_layers_code = []
+
+    for group_name, res_list in results.items():
+        color = colors[group_name]
+        for i, res in enumerate(res_list):
+            if "Shallow" in group_name:
+                c_layers_shallow.append(res['c_layer'])
+            elif "Deep" in group_name:
+                c_layers_deep.append(res['c_layer'])
+            elif "Code" in group_name:
+                c_layers_code.append(res['c_layer'])
+
+            label = group_name if i == 0 else "_nolegend_"
+            ax1.plot(layers_x, res['layer_probs'], color=color, alpha=0.6, lw=2.5, label=label)
+
+            c_idx = res['c_layer']
+            if c_idx < n_layers:
+                ax1.scatter(c_idx, res['layer_probs'][c_idx], color=color, s=120, marker='X', edgecolors='black', zorder=5)
+
+    ax1.set_title("Target Concept Viability vs. Injection Delay", fontsize=12, fontweight='bold')
+    ax1.set_xlabel("Intervention Start Layer (Later start = Context already crystallized)")
+    ax1.set_ylabel("Final Probability of Injected Concept")
+    ax1.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1.0, decimals=0))
+    ax1.legend(fontsize=10)
+    ax1.grid(True, alpha=0.3)
+
+    box_data = []
+    box_labels = []
+    box_colors_list = []
+    if c_layers_shallow:
+        box_data.append(c_layers_shallow)
+        box_labels.append("Shallow\n(Short)")
+        box_colors_list.append(colors["Shallow (Short Context)"])
+    if c_layers_deep:
+        box_data.append(c_layers_deep)
+        box_labels.append("Deep\n(Long)")
+        box_colors_list.append(colors["Deep (Long Context / Clauses)"])
+    if c_layers_code:
+        box_data.append(c_layers_code)
+        box_labels.append("Code\n(Structured)")
+        box_colors_list.append(colors["Code (Structured Logic)"])
+
+    if len(box_data) >= 2:
+        bplot = ax2.boxplot(box_data, patch_artist=True, widths=0.5)
+        ax2.set_xticks(range(1, len(box_data) + 1))
+        ax2.set_xticklabels(box_labels)
+
+        for patch, c in zip(bplot['boxes'], box_colors_list):
+            patch.set_facecolor(c)
+            patch.set_alpha(0.6)
+
+        for idx, (data, c) in enumerate(zip(box_data, box_colors_list)):
+            ax2.scatter(np.random.normal(idx + 1, 0.05, len(data)), data, color=c, alpha=0.9, s=50)
+
+    ax2.set_title("Crystallization Boundary Distribution", fontsize=12, fontweight='bold')
+    ax2.set_ylabel("Crystallization Layer (Point of No Return)")
+    ax2.set_ylim(-1, n_layers + 2)
+    ax2.yaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+    ax2.grid(True, axis='y', alpha=0.3)
+
+    plt.suptitle("reFlow Causal Audit: Context Type Affects Information Crystallization", fontsize=15, fontweight='bold')
+    plt.tight_layout(rect=[0, 0, 1, 0.95])
+
+    save_path = os.path.join(report_dir, "task_crystallization_shift.png")
+    plt.savefig(save_path, bbox_inches='tight', dpi=200)
+    plt.close()
+
+    print(" ================= 实验结论 =================")
+    if c_layers_shallow:
+        avg_shallow = np.mean(c_layers_shallow)
+        print(f" > 短上下文 (浅层任务) 平均结晶边界: Layer {avg_shallow:.1f}")
+    if c_layers_deep:
+        avg_deep = np.mean(c_layers_deep)
+        print(f" > 长上下文 (深层任务) 平均结晶边界: Layer {avg_deep:.1f}")
+    if c_layers_code:
+        avg_code = np.mean(c_layers_code)
+        print(f" > 代码 (结构化逻辑) 平均结晶边界: Layer {avg_code:.1f}")
+    if c_layers_shallow and c_layers_deep:
+        print(f" > 短→长 边界延迟量: \033[93m{np.mean(c_layers_deep) - np.mean(c_layers_shallow):+.1f} Layers\033[0m")
+    if c_layers_shallow and c_layers_code:
+        print(f" > 短→代码 边界延迟量: \033[93m{np.mean(c_layers_code) - np.mean(c_layers_shallow):+.1f} Layers\033[0m")
+    print(f" > 实验表明:不同任务类型的上下文复杂度影响模型内部表征的结晶边界,")
+    print(f"   更复杂的上下文倾向于在更深层级保持内部表征的流动性。")
+    print(f" > 图表已保存: {save_path}")
+
 
 def main_menu():
     model, enc, device, report_dir = load_setup_and_model()
@@ -1059,6 +1296,7 @@ def main_menu():
         '10': ("情绪手术 (Emotion Surgery)", exp_10_emotion_surgery),
         '11': ("概念注入 (Concept Inception)", exp_11_concept_inception),
         '12': ("基因库篡改 (Genetic Hijack)", exp_12_genetic_hijack),
+        '13': ("任务结晶边界偏移 (Task Shift)", exp_13_task_crystallization_shift),
     }
 
     while True:
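The layer-sweep logic in exp_13 — start continuous injection at progressively later layers and record the first start layer where the flip fails — can be illustrated with a toy scalar model. Everything here (`N_LAYERS`, `ALPHA`, the `run` function, all the numbers) is an invented stand-in for illustration, not project code:

```python
# Toy stand-in for the delayed-injection sweep in exp_13: each "layer" nudges a
# scalar state toward the context's natural answer; injection nudges it toward
# the target. The crystallization boundary is the first start layer from which
# continuous injection can no longer flip the final decision.
N_LAYERS = 8
ALPHA = 0.6          # injection strength (illustrative)

def run(intercept_layer):
    state = 0.0                       # > 0 means the injected target wins
    for layer in range(N_LAYERS):
        state -= 0.25                 # context pulls toward the natural answer
        if intercept_layer is not None and layer >= intercept_layer:
            state += ALPHA            # continuous steering toward the target
    return state > 0

boundary = N_LAYERS
for start in range(N_LAYERS):
    if not run(start):                # injection started too late: flip fails
        boundary = start
        break

print(f"crystallization boundary: layer {boundary}")  # → layer 5 here
```

With these numbers the net pull is 2.8 − 0.6·start, so the flip succeeds for start ≤ 4 and fails from layer 5 on — the toy analogue of the "point of no return" the experiment measures.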
experiment_en.py CHANGED
@@ -301,6 +301,38 @@ def exp_2_sparsity_profile(model, enc, device, report_dir):
     plt.close()
     print(f" > Chart saved: {save_path}")
 
+    # === Export data for paper plotting ===
+    print("\n" + "="*60)
+    print(" [Paper Data Export] For TikZ/PGFPlots")
+    print("="*60)
+
+    if is_topk:
+        active_per_word_np = active_per_word.cpu().numpy()
+    else:
+        active_per_word_np = active_per_word
+
+    # --- Figure 1: Histogram data for active signals per word ---
+    hist_min = int(active_per_word_np.min())
+    hist_max = int(active_per_word_np.max())
+    hist_bins = np.arange(hist_min, hist_max + 2)
+    hist_counts, hist_edges = np.histogram(active_per_word_np, bins=hist_bins)
+    print(f"\n [Histogram] Active signals per word distribution (bin_start, count):")
+    print(f" mean={np.mean(active_per_word_np):.1f}, min={hist_min}, max={hist_max}")
+    print(" ---BEGIN_HISTOGRAM_DATA---")
+    for i in range(len(hist_counts)):
+        if hist_counts[i] > 0:
+            print(f" {int(hist_edges[i])} {hist_counts[i]}")
+    print(" ---END_HISTOGRAM_DATA---")
+
+    # --- Figure 2: Signal utilization data (sorted by utilization) ---
+    sorted_utilization = np.sort(active_per_signal)[::-1]
+    print(f"\n [Bar chart] Signal utilization (descending order, signal_rank, n_words):")
+    print(f" mean={np.mean(active_per_signal):.0f}, min={np.min(active_per_signal)}, max={np.max(active_per_signal)}")
+    print(" ---BEGIN_UTILIZATION_DATA---")
+    for i, val in enumerate(sorted_utilization):
+        print(f" {i} {val}")
+    print(" ---END_UTILIZATION_DATA---")
+
 
 def exp_3_basis_geometry(model, enc, device, report_dir):
     print("\n" + "="*60)
@@ -1043,6 +1075,202 @@ def exp_12_genetic_hijack(model, enc, device, report_dir):
     print(f"\n > Experiment complete. Compare the control and hijacked texts above.")
 
 
+def exp_13_task_crystallization_shift(model, enc, device, report_dir):
+    print("\n" + "="*60)
+    print(" [Exp 13] Task-Dependent Crystallization Boundary")
+    print("="*60)
+
+    W_basis = model.transformer.wte.signal_basis.data
+    W_v2s = _get_vocab_signals(model)
+    n_layers = len(model.transformer.h)
+
+    task_groups = {
+        "Shallow (Short Context)": [
+            ("The capital of France is", "London"),
+            ("The cat sat on the", "moon"),
+            ("The sky is", "red"),
+            ("Open the door with a", "car")
+        ],
+        "Deep (Long Context / Clauses)": [
+            ("When the geography teacher asked the students, they answered that the capital of France is", "London"),
+            ("After carefully reviewing all the evidence presented in court, the judge decided that the defendant was", "guilty"),
+            ("When you look outside the window at the beautiful nature, the color of the clear sky is", "red"),
+            ("I was locked out of my house yesterday, and to open the locked door, you need a", "car")
+        ],
+        "Code (Structured Logic)": [
+            ("def add(a, b): return a +", "None"),
+            ("x = 1 + 2\ny =", "None"),
+            ("for i in range(10):\n print(", "None"),
+            ("if x > 0:\n result =", "None")
+        ]
+    }
+
+    def continuous_steer(prompt, target_tid, base_tid, alpha, intercept_layer):
+        steer_vec = W_v2s[target_tid] - W_v2s[base_tid]
+        ids = torch.tensor(enc.encode(prompt), device=device).unsqueeze(0)
+
+        with torch.no_grad():
+            x = _embed(model, ids)
+            if intercept_layer == 0:
+                x[:, -1, :] += (alpha * steer_vec) @ W_basis
+
+            freqs_cis = model.freqs_cis[:ids.size(1)]
+            for i, block in enumerate(model.transformer.h):
+                x = block(x, freqs_cis)
+                if intercept_layer is not None and i + 1 >= intercept_layer:
+                    x[:, -1, :] += (alpha * steer_vec) @ W_basis
+
+            x_norm = model.transformer.ln_f(x[0, -1, :])
+            logits = _get_logits_from_hidden(model, x_norm)
+            probs = F.softmax(logits, dim=-1)
+            pred_id = torch.argmax(logits).item()
+            return probs[target_tid].item(), enc.decode([pred_id]).strip(), pred_id
+
+    results = {"Shallow (Short Context)": [], "Deep (Long Context / Clauses)": [], "Code (Structured Logic)": []}
+
+    print(" Starting continuous intervention sweep...\n")
+
+    for group_name, tasks in task_groups.items():
+        print(f" [{group_name}]")
+        for prompt, target in tasks:
+            target_clean = target.strip()
+            target_tid = enc.encode(" " + target)[0]
+
+            _, base_pred, base_tid = continuous_steer(prompt, target_tid, target_tid, 0.0, None)
+            if base_pred == target_clean:
+                print(f" [Skip] '{prompt[:20]}...' already predicts '{target_clean}'.")
+                continue
+
+            working_alpha = None
+            for a in np.arange(2.0, 50.0, 2.0):
+                _, pred, _ = continuous_steer(prompt, target_tid, base_tid, a, 0)
+                if pred == target_clean:
+                    working_alpha = a
+                    break
+
+            if working_alpha is None:
+                print(f" [Skip] '{prompt[:20]}...': Cannot steer within alpha<50.")
+                continue
+
+            final_alpha = working_alpha * 1.2
+
+            layer_probs = []
+            c_layer = n_layers
+
+            for L in range(n_layers):
+                p_target, pred, _ = continuous_steer(prompt, target_tid, base_tid, final_alpha, L)
+                layer_probs.append(p_target)
+
+                if pred != target_clean and c_layer == n_layers:
+                    c_layer = L
+
+            results[group_name].append({
+                'prompt': prompt,
+                'target': target_clean,
+                'alpha': final_alpha,
+                'base_pred': base_pred,
+                'c_layer': c_layer,
+                'layer_probs': layer_probs
+            })
+
+            short_prompt = prompt[:35] + "..." if len(prompt) > 35 else prompt
+            print(f" - '{short_prompt}' (base: '{base_pred}')")
+            print(f" -> Inject '{target_clean}' (α={final_alpha:.1f}) | Crystallization boundary: \033[96mLayer {c_layer}\033[0m")
+        print()
+
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6), gridspec_kw={'width_ratios': [2, 1]})
+
+    layers_x = np.arange(0, n_layers)
+    colors = {"Shallow (Short Context)": "#2ecc71", "Deep (Long Context / Clauses)": "#9b59b6", "Code (Structured Logic)": "#e67e22"}
+
+    c_layers_shallow = []
+    c_layers_deep = []
+    c_layers_code = []
+
+    for group_name, res_list in results.items():
+        color = colors[group_name]
+        for i, res in enumerate(res_list):
+            if "Shallow" in group_name:
+                c_layers_shallow.append(res['c_layer'])
+            elif "Deep" in group_name:
+                c_layers_deep.append(res['c_layer'])
+            elif "Code" in group_name:
+                c_layers_code.append(res['c_layer'])
+
+            label = group_name if i == 0 else "_nolegend_"
+            ax1.plot(layers_x, res['layer_probs'], color=color, alpha=0.6, lw=2.5, label=label)
+
+            c_idx = res['c_layer']
+            if c_idx < n_layers:
+                ax1.scatter(c_idx, res['layer_probs'][c_idx], color=color, s=120, marker='X', edgecolors='black', zorder=5)
+
+    ax1.set_title("Target Concept Viability vs. Injection Delay", fontsize=12, fontweight='bold')
+    ax1.set_xlabel("Intervention Start Layer (Later start = Context already crystallized)")
+    ax1.set_ylabel("Final Probability of Injected Concept")
+    ax1.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1.0, decimals=0))
+    ax1.legend(fontsize=10)
+    ax1.grid(True, alpha=0.3)
+
+    box_data = []
+    box_labels = []
+    box_colors_list = []
+    if c_layers_shallow:
+        box_data.append(c_layers_shallow)
+        box_labels.append("Shallow\n(Short)")
+        box_colors_list.append(colors["Shallow (Short Context)"])
+    if c_layers_deep:
+        box_data.append(c_layers_deep)
+        box_labels.append("Deep\n(Long)")
+        box_colors_list.append(colors["Deep (Long Context / Clauses)"])
+    if c_layers_code:
+        box_data.append(c_layers_code)
+        box_labels.append("Code\n(Structured)")
+        box_colors_list.append(colors["Code (Structured Logic)"])
+
+    if len(box_data) >= 2:
+        bplot = ax2.boxplot(box_data, patch_artist=True, widths=0.5)
+        ax2.set_xticks(range(1, len(box_data) + 1))
+        ax2.set_xticklabels(box_labels)
+
+        for patch, c in zip(bplot['boxes'], box_colors_list):
+            patch.set_facecolor(c)
+            patch.set_alpha(0.6)
+
+        for idx, (data, c) in enumerate(zip(box_data, box_colors_list)):
+            ax2.scatter(np.random.normal(idx + 1, 0.05, len(data)), data, color=c, alpha=0.9, s=50)
+
+    ax2.set_title("Crystallization Boundary Distribution", fontsize=12, fontweight='bold')
+    ax2.set_ylabel("Crystallization Layer (Point of No Return)")
+    ax2.set_ylim(-1, n_layers + 2)
+    ax2.yaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+    ax2.grid(True, axis='y', alpha=0.3)
+
+    plt.suptitle("reFlow Causal Audit: Context Type Affects Information Crystallization", fontsize=15, fontweight='bold')
+    plt.tight_layout(rect=[0, 0, 1, 0.95])
+
+    save_path = os.path.join(report_dir, "task_crystallization_shift.png")
+    plt.savefig(save_path, bbox_inches='tight', dpi=200)
+    plt.close()
+
+    print(" ================= Conclusions =================")
+    if c_layers_shallow:
+        avg_shallow = np.mean(c_layers_shallow)
+        print(f" > Shallow (short context) avg boundary: Layer {avg_shallow:.1f}")
+    if c_layers_deep:
+        avg_deep = np.mean(c_layers_deep)
+        print(f" > Deep (long context) avg boundary: Layer {avg_deep:.1f}")
+    if c_layers_code:
+        avg_code = np.mean(c_layers_code)
+        print(f" > Code (structured logic) avg boundary: Layer {avg_code:.1f}")
+    if c_layers_shallow and c_layers_deep:
+        print(f" > Shallow→Deep boundary shift: \033[93m{np.mean(c_layers_deep) - np.mean(c_layers_shallow):+.1f} Layers\033[0m")
+    if c_layers_shallow and c_layers_code:
+        print(f" > Shallow→Code boundary shift: \033[93m{np.mean(c_layers_code) - np.mean(c_layers_shallow):+.1f} Layers\033[0m")
+    print(f" > Results show: Context complexity affects crystallization boundary.")
+    print(f"   More complex contexts tend to maintain representation fluidity at deeper layers.")
+    print(f" > Chart saved: {save_path}")
+
+
 def main_menu():
     model, enc, device, report_dir = load_setup_and_model()
 
@@ -1059,6 +1287,7 @@ def main_menu():
         '10': ("Emotion Surgery", exp_10_emotion_surgery),
         '11': ("Concept Inception", exp_11_concept_inception),
         '12': ("Genetic Hijack", exp_12_genetic_hijack),
+        '13': ("Task Crystallization Shift", exp_13_task_crystallization_shift),
     }
 
     while True: