Upload 4 files

- README.md +10 -0
- README_CN.md +10 -0
- experiment.py +239 -1
- experiment_en.py +229 -0
README.md (changed)

```diff
@@ -18,10 +18,18 @@ pipeline_tag: text-generation
 
 # reFlow
 
+[ [中文](README_CN.md) | English ]
+
 **A Metal Soul In My Hand** — A feature-decoupled Transformer architecture with native interpretability.
 
 reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into a **Recipe Matrix** $W_{recipe} \in \mathbb{R}^{V \times S}$ and a **Signal Basis Matrix** $W_{basis} \in \mathbb{R}^{S \times d}$, forcing the model to maintain a set of continuous, low-redundancy signal bases in latent space. The same factored product $W_{recipe} \times W_{basis}$ serves as both the input embedding and the output projection, forming an end-to-end signal-manifold computation loop without a separate LM head.
 
+## Online Demo
+
+**Try reFlow in your browser:**
+- [HuggingFace Space](https://huggingface.co/spaces/reuAC/reFlow) (Global Access)
+- [ModelScope Studio](https://www.modelscope.cn/studios/recuAC/reFlow) (China Access)
+
 ## Key Results
 
 **Convergence.** At matched depth and scale (36 layers, ~515M parameters), reFlow-1-Big achieves a validation loss within ~1% of GPT-2-New (514M). Three scale points — Small (46.47M), reFlow-1 (463.67M), Big (515.06M) — confirm strict scaling law compliance (val loss: 3.55 → 3.01 → 2.92).
@@ -34,6 +42,8 @@ reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into a **
 - Hard sparsity (Top-64) systematically destroys recipe-space semantic structure (algebra 3/3 → 0/3, silhouette +0.11 → −0.02)
 
 > **Paper**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf) — Theoretical derivation, 12 interpretability experiments, and scaling/ablation analysis.
+>
+> **Pretrained Weights**: [HuggingFace](https://huggingface.co/reuAC/reFlow)
 
 ## Project Structure
 
```
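The recipe/basis factorization described in the README can be sketched in a few lines. This is an illustrative NumPy reconstruction under assumed shapes, not the repository's actual module; the class name `FactorizedEmbedding` and its arguments are invented for the example:

```python
import numpy as np

class FactorizedEmbedding:
    """Toy sketch of a tied, factorized embedding: E = W_recipe @ W_basis.

    The dense V x d embedding table is replaced by a V x S recipe matrix
    times an S x d signal-basis matrix, and the SAME product is reused as
    the output projection, so no separate LM head is needed.
    All names and sizes here are illustrative, not from the repo.
    """

    def __init__(self, vocab_size, n_signals, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.W_recipe = rng.standard_normal((vocab_size, n_signals)) * 0.02
        self.W_basis = rng.standard_normal((n_signals, d_model)) * 0.02

    def embed(self, token_ids):
        # Input side: token -> recipe -> mixture of signal bases.
        return self.W_recipe[token_ids] @ self.W_basis

    def logits(self, hidden):
        # Output side: project hidden states back through the same product.
        return hidden @ (self.W_recipe @ self.W_basis).T

emb = FactorizedEmbedding(vocab_size=100, n_signals=16, d_model=32)
x = emb.embed(np.array([3, 7]))   # two d_model-sized embeddings
scores = emb.logits(x)            # tied output logits over the vocabulary
```

With V=100, S=16, d=32 the factored parameterization stores V·S + S·d = 2112 weights instead of V·d = 3200, and shrinking S tightens the bottleneck further.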
README_CN.md (changed, Chinese text translated)

```diff
@@ -1,9 +1,17 @@
 # reFlow
 
+[ 中文 | [English](README.md) ]
+
 **A Metal Soul In My Hand** — A feature-decoupled Transformer architecture with native interpretability.
 
 reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into the product of a **Recipe Matrix** $W_{recipe} \in \mathbb{R}^{V \times S}$ and a **Signal Basis Matrix** $W_{basis} \in \mathbb{R}^{S \times d}$, forcing the model to maintain a set of continuous, low-redundancy signal bases in latent space. The same product $W_{recipe} \times W_{basis}$ is used for both the input embedding and the output projection, forming an end-to-end signal-manifold computation loop without a separate LM head.
 
+## Online Demo
+
+**Try reFlow in your browser:**
+- [HuggingFace Space](https://huggingface.co/spaces/reuAC/reFlow) (Global Access)
+- [ModelScope Studio](https://www.modelscope.cn/studios/recuAC/reFlow) (China Access)
+
 ## Key Results
 
 **Convergence.** At matched depth and parameter count (36 layers, ~515M), reFlow-1-Big's validation loss is within about 1% of GPT-2-New (514M). Three scale points (Small 46.47M, reFlow-1 463.67M, Big 515.06M) reach validation losses of 3.55, 3.01, and 2.92 respectively, strictly following the scaling law.
@@ -16,6 +24,8 @@
 - Hard sparsity (Top-64) systematically destroys recipe-space semantic structure (algebra 3/3 → 0/3, silhouette +0.11 → −0.02)
 
 > **Paper**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf) — Theoretical derivation, 12 interpretability experiments, and scaling/ablation analysis.
+>
+> **Pretrained Weights**: [HuggingFace](https://huggingface.co/reuAC/reFlow)
 
 ## Project Structure
 
```
experiment.py (changed, Chinese comments and strings translated; indentation reconstructed)

```diff
@@ -301,6 +301,38 @@ def exp_2_sparsity_profile(model, enc, device, report_dir):
     plt.close()
     print(f" > Chart saved: {save_path}")
 
+    # === Export the data needed for the paper's figures ===
+    print("\n" + "="*60)
+    print(" [Paper Data Export] For TikZ/PGFPlots")
+    print("="*60)
+
+    if is_topk:
+        active_per_word_np = active_per_word.cpu().numpy()
+    else:
+        active_per_word_np = active_per_word
+
+    # --- Figure 1: histogram of active signals per word ---
+    hist_min = int(active_per_word_np.min())
+    hist_max = int(active_per_word_np.max())
+    hist_bins = np.arange(hist_min, hist_max + 2)
+    hist_counts, hist_edges = np.histogram(active_per_word_np, bins=hist_bins)
+    print(f"\n [Histogram] Active signals per word distribution (bin_start, count):")
+    print(f" mean={np.mean(active_per_word_np):.1f}, min={hist_min}, max={hist_max}")
+    print(" ---BEGIN_HISTOGRAM_DATA---")
+    for i in range(len(hist_counts)):
+        if hist_counts[i] > 0:
+            print(f" {int(hist_edges[i])} {hist_counts[i]}")
+    print(" ---END_HISTOGRAM_DATA---")
+
+    # --- Figure 2: signal utilization (sorted descending) ---
+    sorted_utilization = np.sort(active_per_signal)[::-1]
+    print(f"\n [Bar chart] Signal utilization (descending order, signal_rank, n_words):")
+    print(f" mean={np.mean(active_per_signal):.0f}, min={np.min(active_per_signal)}, max={np.max(active_per_signal)}")
+    print(" ---BEGIN_UTILIZATION_DATA---")
+    for i, val in enumerate(sorted_utilization):
+        print(f" {i} {val}")
+    print(" ---END_UTILIZATION_DATA---")
+
 
 def exp_3_basis_geometry(model, enc, device, report_dir):
     print("\n" + "="*60)
@@ -824,7 +856,7 @@ def exp_10_emotion_surgery(model, enc, device, report_dir):
     neg_vec = torch.stack([W_v2s[enc.encode(" " + w)[0]] for w in neg_words]).mean(dim=0)
     steer_vec = pos_vec - neg_vec
 
-    text = "The food was absolutely terrible and the service was"
+    text = "The food was absolutely terrible and the service was "
     n_layers = len(model.transformer.h)
 
     scan_layers = list(range(0, n_layers, max(1, n_layers // 6)))
@@ -1042,6 +1074,211 @@ def exp_12_genetic_hijack(model, enc, device, report_dir):
 
     print(f"\n > Experiment complete. Compare the control and hijacked texts above.")
 
+
+def exp_13_task_crystallization_shift(model, enc, device, report_dir):
+    print("\n" + "="*60)
+    print(" [Exp 13] Task Type and Crystallization Boundary Shift (Context-Dependent Crystallization)")
+    print("="*60)
+
+    W_basis = model.transformer.wte.signal_basis.data
+    W_v2s = _get_vocab_signals(model)
+    n_layers = len(model.transformer.h)
+
+    # Controlled comparison: short contexts (crystallize quickly) vs. long clause-heavy contexts (delayed crystallization).
+    # We try to force common sense toward an absurd concept and measure at which layer the model fully resists the steer.
+    task_groups = {
+        "Shallow (Short Context)": [
+            ("The capital of France is", "London"),
+            ("The cat sat on the", "moon"),
+            ("The sky is", "red"),
+            ("Open the door with a", "car")
+        ],
+        "Deep (Long Context / Clauses)": [
+            ("When the geography teacher asked the students, they answered that the capital of France is", "London"),
+            ("After carefully reviewing all the evidence presented in court, the judge decided that the defendant was", "guilty"),
+            ("When you look outside the window at the beautiful nature, the color of the clear sky is", "red"),
+            ("I was locked out of my house yesterday, and to open the locked door, you need a", "car")
+        ],
+        "Code (Structured Logic)": [
+            ("def add(a, b): return a +", "None"),
+            ("x = 1 + 2\ny =", "None"),
+            ("for i in range(10):\n print(", "None"),
+            ("if x > 0:\n result =", "None")
+        ]
+    }
+
+    def continuous_steer(prompt, target_tid, base_tid, alpha, intercept_layer):
+        # Direction vector: target concept minus native concept.
+        steer_vec = W_v2s[target_tid] - W_v2s[base_tid]
+        ids = torch.tensor(enc.encode(prompt), device=device).unsqueeze(0)
+
+        with torch.no_grad():
+            x = _embed(model, ids)
+            # Intervene from layer 0 onward if requested.
+            if intercept_layer == 0:
+                x[:, -1, :] += (alpha * steer_vec) @ W_basis
+
+            freqs_cis = model.freqs_cis[:ids.size(1)]
+            for i, block in enumerate(model.transformer.h):
+                x = block(x, freqs_cis)
+                # Key fix: from intercept_layer onward, keep applying the concept hijack at every subsequent layer.
+                if intercept_layer is not None and i + 1 >= intercept_layer:
+                    x[:, -1, :] += (alpha * steer_vec) @ W_basis
+
+            x_norm = model.transformer.ln_f(x[0, -1, :])
+            logits = _get_logits_from_hidden(model, x_norm)
+            probs = F.softmax(logits, dim=-1)
+            pred_id = torch.argmax(logits).item()
+        return probs[target_tid].item(), enc.decode([pred_id]).strip(), pred_id
+
+    results = {"Shallow (Short Context)": [], "Deep (Long Context / Clauses)": [], "Code (Structured Logic)": []}
+
+    print(" Starting layer-wise continuous intervention sweep...\n")
+
+    for group_name, tasks in task_groups.items():
+        print(f" [{group_name}]")
+        for prompt, target in tasks:
+            target_clean = target.strip()
+            target_tid = enc.encode(" " + target)[0]
+
+            # 1. Get the natural baseline prediction.
+            _, base_pred, base_tid = continuous_steer(prompt, target_tid, target_tid, 0.0, None)
+            if base_pred == target_clean:
+                print(f" [Skip] '{prompt[:20]}...' natural prediction is already '{target_clean}'.")
+                continue
+
+            # 2. Find the mildest critical alpha that successfully steers at layer 0.
+            working_alpha = None
+            for a in np.arange(2.0, 50.0, 2.0):
+                _, pred, _ = continuous_steer(prompt, target_tid, base_tid, a, 0)
+                if pred == target_clean:
+                    working_alpha = a
+                    break
+
+            if working_alpha is None:
+                print(f" [Skip] '{prompt[:20]}...': cannot steer with alpha < 50, skipping.")
+                continue
+
+            # Add a 20% margin to keep the hijack stable.
+            final_alpha = working_alpha * 1.2
+
+            # 3. Delay the injection point layer by layer to find the crystallization boundary.
+            layer_probs = []
+            c_layer = n_layers
+
+            for L in range(n_layers):
+                p_target, pred, _ = continuous_steer(prompt, target_tid, base_tid, final_alpha, L)
+                layer_probs.append(p_target)
+
+                # If steering continuously from layer L onward still fails, semantics have fully crystallized by layer L.
+                if pred != target_clean and c_layer == n_layers:
+                    c_layer = L
+
+            results[group_name].append({
+                'prompt': prompt,
+                'target': target_clean,
+                'alpha': final_alpha,
+                'base_pred': base_pred,
+                'c_layer': c_layer,
+                'layer_probs': layer_probs
+            })
+
+            short_prompt = prompt[:35] + "..." if len(prompt) > 35 else prompt
+            print(f" - '{short_prompt}' (base prediction: '{base_pred}')")
+            print(f" -> Continuous injection of '{target_clean}' (α={final_alpha:.1f}) | Crystallization boundary: \033[96mLayer {c_layer}\033[0m")
+        print()
+
+    # ================= Plotting =================
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6), gridspec_kw={'width_ratios': [2, 1]})
+
+    layers_x = np.arange(0, n_layers)
+    colors = {"Shallow (Short Context)": "#2ecc71", "Deep (Long Context / Clauses)": "#9b59b6", "Code (Structured Logic)": "#e67e22"}
+
+    c_layers_shallow = []
+    c_layers_deep = []
+    c_layers_code = []
+
+    for group_name, res_list in results.items():
+        color = colors[group_name]
+        for i, res in enumerate(res_list):
+            if "Shallow" in group_name:
+                c_layers_shallow.append(res['c_layer'])
+            elif "Deep" in group_name:
+                c_layers_deep.append(res['c_layer'])
+            elif "Code" in group_name:
+                c_layers_code.append(res['c_layer'])
+
+            label = group_name if i == 0 else "_nolegend_"
+            ax1.plot(layers_x, res['layer_probs'], color=color, alpha=0.6, lw=2.5, label=label)
+
+            c_idx = res['c_layer']
+            if c_idx < n_layers:
+                ax1.scatter(c_idx, res['layer_probs'][c_idx], color=color, s=120, marker='X', edgecolors='black', zorder=5)
+
+    ax1.set_title("Target Concept Viability vs. Injection Delay", fontsize=12, fontweight='bold')
+    ax1.set_xlabel("Intervention Start Layer (Later start = Context already crystallized)")
+    ax1.set_ylabel("Final Probability of Injected Concept")
+    ax1.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1.0, decimals=0))
+    ax1.legend(fontsize=10)
+    ax1.grid(True, alpha=0.3)
+
+    box_data = []
+    box_labels = []
+    box_colors_list = []
+    if c_layers_shallow:
+        box_data.append(c_layers_shallow)
+        box_labels.append("Shallow\n(Short)")
+        box_colors_list.append(colors["Shallow (Short Context)"])
+    if c_layers_deep:
+        box_data.append(c_layers_deep)
+        box_labels.append("Deep\n(Long)")
+        box_colors_list.append(colors["Deep (Long Context / Clauses)"])
+    if c_layers_code:
+        box_data.append(c_layers_code)
+        box_labels.append("Code\n(Structured)")
+        box_colors_list.append(colors["Code (Structured Logic)"])
+
+    if len(box_data) >= 2:
+        bplot = ax2.boxplot(box_data, patch_artist=True, widths=0.5)
+        ax2.set_xticks(range(1, len(box_data) + 1))
+        ax2.set_xticklabels(box_labels)
+
+        for patch, c in zip(bplot['boxes'], box_colors_list):
+            patch.set_facecolor(c)
+            patch.set_alpha(0.6)
+
+        for idx, (data, c) in enumerate(zip(box_data, box_colors_list)):
+            ax2.scatter(np.random.normal(idx + 1, 0.05, len(data)), data, color=c, alpha=0.9, s=50)
+
+    ax2.set_title("Crystallization Boundary Distribution", fontsize=12, fontweight='bold')
+    ax2.set_ylabel("Crystallization Layer (Point of No Return)")
+    ax2.set_ylim(-1, n_layers + 2)
+    ax2.yaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+    ax2.grid(True, axis='y', alpha=0.3)
+
+    plt.suptitle("reFlow Causal Audit: Context Type Affects Information Crystallization", fontsize=15, fontweight='bold')
+    plt.tight_layout(rect=[0, 0, 1, 0.95])
+
+    save_path = os.path.join(report_dir, "task_crystallization_shift.png")
+    plt.savefig(save_path, bbox_inches='tight', dpi=200)
+    plt.close()
+
+    print(" ================= Conclusions =================")
+    if c_layers_shallow:
+        avg_shallow = np.mean(c_layers_shallow)
+        print(f" > Short-context (shallow) tasks, average crystallization boundary: Layer {avg_shallow:.1f}")
+    if c_layers_deep:
+        avg_deep = np.mean(c_layers_deep)
+        print(f" > Long-context (deep) tasks, average crystallization boundary: Layer {avg_deep:.1f}")
+    if c_layers_code:
+        avg_code = np.mean(c_layers_code)
+        print(f" > Code (structured logic), average crystallization boundary: Layer {avg_code:.1f}")
+    if c_layers_shallow and c_layers_deep:
+        print(f" > Short→Long boundary delay: \033[93m{np.mean(c_layers_deep) - np.mean(c_layers_shallow):+.1f} Layers\033[0m")
+    if c_layers_shallow and c_layers_code:
+        print(f" > Short→Code boundary delay: \033[93m{np.mean(c_layers_code) - np.mean(c_layers_shallow):+.1f} Layers\033[0m")
+    print(f" > The experiment shows that the context complexity of different task types shifts the crystallization boundary of the model's internal representations;")
+    print(f"   more complex contexts tend to keep internal representations fluid at deeper layers.")
+    print(f" > Chart saved: {save_path}")
 
 def main_menu():
     model, enc, device, report_dir = load_setup_and_model()
@@ -1059,6 +1296,7 @@ def main_menu():
         '10': ("Emotion Surgery", exp_10_emotion_surgery),
         '11': ("Concept Inception", exp_11_concept_inception),
         '12': ("Genetic Hijack", exp_12_genetic_hijack),
+        '13': ("Task Crystallization Shift", exp_13_task_crystallization_shift),
     }
 
     while True:
```
experiment_en.py (changed; indentation reconstructed)

```diff
@@ -301,6 +301,38 @@ def exp_2_sparsity_profile(model, enc, device, report_dir):
     plt.close()
     print(f" > Chart saved: {save_path}")
 
+    # === Export data for paper plotting ===
+    print("\n" + "="*60)
+    print(" [Paper Data Export] For TikZ/PGFPlots")
+    print("="*60)
+
+    if is_topk:
+        active_per_word_np = active_per_word.cpu().numpy()
+    else:
+        active_per_word_np = active_per_word
+
+    # --- Figure 1: Histogram data for active signals per word ---
+    hist_min = int(active_per_word_np.min())
+    hist_max = int(active_per_word_np.max())
+    hist_bins = np.arange(hist_min, hist_max + 2)
+    hist_counts, hist_edges = np.histogram(active_per_word_np, bins=hist_bins)
+    print(f"\n [Histogram] Active signals per word distribution (bin_start, count):")
+    print(f" mean={np.mean(active_per_word_np):.1f}, min={hist_min}, max={hist_max}")
+    print(" ---BEGIN_HISTOGRAM_DATA---")
+    for i in range(len(hist_counts)):
+        if hist_counts[i] > 0:
+            print(f" {int(hist_edges[i])} {hist_counts[i]}")
+    print(" ---END_HISTOGRAM_DATA---")
+
+    # --- Figure 2: Signal utilization data (sorted by utilization) ---
+    sorted_utilization = np.sort(active_per_signal)[::-1]
+    print(f"\n [Bar chart] Signal utilization (descending order, signal_rank, n_words):")
+    print(f" mean={np.mean(active_per_signal):.0f}, min={np.min(active_per_signal)}, max={np.max(active_per_signal)}")
+    print(" ---BEGIN_UTILIZATION_DATA---")
+    for i, val in enumerate(sorted_utilization):
+        print(f" {i} {val}")
+    print(" ---END_UTILIZATION_DATA---")
+
 
 def exp_3_basis_geometry(model, enc, device, report_dir):
     print("\n" + "="*60)
@@ -1043,6 +1075,202 @@ def exp_12_genetic_hijack(model, enc, device, report_dir):
     print(f"\n > Experiment complete. Compare the control and hijacked texts above.")
 
 
+def exp_13_task_crystallization_shift(model, enc, device, report_dir):
+    print("\n" + "="*60)
+    print(" [Exp 13] Task-Dependent Crystallization Boundary")
+    print("="*60)
+
+    W_basis = model.transformer.wte.signal_basis.data
+    W_v2s = _get_vocab_signals(model)
+    n_layers = len(model.transformer.h)
+
+    task_groups = {
+        "Shallow (Short Context)": [
+            ("The capital of France is", "London"),
+            ("The cat sat on the", "moon"),
+            ("The sky is", "red"),
+            ("Open the door with a", "car")
+        ],
+        "Deep (Long Context / Clauses)": [
+            ("When the geography teacher asked the students, they answered that the capital of France is", "London"),
+            ("After carefully reviewing all the evidence presented in court, the judge decided that the defendant was", "guilty"),
+            ("When you look outside the window at the beautiful nature, the color of the clear sky is", "red"),
+            ("I was locked out of my house yesterday, and to open the locked door, you need a", "car")
+        ],
+        "Code (Structured Logic)": [
+            ("def add(a, b): return a +", "None"),
+            ("x = 1 + 2\ny =", "None"),
+            ("for i in range(10):\n print(", "None"),
+            ("if x > 0:\n result =", "None")
+        ]
+    }
+
+    def continuous_steer(prompt, target_tid, base_tid, alpha, intercept_layer):
+        steer_vec = W_v2s[target_tid] - W_v2s[base_tid]
+        ids = torch.tensor(enc.encode(prompt), device=device).unsqueeze(0)
+
+        with torch.no_grad():
+            x = _embed(model, ids)
+            if intercept_layer == 0:
+                x[:, -1, :] += (alpha * steer_vec) @ W_basis
+
+            freqs_cis = model.freqs_cis[:ids.size(1)]
+            for i, block in enumerate(model.transformer.h):
+                x = block(x, freqs_cis)
+                if intercept_layer is not None and i + 1 >= intercept_layer:
+                    x[:, -1, :] += (alpha * steer_vec) @ W_basis
+
+            x_norm = model.transformer.ln_f(x[0, -1, :])
+            logits = _get_logits_from_hidden(model, x_norm)
+            probs = F.softmax(logits, dim=-1)
+            pred_id = torch.argmax(logits).item()
+        return probs[target_tid].item(), enc.decode([pred_id]).strip(), pred_id
+
+    results = {"Shallow (Short Context)": [], "Deep (Long Context / Clauses)": [], "Code (Structured Logic)": []}
+
+    print(" Starting continuous intervention sweep...\n")
+
+    for group_name, tasks in task_groups.items():
+        print(f" [{group_name}]")
+        for prompt, target in tasks:
+            target_clean = target.strip()
+            target_tid = enc.encode(" " + target)[0]
+
+            _, base_pred, base_tid = continuous_steer(prompt, target_tid, target_tid, 0.0, None)
+            if base_pred == target_clean:
+                print(f" [Skip] '{prompt[:20]}...' already predicts '{target_clean}'.")
+                continue
+
+            working_alpha = None
+            for a in np.arange(2.0, 50.0, 2.0):
+                _, pred, _ = continuous_steer(prompt, target_tid, base_tid, a, 0)
+                if pred == target_clean:
+                    working_alpha = a
+                    break
+
+            if working_alpha is None:
+                print(f" [Skip] '{prompt[:20]}...': Cannot steer within alpha<50.")
+                continue
+
+            final_alpha = working_alpha * 1.2
+
+            layer_probs = []
+            c_layer = n_layers
+
+            for L in range(n_layers):
+                p_target, pred, _ = continuous_steer(prompt, target_tid, base_tid, final_alpha, L)
+                layer_probs.append(p_target)
+
+                if pred != target_clean and c_layer == n_layers:
+                    c_layer = L
+
+            results[group_name].append({
+                'prompt': prompt,
+                'target': target_clean,
+                'alpha': final_alpha,
+                'base_pred': base_pred,
+                'c_layer': c_layer,
+                'layer_probs': layer_probs
+            })
+
+            short_prompt = prompt[:35] + "..." if len(prompt) > 35 else prompt
+            print(f" - '{short_prompt}' (base: '{base_pred}')")
+            print(f" -> Inject '{target_clean}' (α={final_alpha:.1f}) | Crystallization boundary: \033[96mLayer {c_layer}\033[0m")
+        print()
+
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6), gridspec_kw={'width_ratios': [2, 1]})
+
+    layers_x = np.arange(0, n_layers)
+    colors = {"Shallow (Short Context)": "#2ecc71", "Deep (Long Context / Clauses)": "#9b59b6", "Code (Structured Logic)": "#e67e22"}
+
+    c_layers_shallow = []
+    c_layers_deep = []
+    c_layers_code = []
+
+    for group_name, res_list in results.items():
+        color = colors[group_name]
+        for i, res in enumerate(res_list):
+            if "Shallow" in group_name:
+                c_layers_shallow.append(res['c_layer'])
+            elif "Deep" in group_name:
+                c_layers_deep.append(res['c_layer'])
+            elif "Code" in group_name:
+                c_layers_code.append(res['c_layer'])
+
+            label = group_name if i == 0 else "_nolegend_"
+            ax1.plot(layers_x, res['layer_probs'], color=color, alpha=0.6, lw=2.5, label=label)
+
+            c_idx = res['c_layer']
+            if c_idx < n_layers:
+                ax1.scatter(c_idx, res['layer_probs'][c_idx], color=color, s=120, marker='X', edgecolors='black', zorder=5)
+
+    ax1.set_title("Target Concept Viability vs. Injection Delay", fontsize=12, fontweight='bold')
+    ax1.set_xlabel("Intervention Start Layer (Later start = Context already crystallized)")
+    ax1.set_ylabel("Final Probability of Injected Concept")
+    ax1.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1.0, decimals=0))
+    ax1.legend(fontsize=10)
+    ax1.grid(True, alpha=0.3)
+
+    box_data = []
+    box_labels = []
+    box_colors_list = []
+    if c_layers_shallow:
+        box_data.append(c_layers_shallow)
+        box_labels.append("Shallow\n(Short)")
+        box_colors_list.append(colors["Shallow (Short Context)"])
+    if c_layers_deep:
+        box_data.append(c_layers_deep)
+        box_labels.append("Deep\n(Long)")
+        box_colors_list.append(colors["Deep (Long Context / Clauses)"])
+    if c_layers_code:
+        box_data.append(c_layers_code)
+        box_labels.append("Code\n(Structured)")
+        box_colors_list.append(colors["Code (Structured Logic)"])
+
+    if len(box_data) >= 2:
+        bplot = ax2.boxplot(box_data, patch_artist=True, widths=0.5)
+        ax2.set_xticks(range(1, len(box_data) + 1))
+        ax2.set_xticklabels(box_labels)
+
+        for patch, c in zip(bplot['boxes'], box_colors_list):
+            patch.set_facecolor(c)
+            patch.set_alpha(0.6)
+
+        for idx, (data, c) in enumerate(zip(box_data, box_colors_list)):
+            ax2.scatter(np.random.normal(idx + 1, 0.05, len(data)), data, color=c, alpha=0.9, s=50)
+
+    ax2.set_title("Crystallization Boundary Distribution", fontsize=12, fontweight='bold')
+    ax2.set_ylabel("Crystallization Layer (Point of No Return)")
+    ax2.set_ylim(-1, n_layers + 2)
+    ax2.yaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+    ax2.grid(True, axis='y', alpha=0.3)
+
+    plt.suptitle("reFlow Causal Audit: Context Type Affects Information Crystallization", fontsize=15, fontweight='bold')
+    plt.tight_layout(rect=[0, 0, 1, 0.95])
+
+    save_path = os.path.join(report_dir, "task_crystallization_shift.png")
+    plt.savefig(save_path, bbox_inches='tight', dpi=200)
+    plt.close()
+
+    print(" ================= Conclusions =================")
+    if c_layers_shallow:
+        avg_shallow = np.mean(c_layers_shallow)
+        print(f" > Shallow (short context) avg boundary: Layer {avg_shallow:.1f}")
+    if c_layers_deep:
+        avg_deep = np.mean(c_layers_deep)
+        print(f" > Deep (long context) avg boundary: Layer {avg_deep:.1f}")
+    if c_layers_code:
+        avg_code = np.mean(c_layers_code)
+        print(f" > Code (structured logic) avg boundary: Layer {avg_code:.1f}")
+    if c_layers_shallow and c_layers_deep:
+        print(f" > Shallow→Deep boundary shift: \033[93m{np.mean(c_layers_deep) - np.mean(c_layers_shallow):+.1f} Layers\033[0m")
+    if c_layers_shallow and c_layers_code:
+        print(f" > Shallow→Code boundary shift: \033[93m{np.mean(c_layers_code) - np.mean(c_layers_shallow):+.1f} Layers\033[0m")
+    print(f" > Results show: Context complexity affects crystallization boundary.")
+    print(f"   More complex contexts tend to maintain representation fluidity at deeper layers.")
+    print(f" > Chart saved: {save_path}")
+
 
 def main_menu():
     model, enc, device, report_dir = load_setup_and_model()
@@ -1059,6 +1287,7 @@ def main_menu():
         '10': ("Emotion Surgery", exp_10_emotion_surgery),
         '11': ("Concept Inception", exp_11_concept_inception),
         '12': ("Genetic Hijack", exp_12_genetic_hijack),
+        '13': ("Task Crystallization Shift", exp_13_task_crystallization_shift),
     }
 
     while True:
```