Student Simulation v2

Quick reference. See runall.sh for the full pipeline.

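Key changes (v2 vs v1)

Formula semantics: x_new = x - (1 - α) · P · h; α=1 leaves the activation unchanged, α=0 suppresses it fully
Sweep range: α ∈ [0, 1] (the v1 range went out of bounds and produced collapse artifacts)
Direction versions: only v1_raw and the new v_pca_subspace (k=3 subspace) are kept
New feature: JointResidualSteerer prevents cross-dimension compensation
New metric: count_real_monitoring() distinguishes genuine reflection from filler words
New metric: is_collapsed() uses 4-gram repetition + length ratio; more robust than v1
08b: attention output diagnostic (informational only)
10_infer: added to runall as a sanity check
Removed: LLM rater (11_llm_quality_rating.py)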
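
To make the formula semantics concrete, here is a minimal, dependency-free sketch of x_new = x - (1 - α) · P · h. It assumes that h is the same residual vector as x and that P is the orthogonal projector onto the k=3 subspace. The helper names (orthonormalize, project, steer) are hypothetical and are not taken from the repository; a real implementation would presumably operate on tensors rather than Python lists.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def orthonormalize(directions: List[Vector]) -> List[Vector]:
    """Gram-Schmidt: turn raw directions into an orthonormal basis of their span."""
    basis: List[Vector] = []
    for v in directions:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip directions already contained in the span
            basis.append([wi / norm for wi in w])
    return basis


def project(basis: List[Vector], h: Vector) -> Vector:
    """Compute P · h for an orthonormal basis: sum_i (b_i · h) b_i."""
    out = [0.0] * len(h)
    for b in basis:
        c = dot(b, h)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


def steer(x: Vector, basis: List[Vector], alpha: float) -> Vector:
    """x_new = x - (1 - alpha) * P · x (assumption: h equals x)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ph = project(basis, x)
    return [xi - (1.0 - alpha) * pi for xi, pi in zip(x, ph)]


# Toy example: a k=3 subspace inside a 4-dimensional space.
basis = orthonormalize([[1.0, 0.0, 0.0, 0.0],
                        [1.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0]])
x = [1.0, 2.0, 3.0, 4.0]

unchanged = steer(x, basis, alpha=1.0)   # alpha=1: no change
halfway = steer(x, basis, alpha=0.5)     # alpha=0.5: subspace component halved
suppressed = steer(x, basis, alpha=0.0)  # alpha=0: subspace component removed
```

In this toy case the subspace is spanned by the first three axes, so only the fourth coordinate survives at α=0. Values of α outside [0, 1] are rejected, mirroring the restricted sweep range.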

Usage

# Full run on a single GPU
bash runall.sh

# Enable anti-leak joint steering
JOINT=1 bash runall.sh

# Run only selected stages
STAGES=8,8b,9,10 bash runall.sh

# Run inference on a single problem
python scripts/10_infer.py --dim planning --alphas 1.0 0.5 0.0 \
    --problem "Find x such that x^2=49"
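
For readers who want to see how the three flags in the inference command fit together, the following is a hypothetical sketch of a compatible command-line parser. It is not the actual contents of scripts/10_infer.py. The flag names come from the command shown here; the choices for --dim are inferred from the checkpoint file names, and the range check on α is an assumption based on the sweep range.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the flags used in the usage example."""
    parser = argparse.ArgumentParser(
        description="Steered inference on a single problem (sketch)")
    parser.add_argument("--dim", required=True,
                        choices=["planning", "monitoring"],
                        help="which dimension to steer")
    parser.add_argument("--alphas", type=float, nargs="+",
                        default=[1.0, 0.5, 0.0],
                        help="one or more alpha values in [0, 1]")
    parser.add_argument("--problem", required=True,
                        help="problem statement to run inference on")
    return parser


def validate_alphas(alphas: List[float]) -> List[float]:
    """Reject alpha values outside [0, 1] (assumed to match the sweep range)."""
    bad = [a for a in alphas if not 0.0 <= a <= 1.0]
    if bad:
        raise ValueError(f"alpha values out of range: {bad}")
    return alphas


# Parse the same arguments as the usage example.
args = build_parser().parse_args([
    "--dim", "planning",
    "--alphas", "1.0", "0.5", "0.0",
    "--problem", "Find x such that x^2=49",
])
alphas = validate_alphas(args.alphas)
```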

Directory layout

data/
  models/                                # Qwen3-30B-A3B-Thinking-2507
  cots/                                  # raw + labeled CoTs
  routing/                               # router top-k dumps
  activations/                           # decision-point residuals
  checkpoints/
    planning_v1_raw.pt
    planning_v_pca_subspace.pt           # new k=3 subspace
    monitoring_v1_raw.pt
    monitoring_v_pca_subspace.pt
  results/
    sweep_log.jsonl                      # includes steered_text
    final_report.md
    attention_diagnostic.{json,png}      # new
    infer_sanity_planning.json           # new
    infer_sanity_monitoring.json         # new
  logs/
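
The steered_text stored in sweep_log.jsonl is the kind of input a collapse check consumes. As an illustration of the "4-gram repetition + length ratio" idea behind is_collapsed(), here is a minimal sketch. The function names, the whitespace tokenization, and all thresholds are assumptions chosen for the example, not the values used in the repository.

```python
def four_gram_repetition(text: str) -> float:
    """Fraction of whitespace-token 4-grams that duplicate another 4-gram."""
    tokens = text.split()
    grams = [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)


def is_collapsed_sketch(steered: str, baseline: str,
                        max_repetition: float = 0.5,   # hypothetical threshold
                        min_ratio: float = 0.2,        # hypothetical threshold
                        max_ratio: float = 5.0) -> bool:  # hypothetical threshold
    """Flag output that loops on itself or whose length is far from the baseline."""
    repetition = four_gram_repetition(steered)
    baseline_len = max(len(baseline.split()), 1)
    length_ratio = len(steered.split()) / baseline_len
    too_repetitive = repetition > max_repetition
    bad_length = not (min_ratio <= length_ratio <= max_ratio)
    return too_repetitive or bad_length


baseline = ("First factor the equation, then solve each root "
            "separately and verify both.")
looping = "wait let me check " * 50   # degenerate loop
truncated = "x"                       # far shorter than the baseline

healthy_flag = is_collapsed_sketch(baseline, baseline)
looping_flag = is_collapsed_sketch(looping, baseline)
truncated_flag = is_collapsed_sketch(truncated, baseline)
```

Combining the two signals catches both failure modes: a looping generation trips the repetition test even when its length looks plausible, while a near-empty generation trips the length test even though it contains no repeated 4-grams.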

Key config (`configs/model.py`)

ALPHA_SWEEP        = [0.0, 0.1, 0.2, 0.3, 0.5, 0.75, 1.0]
DIRECTION_VERSIONS = ["v1_raw", "v_pca_subspace"]
PCA_SUBSPACE_K     = 3
ANTI_LEAK_BETA     = 0.3
GEN_CONFIG["max_new_tokens"]      = 12000  # previously 4096, which was too small
GEN_CONFIG_FAST["max_new_tokens"] =  8192  # previously 1024, which was too small
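
To show how these settings could combine into a sweep, here is a small sketch that enumerates every (dimension, direction version, α) combination and the checkpoint file it would load. The DIMENSIONS list and the path pattern are inferred from the checkpoint file names under data/checkpoints/. Whether the real pipeline iterates over the full grid in this way is an assumption, and sweep_grid is a hypothetical helper.

```python
from itertools import product
from pathlib import Path
from typing import Dict, Iterator

ALPHA_SWEEP = [0.0, 0.1, 0.2, 0.3, 0.5, 0.75, 1.0]
DIRECTION_VERSIONS = ["v1_raw", "v_pca_subspace"]
DIMENSIONS = ["planning", "monitoring"]  # assumption: inferred from checkpoint names


def sweep_grid(checkpoint_dir: Path = Path("data/checkpoints")) -> Iterator[Dict]:
    """Yield one run description per (dimension, version, alpha) combination."""
    for dim, version, alpha in product(DIMENSIONS, DIRECTION_VERSIONS, ALPHA_SWEEP):
        yield {
            "dim": dim,
            "version": version,
            "alpha": alpha,
            "checkpoint": checkpoint_dir / f"{dim}_{version}.pt",
        }


runs = list(sweep_grid())
for run in runs[:3]:
    print(run["dim"], run["version"], run["alpha"], run["checkpoint"].name)
```

With two dimensions, two direction versions, and seven α values, the full grid contains 28 runs.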

