JulianHJR/v2

	# Student Simulation v2

	Quick reference. See `runall.sh` for the full pipeline.

	## Key changes (v2 vs v1)

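	1. Formula semantics: `x_new = x - (1 - α) · P · h`; α=1 leaves the activation unchanged, α=0 suppresses the direction completely
	2. Sweep range: α ∈ [0, 1] (the v1 range went out of bounds and produced collapse artifacts)
	3. Direction versions: only `v1_raw` and the new `v_pca_subspace` (k=3 subspace) are kept
	4. New feature: `JointResidualSteerer` prevents cross-dimension compensation
	5. New metric: `count_real_monitoring()` distinguishes genuine reflection from filler words
	6. New metric: `is_collapsed()` uses 4-gram repetition plus a length ratio, which is more robust than the v1 check
	7. 08b: attention output diagnostic (informational only)
	8. 10_infer: added to runall as a sanity check
	9. Removed: LLM rater (`11_llm_quality_rating.py`)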
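
The update rule in item 1 can be sketched in a few lines. This is a minimal pure-Python illustration, assuming `P` is the orthogonal projector onto the span of an orthonormal basis (one vector for a single direction, three for the k=3 subspace). The names `project` and `steer` are hypothetical and are not taken from the repository.

```python
from typing import List, Sequence

Vector = List[float]


def project(h: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Return P · h, where P projects onto the span of an orthonormal basis."""
    out = [0.0] * len(h)
    for b in basis:
        coeff = sum(hi * bi for hi, bi in zip(h, b))  # inner product <h, b>
        for i, bi in enumerate(b):
            out[i] += coeff * bi
    return out


def steer(x: Sequence[float], h: Sequence[float],
          basis: Sequence[Sequence[float]], alpha: float) -> Vector:
    """Compute x_new = x - (1 - alpha) * P · h, with alpha restricted to [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ph = project(h, basis)
    return [xi - (1.0 - alpha) * pi for xi, pi in zip(x, ph)]


# Toy example: a 3-d residual and a 1-d steering direction along the first axis.
x = [2.0, 1.0, -1.0]
basis = [[1.0, 0.0, 0.0]]
print(steer(x, x, basis, alpha=1.0))  # unchanged: [2.0, 1.0, -1.0]
print(steer(x, x, basis, alpha=0.0))  # first component removed: [0.0, 1.0, -1.0]
```

Intermediate values of α interpolate linearly between the two cases, which is why a sweep outside [0, 1] amplifies or inverts the component rather than suppressing it.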

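For item 6, a collapse check built from 4-gram repetition and a length ratio could look like the sketch below. The whitespace tokenization, the comparison against a baseline text, the thresholds `max_repeat_frac=0.5` and `max_len_ratio=3.0`, and the function names are all assumptions for illustration; the repository's `is_collapsed()` may differ in each of them.

```python
from collections import Counter
from typing import List


def repeated_4gram_fraction(tokens: List[str]) -> float:
    """Fraction of 4-grams that repeat an earlier 4-gram."""
    grams = [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(grams)


def is_collapsed_sketch(steered: str, baseline: str,
                        max_repeat_frac: float = 0.5,
                        max_len_ratio: float = 3.0) -> bool:
    """Flag text that loops on 4-grams or whose length is far from the baseline."""
    s_tok, b_tok = steered.split(), baseline.split()
    if not s_tok:
        return True  # empty output counts as collapse in this sketch
    ratio = len(s_tok) / max(len(b_tok), 1)
    too_repetitive = repeated_4gram_fraction(s_tok) > max_repeat_frac
    length_off = ratio > max_len_ratio or ratio < 1.0 / max_len_ratio
    return too_repetitive or length_off


baseline = "x squared is 49 so x is 7 or minus 7"
print(is_collapsed_sketch("wait " * 40, baseline))  # looping output is flagged
print(is_collapsed_sketch("x squared is 49 hence x equals 7 or minus 7", baseline))
```

Combining the two signals catches both degenerate loops and outputs that are truncated or run away in length.
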
	## Usage

	```bash
	# Full run on a single GPU
	bash runall.sh

	# Enable anti-leak joint steering
	JOINT=1 bash runall.sh

	# Run only selected stages
	STAGES=8,8b,9,10 bash runall.sh

	# Run inference on a single problem
	python scripts/10_infer.py --dim planning --alphas 1.0 0.5 0.0 \
	--problem "Find x such that x^2=49"
	```
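
As a rough guide to the flags in the last command, the following argparse sketch accepts the same `--dim`, `--alphas`, and `--problem` arguments. It is an assumption about how such a CLI could be declared, not the contents of `scripts/10_infer.py`. The two choices for `--dim` are inferred from the checkpoint names (`planning`, `monitoring`), and the range check on α is added here for illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Single-problem steering inference (sketch)")
    parser.add_argument("--dim", choices=["planning", "monitoring"], required=True,
                        help="dimension to steer")
    parser.add_argument("--alphas", type=float, nargs="+", default=[1.0],
                        help="alpha values in [0, 1]; 1.0 means no steering")
    parser.add_argument("--problem", type=str, required=True,
                        help="problem statement to solve")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    for a in args.alphas:
        if not 0.0 <= a <= 1.0:
            raise SystemExit(f"alpha {a} is outside [0, 1]")
    return args


args = parse_args(["--dim", "planning", "--alphas", "1.0", "0.5", "0.0",
                   "--problem", "Find x such that x^2=49"])
print(args.dim, args.alphas, args.problem)
```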

	## Directory layout

	```
	data/
	  models/        # Qwen3-30B-A3B-Thinking-2507
	  cots/          # raw + labeled CoTs
	  routing/       # router top-k dumps
	  activations/   # decision-point residuals
	checkpoints/
	  planning_v1_raw.pt
	  planning_v_pca_subspace.pt      # new k=3 subspace
	  monitoring_v1_raw.pt
	  monitoring_v_pca_subspace.pt
	results/
	  sweep_log.jsonl                 # includes steered_text
	  final_report.md
	  attention_diagnostic.{json,png} # new
	  infer_sanity_planning.json      # new
	  infer_sanity_monitoring.json    # new
	logs/
	```
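
Because `sweep_log.jsonl` is a JSON Lines file, it can be inspected with a few lines of Python. In this sketch only the `steered_text` key comes from the layout above; the helper name `iter_sweep_records` and the demo records are hypothetical, and a temporary file stands in for the real log.

```python
import json
import os
import tempfile
from typing import Dict, Iterator


def iter_sweep_records(path: str) -> Iterator[Dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo on a temporary file standing in for results/sweep_log.jsonl.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sweep_log.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"steered_text": "x = 7 or x = -7"}) + "\n")
        f.write("\n")  # blank lines are skipped
        f.write(json.dumps({"steered_text": ""}) + "\n")
    texts = [rec.get("steered_text", "") for rec in iter_sweep_records(demo_path)]

# Number of records, and how many have empty steered text.
print(len(texts), sum(1 for t in texts if not t))
```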

	## Key config (`configs/model.py`)

	```python
	ALPHA_SWEEP = [0.0, 0.1, 0.2, 0.3, 0.5, 0.75, 1.0]
	DIRECTION_VERSIONS = ["v1_raw", "v_pca_subspace"]
	PCA_SUBSPACE_K = 3
	ANTI_LEAK_BETA = 0.3
	GEN_CONFIG["max_new_tokens"] = 12000 # previously 4096, which was too small
	GEN_CONFIG_FAST["max_new_tokens"] = 8192 # previously 1024, which was too small
	```
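
A small sanity check can guard against the out-of-range sweeps that affected v1. The sketch below copies three values from the config above into local variables. The function `check_config` is hypothetical and not part of `configs/model.py`, and the requirement that the sweep include α=1.0 as an unsteered baseline is an assumption made here.

```python
from typing import List

ALPHA_SWEEP = [0.0, 0.1, 0.2, 0.3, 0.5, 0.75, 1.0]
DIRECTION_VERSIONS = ["v1_raw", "v_pca_subspace"]
PCA_SUBSPACE_K = 3


def check_config(alphas: List[float], versions: List[str], k: int) -> None:
    """Raise ValueError if the sweep leaves [0, 1] or the settings are inconsistent."""
    bad = [a for a in alphas if not 0.0 <= a <= 1.0]
    if bad:
        raise ValueError(f"alpha values outside [0, 1]: {bad}")
    if 1.0 not in alphas:
        # Assumption of this sketch: alpha=1.0 serves as the unsteered baseline.
        raise ValueError("sweep should include alpha=1.0 as the unsteered baseline")
    if "v_pca_subspace" in versions and k < 1:
        raise ValueError("PCA_SUBSPACE_K must be at least 1")


check_config(ALPHA_SWEEP, DIRECTION_VERSIONS, PCA_SUBSPACE_K)  # passes silently
```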