Add files using upload-large-folder tool

Files changed (7)

.huggingfaceignore +4 -0
Residual_Prompt_Bridge.md +501 -0
build_rpb_dev_manifest.py +71 -0
dev_subsets_rpb_v1.json +620 -0
load_model.py +4 -1
train.py +268 -15
upload_hf.py +47 -94

.huggingfaceignore CHANGED

@@ -1,4 +1,8 @@
 __pycache__/
 **/__pycache__/
 *.pyc
 .git/

 __pycache__/
 **/__pycache__/
 *.pyc
+*.pyo
 .git/
+**/.pytest_cache/
+**/.cache/
+upload.log

Residual_Prompt_Bridge.md ADDED

	@@ -0,0 +1,501 @@

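+# Residual Prompt Bridge: Paper-Oriented Experiment Roadmap
+## 1. Current main claim
+The paper's main claim is now formally locked as:
+> **We propose an image-conditioned directional prompt correction module that orthogonalizes prompt updates to steer language-side prompts toward a more decodable SAM prompt manifold, mitigating cross-distribution prompt interface mismatch.**
+Restated:
+> **We propose an image-conditioned directional prompt correction that uses orthogonalized updates to deflect the language-side prompt toward a more decodable SAM prompt manifold, thereby alleviating the cross-distribution prompt interface mismatch.**
+From now on, every experiment serves only this claim; the method story must not sprawl into an all-encompassing system.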
+---
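+## 2. Current project positioning
+The RPB project has completed the most critical early screening:
+1. **Implementation correctness passed**
+   - checkpoint / LoRA compatibility issues are fixed
+   - the bridge path does not automatically break the baseline
+   - the identity-preserving sanity check passes
+2. **The geometric mechanism has a clear direction**
+   - an additive residual is not enough to push `p_hat` away from `q`
+   - the directional bridge is clearly better than the additive one
+   - orthogonalization shifts the residual budget from radial scaling to directional correction
+3. **The current minimal core has emerged**
+   - `image-conditioned`
+   - `p_mask-only`
+   - `directional`
+   - `orthogonal`
+   - `single-token correction`
+4. **The role of mixed is still unsettled**
+   - weak mixed does not erase the bridge
+   - but for now it looks more like an enhancer / compatibility probe than a stable decoder-facing calibration mechanism
+So the most important task now is not adding more modules, but turning this **minimal effective core** into a stable, reproducible, submittable method skeleton.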
+---
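+## 3. Two sets of criteria: Mechanism Pass vs Paper Pass
+### 3.1 Mechanism pass
+The question it answers:
+> Does this method design really capture the essence of the problem?
+The mechanism pass currently needs to be supported by the following evidence:
+- additive vs directional: directional is clearly better at moving `p_hat` away from identity
+- without orthogonal vs with orthogonal: orthogonalization clearly improves how efficiently `Δp` uses its geometric budget
+- `Δp` consistently points toward `p_mask`
+- `p_hat` clearly moves away from `q`
+- the seen/unseen alignment ratio is healthy
+- weak mixed does not simply pull the bridge back to baseline
+### 3.2 Paper pass
+The question it answers:
+> Is this method strong enough to carry a top-conference method paper on its own?
+The paper pass needs the following stronger conditions:
+- a stable, same-direction headline trend on larger-scale evaluation
+- a clear, reproducible advantage at least on unseen
+- an acceptable cost on seen / null
+- a stable trend across 2 random seeds
+- a complete minimal closed-loop ablation
+Current status:
+- **mechanism pass: close to passing, but still missing larger-scale validation and key baselines**
+- **paper pass: not yet passed**
+Every subsequent experiment group must state explicitly whether it advances the mechanism pass or the paper pass.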
+---
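+## 4. Freeze the minimal core method
+On the pure RPB standalone track, only the following components are kept for now:
+- `image-conditioned correction`
+- `p_mask-only teacher`
+- `directional bridge`
+- `orthogonalized update`
+- `single-token prompt correction`
+Explicitly **out of the main line** for now:
+- `z_gt` as the main teacher
+- calibrator
+- refinement
+- multi-token bridge
+- an all-encompassing full bridge system
+These may later serve at most as ablations, extensions, or hybrid components, not as part of the current main method.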
+---
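+## 5. Summary of current experimental facts
+### 5.1 Confirmed positive results
+- the bridge can be attached safely and does not automatically destroy the baseline
+- after the checkpoint / LoRA fix, the RPB path is essentially equivalent to the baseline
+- with `directional + orthogonal`:
+  - `Δp` is highly aligned with `p_mask`
+  - `Δp` no longer wastes most of its budget along the direction parallel to `q`
+  - `p_hat` clearly leaves the identity region
+- `p_mask-only teacher-only` on quick eval already gives:
+  - a small but controllable drop on seen
+  - a slight positive signal on unseen
+  - null essentially flat
+### 5.2 Confirmed negative results
+- an additive residual is not enough to actually rotate the prompt
+- `L_mask` is not the main bottleneck in the early stage
+- `z_gt` is currently not the main teacher for the sparse bridge
+- weak mixed currently cannot reliably pull seen back to baseline
+### 5.3 Most important current working hypothesis
+> `p_mask-only + image-conditioned + directional + orthogonal` already captures the main problem, but we still need to find a more stable operating point and show that its headline trend is not noise.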
+### 5.4 Fixed dev, Stage A: current record
+Fixed dev subsets:
+- `test_s`: 200 samples
+- `test_u`: 200 samples
+- `test_n`: 200 samples
+- manifest: `/workspace/SimToken/dev_subsets_rpb_v1.json`
+#### Fixed dev baseline
+| Setting | Seen mIoU | Seen F | Unseen mIoU | Unseen F | Null |
+|---|---:|---:|---:|---:|---:|
+| baseline | 0.72554 | 0.81811 | 0.68531 | 0.77238 | 0.01452 |
+#### Teacher-only alpha search
+| Setting | Seen mIoU | Seen F | Unseen mIoU | Unseen F | Null | Seen cos(p_hat,p_mask) | Unseen cos(p_hat,p_mask) | Mechanism verdict |
+|---|---:|---:|---:|---:|---:|---:|---:|---|
+| image, alpha=0.20 | 0.72517 | 0.81376 | 0.68596 | 0.77730 | 0.01426 | 0.09502 | 0.06611 | Strongest mechanism; cost on Seen / F |
+| image, alpha=0.18 | 0.72692 | 0.81705 | 0.68595 | 0.77354 | 0.01448 | 0.02873 | 0.00605 | Better performance balance; weaker mechanism |
+| image, alpha=0.15 | 0.72669 | 0.81725 | 0.68569 | 0.77330 | 0.01448 | 0.02373 | 0.00282 | Closer to identity |
+| image, alpha=0.12 | 0.72651 | 0.81748 | 0.68578 | 0.77314 | 0.01449 | 0.01871 | -0.00046 | Light-perturbation regime; weakest mechanism |
+Stage A teacher-only conclusions:
+- `alpha=0.20` is the mechanism candidate; it clearly changes the prompt geometry.
+- `alpha=0.18` is the performance-balance candidate; seen / unseen / null are all more stable.
+- `alpha=0.12/0.15` are already too close to identity to serve as primary mechanism evidence.
+#### Weak mixed local validation
+| Setting | Seen mIoU | Seen F | Unseen mIoU | Unseen F | Null | Seen cos(p_hat,p_mask) | Unseen cos(p_hat,p_mask) | Role verdict |
+|---|---:|---:|---:|---:|---:|---:|---:|---|
+| image, alpha=0.18, weak mixed | 0.72704 | 0.81554 | 0.68706 | 0.77454 | 0.01451 | 0.04079 | 0.01325 | Current best performance-balance candidate |
+| image, alpha=0.15, weak mixed | 0.72684 | 0.81607 | 0.68674 | 0.77419 | 0.01451 | 0.03382 | 0.00882 | Stable but slightly weaker than alpha=0.18 mixed |
+Current weak mixed conclusions:
+- weak mixed does not pull the bridge back to identity.
+- for both `alpha=0.15/0.18`, weak mixed looks more like mild enhancement than destructive pullback.
+- `alpha=0.18 + weak mixed` is the current best operating point on fixed dev.
+#### q-only directional baseline
+| Setting | Seen mIoU | Seen F | Unseen mIoU | Unseen F | Null | Seen cos(p_hat,p_mask) | Unseen cos(p_hat,p_mask) | Verdict |
+|---|---:|---:|---:|---:|---:|---:|---:|---|
+| q-only, alpha=0.18 | 0.72311 | 0.81206 | 0.68289 | 0.77666 | 0.01424 | 0.12061 | 0.09598 | Stronger alignment but worse mIoU |
+q-only conclusions:
+- the directional / orthogonal mechanism is strong in itself; q-only also raises teacher alignment substantially.
+- q-only prompt steering is more aggressive: `gate_mean` is higher and `delta_norm` is larger.
+- q-only mIoU is below the image-conditioned candidate on both seen and unseen.
+- current evidence supports the view that the value of image conditioning is not simply a higher teacher cosine, but constraining the directional correction so that prompt steering and decoder compatibility are better balanced.
+#### Current Stage A candidate
+Current best candidate on fixed dev:
+> **image-conditioned + p_mask-only + directional + orthogonal + alpha=0.18 + weak mixed**
+Corresponding checkpoint:
+> `/workspace/SimToken/checkpoints/rpb_dev_mixed_pm_only_a018_wm005.pth`
+---
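The fixed dev tables in section 5.4 are easier to compare as differences from the baseline row. The sketch below is not part of the commit; it only re-enters numbers copied from those tables and subtracts the baseline. The dictionary keys (`seen_miou` and so on) and the helper name `delta_vs_baseline` are illustrative assumptions, and the sketch takes no position on whether a lower or higher Null value is better.

```python
# Numbers copied from the fixed dev tables in section 5.4.
BASELINE = {
    "seen_miou": 0.72554, "seen_f": 0.81811,
    "unseen_miou": 0.68531, "unseen_f": 0.77238, "null": 0.01452,
}

SETTINGS = {
    "image, alpha=0.20": {
        "seen_miou": 0.72517, "seen_f": 0.81376,
        "unseen_miou": 0.68596, "unseen_f": 0.77730, "null": 0.01426,
    },
    "image, alpha=0.18, weak mixed": {
        "seen_miou": 0.72704, "seen_f": 0.81554,
        "unseen_miou": 0.68706, "unseen_f": 0.77454, "null": 0.01451,
    },
    "q-only, alpha=0.18": {
        "seen_miou": 0.72311, "seen_f": 0.81206,
        "unseen_miou": 0.68289, "unseen_f": 0.77666, "null": 0.01424,
    },
}


def delta_vs_baseline(row, baseline=BASELINE):
    """Return metric-wise differences (row minus baseline), rounded to 5 decimals."""
    return {key: round(row[key] - baseline[key], 5) for key in baseline}


DELTAS = {name: delta_vs_baseline(row) for name, row in SETTINGS.items()}

if __name__ == "__main__":
    for name, delta in DELTAS.items():
        cells = "  ".join(f"{key}={value:+.5f}" for key, value in delta.items())
        print(f"{name:32s} {cells}")
```

On these numbers every mIoU difference is below 0.003 in absolute value, which is why the roadmap insists on larger evaluations and repeated seeds before reading the trend as signal.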
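+## 6. Experimental discipline: stop freely tuning on test
+Starting with the next stage, a **dev tuning subset** must be frozen; alpha and mixed settings are no longer tuned freely on `test_s/test_u/test_n`.
+Fix immediately:
+- `dev_seen`
+- `dev_unseen`
+- `dev_null`
+Each split can start with `100` or `200` samples. From then on:
+- alpha selection
+- mixed selection
+- warm-start configuration
+- early stopping
+are all done on dev only.
+The real test splits are used only for a later one-shot confirmation and the final tables.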
+---
+## 7. Three-stage roadmap
+## Stage A: lock the operating point of the minimal core
+### Goal
+Answer:
+> Can the current minimal core reach a stable, acceptable performance-geometry balance on a larger quick eval?
+### Only two kinds of experiments in this stage
+#### A1. Teacher-only operating point search
+Fixed:
+- image-conditioned
+- `p_mask-only`
+- directional
+- orthogonal
+- single-token
+- no `z_gt`
+- no calibrator
+- no refinement
+Sweep only:
+- `alpha = 0.12, 0.15, 0.18, 0.20`
+Current judgment: `0.20` is already a promising pass, so there is no need to diverge toward larger alpha.
+#### A2. Weak mixed local validation
+Warm-start only around the best teacher-only checkpoint; no large sweep.
+Test only:
+- `best_alpha`
+- `best_alpha - 0.03`
+and two very weak mask strengths:
+- `λ_mask = 0.05`
+- `λ_mask = 0.10`
+The goal of mixed is not to raise scores but to determine which role it plays:
+- calibration
+- enhancement
+- or destructive pullback
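+### Stage A key metrics
+Geometry metrics:
+- `cos(p_hat, p_mask)_seen`
+- `cos(p_hat, p_mask)_unseen`
+- `cos(p_hat, q)`
+- `cos(Δp, p_mask)`
+- `cos(Δp, q)`
+- `align_ratio = cos_u / cos_s`
+Performance metrics:
+- `mIoU_seen`
+- `mIoU_unseen`
+- `Fscore_seen`
+- `Fscore_unseen`
+- `Null metric`
+### Stage A pass criteria
+If, on dev or a larger quick eval, a stable point can be found that satisfies:
+- unseen is consistently no worse than baseline, ideally slightly better
+- the cost on seen is controllable
+- null is essentially flat or the cost is acceptable
+- `cos(p_hat, p_mask)` clearly leaves the identity region
+- the seen/unseen alignment ratio is healthy
+then Stage A passes.
+### Stage A stop conditions
+If, after completing:
+1. local alpha search
+2. local weak mixed search
+3. 100 / 200-sample quick eval
+any of the following still holds, stop the pure RPB standalone main line:
+- no stable, same-direction unseen advantage on the larger quick eval
+- the seen/unseen tradeoff is highly sensitive to alpha
+- the null cost cannot be pushed down to near baseline
+- mixed remains only an enhancer rather than a decoder-facing calibration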
+---
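The Stage A geometry metrics reduce to cosine similarities plus one ratio, `align_ratio = cos_u / cos_s`. The sketch below is not part of the commit: the helper names `cosine` and `align_ratio` are illustrative, vectors are plain Python lists rather than the tensors used in `train.py`, and the cosine pairs are copied from the fixed dev tables in section 5.4.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def align_ratio(cos_seen, cos_unseen):
    """align_ratio = cos_u / cos_s, as defined in the Stage A metric list."""
    return cos_unseen / cos_seen


# (Seen cos(p_hat, p_mask), Unseen cos(p_hat, p_mask)) copied from section 5.4.
FIXED_DEV_COS = {
    "image, alpha=0.20": (0.09502, 0.06611),
    "image, alpha=0.18": (0.02873, 0.00605),
    "image, alpha=0.18, weak mixed": (0.04079, 0.01325),
    "q-only, alpha=0.18": (0.12061, 0.09598),
}

RATIOS = {name: align_ratio(s, u) for name, (s, u) in FIXED_DEV_COS.items()}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name:32s} align_ratio={ratio:.3f}")
```

The ratio becomes unstable when the seen cosine is close to zero (for example the `alpha=0.12` row, where the unseen cosine is slightly negative), so it is probably best reported next to the raw cosines rather than instead of them.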
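+## Stage B: minimal closed-loop ablation
+Enter Stage B only after Stage A passes.
+### Goal
+Round out the method's main skeleton and form closed-loop evidence for the mechanism pass.
+### The 4 required key ablations
+1. **additive vs directional**
+2. **directional without orthogonalization vs with orthogonalization**
+3. **q-only directional vs image-conditioned directional**
+4. **`p_mask-only` vs `p_mask + weak z_gt`**
+These 4 are enough to support the method argument; do not expand into more trick ablations.
+### Additional Stage B requirements
+- at least 2 random seeds
+- at least one larger-scale validation
+- establish geometry-performance coupling:
+  - how much the prompt geometry is rewritten
+  - its relationship to seen/unseen performance
+  - its relationship to retraction toward identity
+### Stage B stop conditions
+If, after completing:
+1. local alpha search
+2. local weak mixed search
+3. 100 / 200-sample quick eval
+4. at least one larger-scale validation
+5. 2 random-seed repeats
+any of the following still holds, stop pure RPB standalone:
+- no stable, same-direction unseen advantage on the large subset / full split
+- the best point depends heavily on seed or alpha and the trend is unstable
+- the null cost cannot be controlled
+- mixed cannot form a stable calibration effect
+- the headline result is still only a very weak fluctuation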
+---
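+## Stage C: decide the paper positioning
+### Route 1: pure RPB standalone
+If the following hold:
+- stable unseen gain on larger evaluation
+- acceptable seen / null cost
+- stable across 2 seeds
+- complete minimal closed-loop ablation
+then go with:
+> **a pure RPB method paper**
+### Route 2: RPB + TTO hybrid
+If:
+- the mechanism holds
+- but the paper pass is not solid enough
+- the headline result is still weak or unstable
+then switch positioning immediately:
+> **an RPB + TTO hybrid method paper**
+Here RPB is no longer the standalone main method but:
+- an amortized prompt corrector
+- a front-end module that improves the starting point of test-time refinement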
+---
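+## 8. The hybrid route as an explicit Plan B
+If pure RPB can ultimately only achieve:
+- a stable small gain on unseen
+- a small drop on seen
+- null flat or slightly better
+then a standalone top-conference paper will be a hard sell.
+But RPB is still valuable as a front-end prompt corrector:
+- it improves the geometry of the initial `q`
+- it provides a better initialization for q-LTPO / selective refinement
+- it reduces the number of steps and the instability of test-time optimization
+The hybrid paper narrative can be stated explicitly as:
+1. train-time: amortized interface correction
+2. test-time: instance-specific prompt refinement
+3. combined: addresses both the global interface mismatch and sample-level refinement
+Current judgment: hybrid is a very strong Plan B, not a stopgap.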
+---
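+## 9. How negative results enter the paper's argument chain
+We already have a clear "design convergence chain" that can be turned directly into the paper's method argument:
+### Why not additive residual
+Because under additive:
+- `Δp` mainly fights the parallel component of `q`
+- the teacher direction is swallowed by the large-norm `q`
+- the result looks more like scaling than rotation
+### Why directional
+Because only directional turns the correction into explicit control of the prompt direction rather than a numerical perturbation.
+### Why orthogonal
+Because only orthogonalization keeps the residual budget from being wasted on radial scaling.
+### Why only `p_mask` is kept for now
+Because in the current sparse bridge, `p_mask` has consistently been the main teacher, and `z_gt` has not yet become a primary signal.
+### Why mixed is not a main module
+Because mixed currently looks more like a compatibility / enhancement probe than a stable calibration mechanism.
+This chain must be spelled out in the paper so reviewers see that the method converged step by step from diagnosis, not by blindly stacking modules.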
+---
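The additive-versus-orthogonal argument in section 9 can be illustrated with a toy vector example. This is a hypothetical sketch, not the `ResidualPromptBridge` implementation (which this diff does not show): the update form `p_hat = ||q|| * normalize(q_hat + alpha * normalize(Δp_orth))`, the function names, and the toy vectors are all assumptions chosen only to make the geometry visible.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def unit(a):
    n = norm(a)
    return [x / n for x in a]


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def additive_update(q, delta):
    """Plain residual: p_hat = q + delta."""
    return [qi + di for qi, di in zip(q, delta)]


def orthogonal_directional_update(q, delta, alpha):
    """Assumed form: drop the part of delta parallel to q, then rotate q's
    direction by a step of relative size alpha while keeping ||q||.
    Assumes delta is not exactly parallel to q (otherwise delta_orth is zero)."""
    q_hat = unit(q)
    parallel = dot(delta, q_hat)
    delta_orth = [di - parallel * qi for di, qi in zip(delta, q_hat)]
    step = unit(delta_orth)
    direction = unit([qi + alpha * si for qi, si in zip(q_hat, step)])
    return [norm(q) * x for x in direction]


# Toy case: large-norm q and a small delta that mostly opposes q.
q = [10.0, 0.0, 0.0]
delta = [-1.0, 0.5, 0.0]

p_add = additive_update(q, delta)
p_dir = orthogonal_directional_update(q, delta, alpha=0.18)

if __name__ == "__main__":
    print("additive   : cos(p_hat, q) =", round(cosine(p_add, q), 5), " norm =", round(norm(p_add), 5))
    print("directional: cos(p_hat, q) =", round(cosine(p_dir, q), 5), " norm =", round(norm(p_dir), 5))
```

In this toy case the additive update mostly shrinks the norm of `q` and barely changes its direction, while the orthogonalized update keeps the norm and spends the whole step on rotation, with `cos(p_hat, q) = 1 / sqrt(1 + alpha^2)` under the assumed form.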
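+## 10. Most direct next steps
+Do not diverge; follow this order strictly:
+1. **Freeze the paper's main claim immediately**
+2. **Switch to the fixed dev subsets immediately; stop freely tuning on test**
+3. **Complete Stage A: operating point search for the minimal core**
+4. **Add the key baseline: q-only directional**
+5. **Run two seeds**
+6. **Then make the keep-or-drop decision for pure RPB standalone**
+The most important execution principle right now:
+> **First show that the minimal core holds stably; if the headline is not strong enough, promptly upgrade it into a hybrid front end instead of making pure RPB more complex.**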
+---
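+## 11. Clear conclusions for the current stage
+### Is the current direction worth continuing?
+**Yes.**
+### What should be done first?
+Not adding more modules, but:
+- find the best operating point for teacher-only `p_mask-only directional orthogonal`
+- use very weak mixed to judge whether mixed can form calibration
+- show on dev and a larger quick eval that the trend is not noise
+### When should pure RPB stop?
+If the headline is still weak and unstable after Stages A + B, stop pure RPB standalone.
+### What then?
+Switch directly to:
+> **RPB + TTO hybrid**
+This route is currently the explicit Plan B, and quite possibly the stronger path to a top-conference method paper.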

build_rpb_dev_manifest.py ADDED

	@@ -0,0 +1,71 @@

+import argparse
+import json
+import os
+import random
+import pandas as pd
+def sample_indices(size, count, seed):
+    if count <= 0:
+        return []
+    if count > size:
+        raise ValueError(f"Requested {count} samples from a split of size {size}")
+    rng = random.Random(seed)
+    indices = list(range(size))
+    rng.shuffle(indices)
+    selected = sorted(indices[:count])
+    return selected
+def main():
+    parser = argparse.ArgumentParser(description="Build a fixed subset manifest for RPB dev experiments.")
+    parser.add_argument("--metadata", type=str, default="/workspace/SimToken/data/metadata.csv")
+    parser.add_argument("--output", type=str, required=True)
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--train_rows", type=int, default=0)
+    parser.add_argument("--test_s_rows", type=int, default=200)
+    parser.add_argument("--test_u_rows", type=int, default=200)
+    parser.add_argument("--test_n_rows", type=int, default=200)
+    args = parser.parse_args()
+    metadata = pd.read_csv(args.metadata, header=0)
+    split_sizes = {
+        "train": int((metadata["split"] == "train").sum()),
+        "test_s": int((metadata["split"] == "test_s").sum()),
+        "test_u": int((metadata["split"] == "test_u").sum()),
+        "test_n": int((metadata["split"] == "test_n").sum()),
+    }
+    manifest = {
+        "train": sample_indices(split_sizes["train"], args.train_rows, args.seed),
+        "test_s": sample_indices(split_sizes["test_s"], args.test_s_rows, args.seed + 1),
+        "test_u": sample_indices(split_sizes["test_u"], args.test_u_rows, args.seed + 2),
+        "test_n": sample_indices(split_sizes["test_n"], args.test_n_rows, args.seed + 3),
+    }
+    # Remove empty entries so train.py only subsets the splits we intentionally fix.
+    manifest = {key: value for key, value in manifest.items() if value}
+    os.makedirs(os.path.dirname(os.path.abspath(args.output)), exist_ok=True)
+    with open(args.output, "w", encoding="utf-8") as f:
+        json.dump(
+            {
+                "metadata": {
+                    "seed": args.seed,
+                    "split_sizes": split_sizes,
+                    "source_metadata": os.path.abspath(args.metadata),
+                },
+                "subsets": manifest,
+            },
+            f,
+            indent=2,
+        )
+    print(f"saved subset manifest to {args.output}")
+    for split_name, indices in manifest.items():
+        print(f"{split_name}: {len(indices)} samples")
+if __name__ == "__main__":
+    main()
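The manifest is only useful as a frozen dev set if `sample_indices` returns the same indices on every run. The sketch below is not part of the commit; it copies `sample_indices` from the script above so that it runs on its own, then checks the properties the roadmap relies on: determinism for a fixed seed, sorted unique output, and an error on oversampling. The split sizes and the `seed + 1` / `seed + 2` offsets follow the script and the manifest metadata; the variable names are illustrative.

```python
import random


def sample_indices(size, count, seed):
    """Copied from build_rpb_dev_manifest.py so the sketch is self-contained."""
    if count <= 0:
        return []
    if count > size:
        raise ValueError(f"Requested {count} samples from a split of size {size}")
    rng = random.Random(seed)
    indices = list(range(size))
    rng.shuffle(indices)
    return sorted(indices[:count])


# Sizes from the manifest metadata; seeds follow the script (seed + 1, seed + 2).
seen_a = sample_indices(2288, 200, 42 + 1)
seen_b = sample_indices(2288, 200, 42 + 1)
unseen = sample_indices(1656, 200, 42 + 2)

# Same size, count, and seed give the same subset, so the dev set stays frozen.
same_on_rerun = seen_a == seen_b

# Output is sorted and free of duplicates because it is a prefix of a shuffle.
is_sorted_unique = seen_a == sorted(set(seen_a))

# Asking for more rows than the split holds fails loudly instead of truncating.
try:
    sample_indices(5, 6, 0)
    oversample_rejected = False
except ValueError:
    oversample_rejected = True

if __name__ == "__main__":
    print(same_on_rerun, is_sorted_unique, oversample_rejected, len(unseen))
```

Reproducibility here depends on `random.Random` producing the same shuffle for a given seed, so rebuilding the manifest under a different Python version is worth a quick diff against the committed JSON.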

dev_subsets_rpb_v1.json ADDED

	@@ -0,0 +1,620 @@

+{
+  "metadata": {
+    "seed": 42,
+    "split_sizes": {
+      "train": 14113,
+      "test_s": 2288,
+      "test_u": 1656,
+      "test_n": 1028
+    },
+    "source_metadata": "/workspace/SimToken/data/metadata.csv"
+  },
+  "subsets": {
+    "test_s": [
+      6,
+      16,
+      36,
+      71,
+      74,
+      88,
+      108,
+      114,
+      116,
+      122,
+      126,
+      128,
+      134,
+      138,
+      139,
+      146,
+      152,
+      159,
+      177,
+      196,
+      217,
+      219,
+      249,
+      256,
+      268,
+      276,
+      279,
+      286,
+      287,
+      297,
+      298,
+      299,
+      312,
+      313,
+      324,
+      331,
+      332,
+      347,
+      378,
+      383,
+      402,
+      410,
+      412,
+      420,
+      451,
+      452,
+      458,
+      467,
+      477,
+      484,
+      486,
+      497,
+      499,
+      512,
+      526,
+      533,
+      543,
+      550,
+      551,
+      567,
+      574,
+      576,
+      581,
+      594,
+      596,
+      608,
+      616,
+      625,
+      627,
+      642,
+      646,
+      663,
+      692,
+      700,
+      704,
+      724,
+      745,
+      754,
+      795,
+      815,
+      819,
+      831,
+      843,
+      854,
+      867,
+      895,
+      946,
+      953,
+      965,
+      975,
+      979,
+      989,
+      1004,
+      1007,
+      1008,
+      1010,
+      1023,
+      1039,
+      1051,
+      1052,
+      1072,
+      1075,
+      1080,
+      1088,
+      1099,
+      1101,
+      1104,
+      1106,
+      1134,
+      1138,
+      1169,
+      1180,
+      1201,
+      1205,
+      1221,
+      1230,
+      1247,
+      1258,
+      1272,
+      1279,
+      1284,
+      1294,
+      1297,
+      1312,
+      1329,
+      1339,
+      1343,
+      1367,
+      1379,
+      1406,
+      1417,
+      1461,
+      1462,
+      1468,
+      1473,
+      1474,
+      1489,
+      1493,
+      1500,
+      1510,
+      1517,
+      1552,
+      1556,
+      1557,
+      1589,
+      1609,
+      1612,
+      1618,
+      1622,
+      1624,
+      1644,
+      1647,
+      1665,
+      1669,
+      1676,
+      1682,
+      1683,
+      1691,
+      1700,
+      1726,
+      1746,
+      1748,
+      1758,
+      1764,
+      1765,
+      1778,
+      1785,
+      1786,
+      1808,
+      1826,
+      1852,
+      1861,
+      1883,
+      1891,
+      1916,
+      1938,
+      1944,
+      1967,
+      1971,
+      1980,
+      1986,
+      2034,
+      2044,
+      2067,
+      2074,
+      2082,
+      2085,
+      2118,
+      2128,
+      2156,
+      2176,
+      2182,
+      2185,
+      2188,
+      2194,
+      2206,
+      2211,
+      2215,
+      2247,
+      2256
+    ],
+    "test_u": [
+      4,
+      16,
+      26,
+      38,
+      40,
+      48,
+      50,
+      65,
+      83,
+      92,
+      102,
+      117,
+      120,
+      135,
+      144,
+      153,
+      155,
+      185,
+      200,
+      201,
+      211,
+      219,
+      221,
+      226,
+      227,
+      240,
+      245,
+      251,
+      252,
+      255,
+      267,
+      272,
+      274,
+      276,
+      278,
+      282,
+      284,
+      286,
+      303,
+      309,
+      313,
+      328,
+      345,
+      348,
+      358,
+      363,
+      374,
+      376,
+      379,
+      383,
+      385,
+      387,
+      393,
+      396,
+      400,
+      412,
+      417,
+      428,
+      434,
+      452,
+      453,
+      456,
+      459,
+      463,
+      473,
+      490,
+      493,
+      504,
+      517,
+      525,
+      535,
+      543,
+      544,
+      545,
+      549,
+      550,
+      565,
+      584,
+      585,
+      594,
+      602,
+      603,
+      606,
+      638,
+      642,
+      643,
+      651,
+      684,
+      687,
+      692,
+      700,
+      721,
+      728,
+      752,
+      757,
+      779,
+      783,
+      785,
+      794,
+      803,
+      807,
+      814,
+      847,
+      849,
+      853,
+      854,
+      861,
+      867,
+      884,
+      900,
+      903,
+      906,
+      924,
+      930,
+      931,
+      941,
+      948,
+      957,
+      968,
+      972,
+      980,
+      987,
+      995,
+      996,
+      1007,
+      1009,
+      1028,
+      1033,
+      1034,
+      1040,
+      1054,
+      1098,
+      1104,
+      1111,
+      1121,
+      1126,
+      1134,
+      1155,
+      1161,
+      1167,
+      1180,
+      1186,
+      1192,
+      1212,
+      1214,
+      1219,
+      1226,
+      1254,
+      1256,
+      1259,
+      1261,
+      1270,
+      1278,
+      1285,
+      1288,
+      1290,
+      1305,
+      1310,
+      1323,
+      1325,
+      1343,
+      1360,
+      1375,
+      1376,
+      1404,
+      1411,
+      1426,
+      1429,
+      1442,
+      1449,
+      1452,
+      1456,
+      1475,
+      1478,
+      1479,
+      1484,
+      1493,
+      1499,
+      1500,
+      1501,
+      1506,
+      1517,
+      1523,
+      1528,
+      1536,
+      1545,
+      1546,
+      1550,
+      1561,
+      1570,
+      1598,
+      1609,
+      1611,
+      1625,
+      1632,
+      1634,
+      1635,
+      1641,
+      1654,
+      1655
+    ],
+    "test_n": [
+      4,
+      5,
+      9,
+      16,
+      20,
+      25,
+      27,
+      33,
+      37,
+      40,
+      45,
+      46,
+      48,
+      53,
+      56,
+      60,
+      62,
+      67,
+      77,
+      78,
+      80,
+      81,
+      86,
+      90,
+      94,
+      99,
+      102,
+      106,
+      108,
+      111,
+      116,
+      121,
+      126,
+      127,
+      132,
+      143,
+      148,
+      153,
+      155,
+      156,
+      158,
+      160,
+      164,
+      168,
+      170,
+      171,
+      173,
+      175,
+      183,
+      184,
+      185,
+      188,
+      189,
+      190,
+      196,
+      202,
+      206,
+      208,
+      212,
+      217,
+      221,
+      222,
+      223,
+      233,
+      242,
+      246,
+      247,
+      259,
+      262,
+      269,
+      283,
+      298,
+      299,
+      306,
+      316,
+      317,
+      323,
+      330,
+      332,
+      334,
+      354,
+      357,
+      367,
+      372,
+      395,
+      397,
+      400,
+      405,
+      407,
+      420,
+      431,
+      435,
+      436,
+      444,
+      446,
+      461,
+      464,
+      470,
+      479,
+      481,
+      483,
+      485,
+      487,
+      494,
+      512,
+      516,
+      520,
+      524,
+      529,
+      530,
+      539,
+      540,
+      541,
+      554,
+      559,
+      560,
+      564,
+      568,
+      571,
+      572,
+      576,
+      577,
+      581,
+      585,
+      592,
+      602,
+      609,
+      620,
+      630,
+      632,
+      677,
+      678,
+      684,
+      693,
+      694,
+      695,
+      702,
+      716,
+      724,
+      727,
+      732,
+      735,
+      736,
+      747,
+      750,
+      752,
+      755,
+      758,
+      764,
+      767,
+      774,
+      775,
+      777,
+      779,
+      780,
+      782,
+      795,
+      800,
+      812,
+      815,
+      818,
+      821,
+      823,
+      825,
+      828,
+      834,
+      841,
+      843,
+      846,
+      848,
+      860,
+      861,
+      863,
+      869,
+      871,
+      878,
+      882,
+      891,
+      893,
+      896,
+      898,
+      899,
+      901,
+      906,
+      930,
+      940,
+      944,
+      969,
+      970,
+      973,
+      980,
+      990,
+      993,
+      996,
+      997,
+      1007,
+      1012,
+      1013,
+      1019,
+      1025
+    ]
+  }
+}
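Because every later comparison depends on this manifest, a quick structural check before training may be worthwhile. The sketch below is not part of the commit: `validate_manifest` is a hypothetical helper, and the two toy manifests only mimic the `metadata.split_sizes` / `subsets` layout shown above rather than loading the real file.

```python
def validate_manifest(manifest):
    """Return a list of human-readable problems; an empty list means the manifest looks sane."""
    sizes = manifest["metadata"]["split_sizes"]
    problems = []
    for split, indices in manifest["subsets"].items():
        if split not in sizes:
            problems.append(f"{split}: no entry in split_sizes")
            continue
        if indices != sorted(indices):
            problems.append(f"{split}: indices are not sorted")
        if len(set(indices)) != len(indices):
            problems.append(f"{split}: duplicate indices")
        out_of_range = [i for i in indices if not 0 <= i < sizes[split]]
        if out_of_range:
            problems.append(f"{split}: out-of-range indices, e.g. {out_of_range[:5]}")
    return problems


# Toy manifests in the same layout as dev_subsets_rpb_v1.json (values are made up).
good_manifest = {
    "metadata": {"seed": 42, "split_sizes": {"test_s": 10, "test_u": 8}},
    "subsets": {"test_s": [0, 3, 9], "test_u": [1, 7]},
}
bad_manifest = {
    "metadata": {"seed": 42, "split_sizes": {"test_s": 10}},
    "subsets": {"test_s": [3, 3, 12], "test_x": [0]},
}

good_problems = validate_manifest(good_manifest)
bad_problems = validate_manifest(bad_manifest)

if __name__ == "__main__":
    print("good:", good_problems)
    print("bad: ", bad_problems)
```

To check the committed file, the same function could be applied to the result of `json.load` on `dev_subsets_rpb_v1.json`; `train.py` already rejects out-of-range indices at load time, so this mainly adds the sortedness and duplicate checks.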

load_model.py CHANGED

@@ -208,7 +208,10 @@ def collate_fn(batch, tokenizer=None):
 import torch.multiprocessing as mp
 if __name__ == "__main__":
-    mp.set_start_method("spawn")
     set_seed(42)
     tokenizer = transformers.AutoTokenizer.from_pretrained(
         args.mllm,

 import torch.multiprocessing as mp
 if __name__ == "__main__":
+    try:
+        mp.set_start_method("spawn")
+    except RuntimeError:
+        pass
     set_seed(42)
     tokenizer = transformers.AutoTokenizer.from_pretrained(
         args.mllm,
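The `try` / `except RuntimeError` guard exists because the start method can only be set once per process; a second `set_start_method` call raises `RuntimeError`. The sketch below is not part of the commit. It uses the standard-library `multiprocessing` module, which `torch.multiprocessing` wraps, and the helper name `ensure_spawn` is an illustrative assumption.

```python
import multiprocessing as mp


def ensure_spawn():
    """Request the 'spawn' start method, tolerating an earlier call, and report
    whichever start method is actually active afterwards."""
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        # The start method was already fixed earlier in this process,
        # possibly to something other than "spawn".
        pass
    return mp.get_start_method()


first = ensure_spawn()
second = ensure_spawn()  # would raise RuntimeError without the guard

if __name__ == "__main__":
    print("active start method:", first, second)
```

One design consequence of swallowing the error is that the script continues silently if another import already selected a different method; logging the value returned by `get_start_method()`, or passing `force=True` to `set_start_method`, are two ways to make that case visible.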

train.py CHANGED

@@ -22,6 +22,8 @@ import re
 import time
 import os
 import sys
 import warnings
@@ -214,10 +216,61 @@ def collate_fn(batch, tokenizer=None):
     }
 import torch.multiprocessing as mp
 if __name__ == "__main__":
-    mp.set_start_method("spawn")
     set_seed(42)
     tokenizer = transformers.AutoTokenizer.from_pretrained(
         args.mllm,
         cache_dir=None,
@@ -230,18 +283,27 @@ if __name__ == "__main__":
     num_added_tokens = tokenizer.add_tokens("[SEG]")
     seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]  # 32000
     print("seg_token_idx: ", seg_token_idx)
     train_dataset = REFAVS('train', args, tokenizer, input_type='refer')
     val_dataset_s_refer = REFAVS('test_s', args, tokenizer, input_type='refer')
     val_dataset_u_refer = REFAVS('test_u', args, tokenizer, input_type='refer')
     val_dataset_n_refer = REFAVS('test_n', args, tokenizer, input_type='refer')
     if args.overfit_samples > 0:
         overfit_n = min(args.overfit_samples, len(train_dataset))
         train_dataset = Subset(train_dataset, list(range(overfit_n)))
         print(f"overfit_samples enabled: using first {overfit_n} train samples")
-    train_eval_dataset = train_dataset
     g = torch.Generator()
@@ -258,15 +320,25 @@ if __name__ == "__main__":
     model_args = {
         "train_mask_decoder": True,
         "out_dim": 256,  # 256
-        "ce_loss_weight": 1.0,
-        "dice_loss_weight": 0.5,
-        "bce_loss_weight": 2.0,
         "seg_token_idx": seg_token_idx,
         "vision_pretrained": args.vision_pretrained,  # sam_vit_h_xxx.pth
         "vision_tower": args.vision_tower,
         "use_im_start_end": False,
         "compress": args.compress,
         "start": args.start,
     }
     model = Simtoken_ForCausalLM.from_pretrained(args.mllm, torch_dtype=torch.float32, low_cpu_mem_usage=True, **model_args)
@@ -302,7 +374,17 @@ if __name__ == "__main__":
     for p in model.get_model().mm_projector.parameters():
         p.requires_grad = False
-    lora_r = 8
     target_modules = "q_proj,v_proj"
     if lora_r > 0:
@@ -370,17 +452,29 @@ if __name__ == "__main__":
     # for name, param in model.token_compressor.named_parameters():
     #     param.requires_grad = True
     for n, p in model.named_parameters():
         if any(
-                [
-                    x in n
-                    for x in ["lm_head", "embed_tokens", "mask_decoder", "text_hidden_fcs"]
-                ]
         ):
             p.requires_grad = True
-    if args.gate_only:
         for p in model.parameters():
             p.requires_grad = False
         for n, p in model.named_parameters():
@@ -487,12 +581,145 @@ if __name__ == "__main__":
         with open(os.path.join(args.log_root, f'{args.name}.txt'), "a") as f:
             f.write(message + "\n")
     def valuate(model, dataloader, args, name):
         model.eval()
         total_iou = 0
         total_fscore = 0
         count = 0
         for batch in tqdm(dataloader, desc=f"Evaluating on {name}"):
             input_dict = dict_to_cuda(batch)
@@ -513,7 +740,8 @@ if __name__ == "__main__":
                                             vids=input_dict["vids"],
                                             contrast=args.ct_weight,
                                             ref_ids=input_dict["ref_ids"],
-                                            inference=True)
             pred_masks = output_dict["pred_masks"]  # list[B]:[num_seg, T, H, W]
             gt_masks = output_dict["gt_masks"]  # list[B]:[num_seg, T, H, W]
             for i in range(len(pred_masks)):
@@ -526,18 +754,35 @@ if __name__ == "__main__":
                 total_fscore += fscore * num_seg * T
                 count += num_seg * T
         print(f"\n  valuate on {name}:  miou: {total_iou/count}  fscore: {total_fscore/count}")
         with open(os.path.join(args.log_root, f'{args.name}.txt'), "a") as f:
             f.write(f"valuate on {name}:  miou {total_iou/count}  true fscore {total_fscore/count} \n")
     # ---------------train------------------------------------------
     model.train()
     epochs = args.epochs
     print("init lr:", args.lr)
-    optimizer = AdamW(model.parameters(), lr=args.lr, betas=(0.9, 0.95), weight_decay=0.01)
     print_referent_gate_optimizer_sanity(model, optimizer)
     gradient_accumulation_steps = max(1, int(16 // args.batch_size))
@@ -613,7 +858,15 @@ if __name__ == "__main__":
                 optimizer.zero_grad()
                 current_lr = scheduler.get_lr()[0]
-                loop.set_postfix(lr=current_lr, loss=running_loss / ((step + 1) / gradient_accumulation_steps))
                 if args.max_steps > 0 and optimizer_step_count >= args.max_steps:
                     stop_training = True

 import time
 import os
 import sys
+import json
+from collections import defaultdict
 import warnings
     }
+def maybe_limit_dataset(dataset, max_rows, name):
+    if max_rows is None or max_rows <= 0:
+        return dataset
+    limited_n = min(max_rows, len(dataset))
+    print(f"max_eval_rows enabled: using first {limited_n} samples from {name}")
+    return Subset(dataset, list(range(limited_n)))
+def load_subset_manifest(path):
+    if not path:
+        return {}
+    with open(path, "r", encoding="utf-8") as f:
+        manifest = json.load(f)
+    if not isinstance(manifest, dict):
+        raise ValueError(f"subset_manifest must be a JSON object, got {type(manifest).__name__}")
+    if "subsets" in manifest:
+        manifest = manifest["subsets"]
+    return manifest
+def maybe_apply_manifest_subset(dataset, manifest, split_name, name):
+    if split_name not in manifest:
+        return dataset
+    indices = manifest[split_name]
+    if not isinstance(indices, list) or not all(isinstance(i, int) for i in indices):
+        raise ValueError(f"subset_manifest[{split_name!r}] must be a list of integers")
+    if not indices:
+        raise ValueError(f"subset_manifest[{split_name!r}] is empty")
+    max_index = len(dataset) - 1
+    bad_indices = [i for i in indices if i < 0 or i > max_index]
+    if bad_indices:
+        raise ValueError(
+            f"subset_manifest[{split_name!r}] contains out-of-range indices; "
+            f"dataset size={len(dataset)}, examples={bad_indices[:5]}"
+        )
+    print(f"subset_manifest enabled: using {len(indices)} fixed samples from {name} ({split_name})")
+    return Subset(dataset, indices)
+def checkpoint_requires_lora(saved_model_path):
+    if not saved_model_path or not os.path.exists(saved_model_path):
+        return False
+    state = torch.load(saved_model_path, map_location="cpu")
+    return any("lora_" in key for key in state.keys())
 import torch.multiprocessing as mp
 if __name__ == "__main__":
+    try:
+        mp.set_start_method("spawn")
+    except RuntimeError:
+        pass
     set_seed(42)
+    if args.bridge_only and not args.use_residual_prompt_bridge:
+        raise ValueError("--bridge_only requires --use_residual_prompt_bridge")
     tokenizer = transformers.AutoTokenizer.from_pretrained(
         args.mllm,
         cache_dir=None,
     num_added_tokens = tokenizer.add_tokens("[SEG]")
     seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]  # 32000
     print("seg_token_idx: ", seg_token_idx)
+    subset_manifest = load_subset_manifest(args.subset_manifest)
     train_dataset = REFAVS('train', args, tokenizer, input_type='refer')
     val_dataset_s_refer = REFAVS('test_s', args, tokenizer, input_type='refer')
     val_dataset_u_refer = REFAVS('test_u', args, tokenizer, input_type='refer')
     val_dataset_n_refer = REFAVS('test_n', args, tokenizer, input_type='refer')
+    train_dataset = maybe_apply_manifest_subset(train_dataset, subset_manifest, "train", "train")
+    val_dataset_s_refer = maybe_apply_manifest_subset(val_dataset_s_refer, subset_manifest, "test_s", "test_s")
+    val_dataset_u_refer = maybe_apply_manifest_subset(val_dataset_u_refer, subset_manifest, "test_u", "test_u")
+    val_dataset_n_refer = maybe_apply_manifest_subset(val_dataset_n_refer, subset_manifest, "test_n", "test_n")
     if args.overfit_samples > 0:
         overfit_n = min(args.overfit_samples, len(train_dataset))
         train_dataset = Subset(train_dataset, list(range(overfit_n)))
         print(f"overfit_samples enabled: using first {overfit_n} train samples")
+    train_eval_dataset = maybe_limit_dataset(train_dataset, args.max_eval_rows, "train_eval")
+    val_dataset_s_refer = maybe_limit_dataset(val_dataset_s_refer, args.max_eval_rows, "test_s")
+    val_dataset_u_refer = maybe_limit_dataset(val_dataset_u_refer, args.max_eval_rows, "test_u")
+    val_dataset_n_refer = maybe_limit_dataset(val_dataset_n_refer, args.max_eval_rows, "test_n")
     g = torch.Generator()
     model_args = {
         "train_mask_decoder": True,
         "out_dim": 256,  # 256
+        "ce_loss_weight": args.ce_loss_weight,
+        "dice_loss_weight": args.dice_loss_weight,
+        "bce_loss_weight": args.bce_loss_weight,
         "seg_token_idx": seg_token_idx,
         "vision_pretrained": args.vision_pretrained,  # sam_vit_h_xxx.pth
         "vision_tower": args.vision_tower,
         "use_im_start_end": False,
         "compress": args.compress,
         "start": args.start,
+        "use_residual_prompt_bridge": args.use_residual_prompt_bridge,
+        "bridge_pm_weight": args.bridge_pm_weight,
+        "bridge_rg_weight": args.bridge_rg_weight,
+        "bridge_norm_weight": args.bridge_norm_weight,
+        "bridge_mode": args.bridge_mode,
+        "bridge_condition": args.bridge_condition,
+        "bridge_directional_alpha": args.bridge_directional_alpha,
+        "bridge_gate_bias_init": args.bridge_gate_bias_init,
+        "bridge_residual_init_std": args.bridge_residual_init_std,
+        "bridge_target_frame": args.bridge_target_frame,
     }
     model = Simtoken_ForCausalLM.from_pretrained(args.mllm, torch_dtype=torch.float32, low_cpu_mem_usage=True, **model_args)
     for p in model.get_model().mm_projector.parameters():
         p.requires_grad = False
+    use_lora_checkpoint = (
+        (args.init_from_saved_model or args.gate_only)
+        and checkpoint_requires_lora(args.saved_model)
+    )
+    if args.bridge_only and use_lora_checkpoint:
+        print(
+            "bridge_only notice: saved_model contains LoRA weights, "
+            "so LoRA modules will be instantiated for checkpoint compatibility and then frozen."
+        )
+    lora_r = 8 if (not args.bridge_only or use_lora_checkpoint) else 0
     target_modules = "q_proj,v_proj"
     if lora_r > 0:
     # for name, param in model.token_compressor.named_parameters():
     #     param.requires_grad = True
     for n, p in model.named_parameters():
         if any(
+            [
+                x in n
+                for x in ["lm_head", "embed_tokens", "mask_decoder", "text_hidden_fcs"]
+            ]
         ):
             p.requires_grad = True
+    if args.bridge_only:
+        for p in model.parameters():
+            p.requires_grad = False
+        trainable_names = []
+        for n, p in model.named_parameters():
+            if "prompt_bridge" in n:
+                p.requires_grad = True
+                trainable_names.append(n)
+        trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
+        total = sum(p.numel() for p in model.parameters())
+        print(f"bridge_only enabled: trainable params {trainable} / {total}")
+        for name in trainable_names:
+            print(f"  bridge trainable: {name}")
+    elif args.gate_only:
         for p in model.parameters():
             p.requires_grad = False
         for n, p in model.named_parameters():
         with open(os.path.join(args.log_root, f'{args.name}.txt'), "a") as f:
             f.write(message + "\n")
+    def find_prompt_bridge_module(model):
+        for _, module in model.named_modules():
+            if module.__class__.__name__ == "ResidualPromptBridge":
+                return module
+        return None
+    def collect_prompt_bridge_grad_norms(model):
+        module = find_prompt_bridge_module(model)
+        if module is None:
+            return {}
+        def grad_norm(param):
+            if param.grad is None:
+                return None
+            return float(param.grad.detach().float().norm().item())
+        return {
+            "W_a": grad_norm(module.attn_proj.weight),
+            "W_r": grad_norm(module.residual_proj.weight),
+            "W_g": grad_norm(module.gate.weight),
+            "b_g": grad_norm(module.gate.bias),
+        }
+    def print_prompt_bridge_grad_norms(label, norms):
+        parts = []
+        for key in ["W_a", "W_r", "W_g", "b_g"]:
+            value = norms.get(key)
+            if value is None:
+                parts.append(f"{key}=None")
+            else:
+                parts.append(f"{key}={value:.6e}")
+        print(f"{label}: " + " | ".join(parts))
+    def run_bridge_sanity_checks(model, dataloader):
+        if not args.use_residual_prompt_bridge:
+            raise ValueError("--bridge_sanity_only requires --use_residual_prompt_bridge")
+        model.train()
+        batch = next(iter(dataloader))
+        input_dict = dict_to_cuda(batch)
+        output_dict = model.forward(
+            images=input_dict["images"],
+            images_clip=input_dict["images_clip"],
+            audio_features=input_dict["audio_feats"],
+            image_features=input_dict["image_feats"],
+            input_ids=input_dict["input_ids"],
+            labels=input_dict["labels"],
+            attention_masks=input_dict["attention_masks"],
+            masks_list=input_dict["masks"],
+            resize_list=input_dict["resizes"],
+            orgsize_list=input_dict["orgsizes"],
+            conversation_list=input_dict["convs"],
+            refs_num=input_dict["refs_num"],
+            fids=input_dict["fids"],
+            vids=input_dict["vids"],
+            contrast=0.0,
+            ref_ids=input_dict["ref_ids"],
+            epoch=0,
+            inference=False,
+            target_frame=args.bridge_target_frame,
+        )
+        model.zero_grad(set_to_none=True)
+        output_dict["mask_loss"].backward(retain_graph=True)
+        print_prompt_bridge_grad_norms(
+            "bridge grad check | L_mask only",
+            collect_prompt_bridge_grad_norms(model),
+        )
+        model.zero_grad(set_to_none=True)
+        output_dict["bridge_teacher_loss_raw"].backward()
+        print_prompt_bridge_grad_norms(
+            "bridge grad check | L_teach only",
+            collect_prompt_bridge_grad_norms(model),
+        )
+        metrics = output_dict["bridge_metrics"]
+        print(
+            "bridge identity check: "
+            f"delta_norm_mean={metrics['delta_norm_mean']:.6f} | "
+            f"cos(p_hat,q)={metrics['cos_p_hat_q_mean']:.6f} | "
+            f"q_norm_mean={metrics['q_norm_mean']:.6f} | "
+            f"p_hat_norm_mean={metrics['p_hat_norm_mean']:.6f} | "
+            f"gate_mean={metrics['gate_mean']:.6f} | "
+            f"gate_std={metrics['gate_std']:.6f}"
+        )
+        teacher_pm_norms = []
+        teacher_rg_norms = []
+        teacher_cosines = []
+        scanned_batches = max(1, args.bridge_sanity_batches)
+        model.eval()
+        with torch.no_grad():
+            for batch_idx, batch in enumerate(dataloader):
+                if batch_idx >= scanned_batches:
+                    break
+                input_dict = dict_to_cuda(batch)
+                result = model.forward(
+                    images=input_dict["images"],
+                    images_clip=input_dict["images_clip"],
+                    audio_features=input_dict["audio_feats"],
+                    image_features=input_dict["image_feats"],
+                    input_ids=input_dict["input_ids"],
+                    labels=input_dict["labels"],
+                    attention_masks=input_dict["attention_masks"],
+                    masks_list=input_dict["masks"],
+                    resize_list=input_dict["resizes"],
+                    orgsize_list=input_dict["orgsizes"],
+                    conversation_list=input_dict["convs"],
+                    refs_num=input_dict["refs_num"],
+                    fids=input_dict["fids"],
+                    vids=input_dict["vids"],
+                    contrast=0.0,
+                    ref_ids=input_dict["ref_ids"],
+                    inference=True,
+                    target_frame=args.bridge_target_frame,
+                )
+                bridge_metrics = result["bridge_metrics"]
+                teacher_pm_norms.append(bridge_metrics["p_mask_norm_mean"])
+                teacher_rg_norms.append(bridge_metrics["z_gt_norm_mean"])
+                teacher_cosines.append(bridge_metrics["cos_p_mask_z_gt_mean"])
+        print(
+            "bridge teacher sanity: "
+            f"mean||p_mask||={float(np.mean(teacher_pm_norms)):.6f} | "
+            f"mean||z_gt||={float(np.mean(teacher_rg_norms)):.6f} | "
+            f"mean cos(p_mask,z_gt)={float(np.mean(teacher_cosines)):.6f}"
+        )
     def valuate(model, dataloader, args, name):
         model.eval()
         total_iou = 0
         total_fscore = 0
         count = 0
+        bridge_accumulators = defaultdict(float)
+        bridge_count = 0
         for batch in tqdm(dataloader, desc=f"Evaluating on {name}"):
             input_dict = dict_to_cuda(batch)
                                             vids=input_dict["vids"],
                                             contrast=args.ct_weight,
                                             ref_ids=input_dict["ref_ids"],
+                                            inference=True,
+                                            target_frame=args.bridge_target_frame)
             pred_masks = output_dict["pred_masks"]  # list[B]:[num_seg, T, H, W]
             gt_masks = output_dict["gt_masks"]  # list[B]:[num_seg, T, H, W]
             for i in range(len(pred_masks)):
                 total_fscore += fscore * num_seg * T
                 count += num_seg * T
+            if args.use_residual_prompt_bridge and "bridge_metrics" in output_dict:
+                for key, value in output_dict["bridge_metrics"].items():
+                    bridge_accumulators[key] += float(value)
+                bridge_count += 1
         print(f"\n  valuate on {name}:  miou: {total_iou/count}  fscore: {total_fscore/count}")
         with open(os.path.join(args.log_root, f'{args.name}.txt'), "a") as f:
             f.write(f"valuate on {name}:  miou {total_iou/count}  true fscore {total_fscore/count} \n")
+            if bridge_count > 0:
+                bridge_summary = " | ".join(
+                    f"{key}={bridge_accumulators[key] / bridge_count:.6f}"
+                    for key in sorted(bridge_accumulators.keys())
+                )
+                print(f"  bridge on {name}: {bridge_summary}")
+                f.write(f"bridge on {name}: {bridge_summary}\n")
+    if args.bridge_sanity_only:
+        run_bridge_sanity_checks(model, train_eval_dataloader)
+        sys.exit(0)
     # ---------------train------------------------------------------
     model.train()
     epochs = args.epochs
     print("init lr:", args.lr)
+    trainable_params = [p for p in model.parameters() if p.requires_grad]
+    optimizer = AdamW(trainable_params, lr=args.lr, betas=(0.9, 0.95), weight_decay=0.01)
     print_referent_gate_optimizer_sanity(model, optimizer)
     gradient_accumulation_steps = max(1, int(16 // args.batch_size))
                 optimizer.zero_grad()
                 current_lr = scheduler.get_lr()[0]
+                postfix = {
+                    "lr": current_lr,
+                    "loss": running_loss / ((step + 1) / gradient_accumulation_steps),
+                }
+                if args.use_residual_prompt_bridge:
+                    postfix["bridge"] = float(output_dict["bridge_teacher_loss"].item())
+                    postfix["pm"] = float(output_dict["bridge_pm_loss"].item())
+                    postfix["rg"] = float(output_dict["bridge_rg_loss"].item())
+                loop.set_postfix(**postfix)
                 if args.max_steps > 0 and optimizer_step_count >= args.max_steps:
                     stop_training = True
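
The bridge diagnostics added to `valuate` are averaged per batch: each batch's `bridge_metrics` dict is summed key by key into a `defaultdict(float)`, then divided by the number of batches that reported metrics. A minimal standalone sketch of that accumulation follows. `summarize_bridge_metrics` and the sample values are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def summarize_bridge_metrics(batch_metrics):
    """Average per-batch metric dicts and format them like the eval log line.

    Hypothetical helper: `batch_metrics` stands in for the sequence of
    `output_dict["bridge_metrics"]` dicts seen during evaluation.
    """
    accumulators = defaultdict(float)
    count = 0
    for metrics in batch_metrics:
        for key, value in metrics.items():
            accumulators[key] += float(value)
        count += 1
    if count == 0:
        # Mirrors the `bridge_count > 0` guard: nothing to report.
        return {}, ""
    means = {key: accumulators[key] / count for key in sorted(accumulators)}
    summary = " | ".join(f"{key}={value:.6f}" for key, value in means.items())
    return means, summary


# Made-up values, purely for illustration.
means, summary = summarize_bridge_metrics(
    [
        {"gate_mean": 0.25, "delta_norm_mean": 1.0},
        {"gate_mean": 0.75, "delta_norm_mean": 3.0},
    ]
)
print(summary)
```

Because the mean is taken over batches, a short final batch carries the same weight as a full one. The mIoU and F-score totals in the same function are instead weighted by `num_seg * T`.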

upload_hf.py CHANGED

@@ -1,120 +1,73 @@
-"""
-Upload SimToken folder to HuggingFace.
-Usage:
-    python upload_hf.py --repo your-username/SimToken [--private]
-Features:
-    - Automatic retry on rate limit (HTTP 429) with exponential backoff
-    - Built-in resumption: upload_large_folder caches progress locally;
-      re-running the script will skip already-uploaded files
-    - Logs to both console and upload.log
 """
 import argparse
 import logging
-import time
 from pathlib import Path
-from huggingface_hub import HfApi
-from huggingface_hub.utils import HfHubHTTPError
-# ── Config ─────────────────────────────────────────────────────────────────
-FOLDER = Path(__file__).parent          # SimToken directory
 IGNORE_PATTERNS = [
     "**/__pycache__/**",
     "**/*.pyc",
     "upload.log",
 ]
-NUM_WORKERS  = 1    # conservative; increase to 8 if no rate-limit errors
-MAX_RETRIES  = 10
-# ───────────────────────────────────────────────────────────────────────────
-logging.basicConfig(
-    level=logging.INFO,
-    format="%(asctime)s  %(levelname)-8s  %(message)s",
-    datefmt="%H:%M:%S",
-    handlers=[
-        logging.FileHandler(FOLDER / "upload.log"),
-        logging.StreamHandler(),
-    ],
-)
-log = logging.getLogger(__name__)
-def parse_args():
-    p = argparse.ArgumentParser()
-    p.add_argument("--repo", required=True,
-                   help="HuggingFace repo id, e.g. your-username/SimToken")
-    return p.parse_args()
-def main():
     args = parse_args()
-    api  = HfApi()
-    # ── 1. Create repo (idempotent) ────────────────────────────────────────
-    log.info(f"Ensuring repo '{args.repo}' exists ...")
-    api.create_repo(
         repo_id=args.repo,
-        repo_type="model",
-        private=False,
         exist_ok=True,
     )
-    log.info("Repo ready.")
-    # ── 2. Upload with retry ───────────────────────────────────────────────
-    for attempt in range(1, MAX_RETRIES + 1):
-        try:
-            log.info(f"[Attempt {attempt}/{MAX_RETRIES}] Starting upload_large_folder ...")
-            log.info(f"  folder : {FOLDER}")
-            log.info(f"  repo   : {args.repo}")
-            log.info(f"  workers: {NUM_WORKERS}")
-            log.info("  (re-running this script will resume from where it left off)")
-            api.upload_large_folder(
-                folder_path=str(FOLDER),
-                repo_id=args.repo,
-                repo_type="model",
-                ignore_patterns=IGNORE_PATTERNS,
-                num_workers=NUM_WORKERS,
-                print_report=True,
-                print_report_every=120,   # print progress every 2 minutes
-            )
-            log.info("Upload complete!")
-            return
-        except HfHubHTTPError as e:
-            status = e.response.status_code if e.response is not None else "?"
-            if status == 429:
-                # Two possible 429 causes:
-                #   1. API request rate (resets in ~300s)
-                #   2. Commit rate limit: 128 commits/hour (resets in ~3600s)
-                # Wait long enough to cover the commit rate limit reset.
-                wait = 3700
-                log.warning(f"Rate limited (HTTP 429). Waiting {wait}s (~1 hour) for commit rate limit reset ...")
-                time.sleep(wait)
-            elif status in (500, 502, 503, 504):
-                # Transient server error
-                wait = 30 * attempt
-                log.warning(f"Server error (HTTP {status}). Waiting {wait}s before retry ...")
-                time.sleep(wait)
-            else:
-                log.error(f"HTTP error {status}: {e}")
-                raise
-        except Exception as e:
-            if attempt < MAX_RETRIES:
-                wait = 30 * attempt
-                log.warning(f"Unexpected error: {e}. Retrying in {wait}s ...")
-                time.sleep(wait)
-            else:
-                log.error(f"All {MAX_RETRIES} attempts failed. Last error: {e}")
-                raise
-    log.error("Upload did not complete after all retries.")
 if __name__ == "__main__":
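
The removed version chose its wait time from the HTTP status: a fixed 3700 s for 429, a linearly growing `30 * attempt` seconds for 500/502/503/504, and an immediate re-raise for any other HTTP status. The sketch below restates that schedule as a pure function so it can be checked without network access. `retry_wait_seconds` and `TRANSIENT_STATUSES` are hypothetical names, not part of either version of the script, and the separate branch for non-HTTP exceptions is not modeled.

```python
from typing import Optional

# Status codes the removed script treated as transient server errors.
TRANSIENT_STATUSES = (500, 502, 503, 504)


def retry_wait_seconds(status: int, attempt: int) -> Optional[int]:
    """Return how long to sleep before the next attempt, or None to re-raise.

    `attempt` is 1-based, matching `range(1, MAX_RETRIES + 1)` in the
    removed loop.
    """
    if status == 429:
        # Long enough to cover the hourly commit rate limit reset.
        return 3700
    if status in TRANSIENT_STATUSES:
        # Linear backoff: 30 s, 60 s, 90 s, ...
        return 30 * attempt
    # Any other HTTP error was logged and re-raised immediately.
    return None


for status, attempt in [(429, 1), (503, 3), (404, 1)]:
    print(status, attempt, retry_wait_seconds(status, attempt))
```

The replacement script shown next has no such loop, so a rate-limited or failed upload surfaces as an exception and the command has to be re-run by hand.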

+"""Upload the current SimToken workspace to HuggingFace Hub.
+Example:
+    python upload_hf.py --repo yfan07/SimToken
 """
+from __future__ import annotations
 import argparse
 import logging
 from pathlib import Path
+from huggingface_hub import HfApi, create_repo
+ROOT = Path(__file__).resolve().parent
 IGNORE_PATTERNS = [
+    ".git/**",
     "**/__pycache__/**",
+    "**/.pytest_cache/**",
+    "**/.cache/**",
     "**/*.pyc",
+    "**/*.pyo",
     "upload.log",
 ]
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Upload SimToken to HuggingFace Hub.")
+    parser.add_argument("--repo", required=True, help="Repo id, e.g. yfan07/SimToken")
+    parser.add_argument("--repo_type", default="model", choices=["model", "dataset", "space"])
+    parser.add_argument("--private", action="store_true", help="Create repo as private if missing.")
+    parser.add_argument("--num_workers", type=int, default=4)
+    return parser.parse_args()
+def main() -> None:
     args = parse_args()
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s %(levelname)s %(message)s",
+        handlers=[logging.FileHandler(ROOT / "upload.log"), logging.StreamHandler()],
+    )
+    create_repo(
         repo_id=args.repo,
+        repo_type=args.repo_type,
+        private=args.private,
         exist_ok=True,
     )
+    api = HfApi()
+    if hasattr(api, "upload_large_folder"):
+        logging.info("Uploading %s to %s with upload_large_folder", ROOT, args.repo)
+        api.upload_large_folder(
+            repo_id=args.repo,
+            repo_type=args.repo_type,
+            folder_path=str(ROOT),
+            ignore_patterns=IGNORE_PATTERNS,
+            num_workers=args.num_workers,
+        )
+    else:
+        logging.info("Uploading %s to %s with upload_folder", ROOT, args.repo)
+        api.upload_folder(
+            repo_id=args.repo,
+            repo_type=args.repo_type,
+            folder_path=str(ROOT),
+            ignore_patterns=IGNORE_PATTERNS,
+        )
 if __name__ == "__main__":