---
base_model:
- openai/clip-vit-base-patch32
- openai/clip-vit-base-patch16
- openai/clip-vit-large-patch14
license: mit
pipeline_tag: image-classification
tags:
- clip
- task-arithmetic
- model-merging
- fine-tuning
- orthogonal-regularization
---

# Understanding and Enforcing Weight Disentanglement in Task Arithmetic

[CVPR 2026] Official checkpoints for the paper **"Understanding and Enforcing Weight Disentanglement in Task Arithmetic"**.

**Authors**: [Shangge Liu](https://huggingface.co/gezi2333), Yuehan Yin, Lei Wang, Qi Fan, Yinghuan Shi, Wenbin Li, Yang Gao, and Dacheng Tao.

[[Paper](https://arxiv.org/abs/2604.17078)] [[Code](https://github.com/RL-MIND/OrthoReg)]

---

## Abstract

Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet it lacks a fundamental theoretical explanation for its success. The existing concept of "weight disentanglement" describes the ideal outcome of non-interfering task composition but does not reveal its underlying cause. Crucially, which intrinsic properties of the pre-trained model ($\theta_0$) or the task vectors ($\tau_t$) enable this disentanglement remains underexplored. In this paper, we introduce Task-Feature Specialization (TFS), a model's ability to allocate distinct internal features to different tasks, as the fundamental principle. We first prove that TFS is a sufficient condition for weight disentanglement. More importantly, we find that TFS also gives rise to an observable geometric consequence: weight vector orthogonality (WVO). This positions TFS as the common cause of both the desired functional outcome (disentanglement) and a measurable geometric property (orthogonality). This relationship provides the key insight for our method: since the abstract TFS property is intractable to enforce directly, we can instead promote weight disentanglement by shaping its concrete geometric consequence, orthogonality. We therefore propose OrthoReg, a simple and effective regularization method that actively enforces an internal orthogonal structure on the weight updates ($\Delta W$) that constitute $\tau_t$ during fine-tuning, and we theoretically prove that OrthoReg promotes disentanglement. Extensive experiments demonstrate that OrthoReg consistently and significantly enhances the performance of various task arithmetic methods.

### Key Contributions

- **Theory**: We identify TFS as a sufficient condition for weight disentanglement and WVO as its geometric consequence, providing the first principled explanation for task arithmetic.
- **Method (OrthoReg)**: A simple regularization term added to the fine-tuning loss that enforces column-wise orthogonality on ΔW, for which we prove theoretical efficacy.
- **Connection to TTA**: We show that OrthoReg and Tangent Task Arithmetic (TTA) share the same underlying mechanism (i.e., inter-task vector orthogonality), but OrthoReg achieves it more efficiently.
- **Experiments**: Consistent and significant improvements over Non-linear FT, TTA, ATT-FT, and LoRA-ATT across ViT-B-32, ViT-B-16, and ViT-L-14.

---

### The OrthoReg Loss

The total loss adds a regularization term to the standard task objective:

$$\mathcal{L} = \mathcal{L}_{\text{task}}(\theta_0 + \Delta\theta) + \lambda \cdot \mathcal{L}_{\text{ortho}}(\Delta\theta)$$

$$\mathcal{L}_{\text{ortho}}(\Delta\theta) = \sum_l \left\|(\Delta W^{(l)})^\top \Delta W^{(l)} - I\right\|_F^2$$

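For concreteness, here is a minimal PyTorch sketch of the regularizer above, assuming the per-layer updates ΔW are collected as a list of 2-D tensors; the function name and interface are illustrative, not the repository's API.

```python
import torch

def ortho_penalty(delta_ws):
    """Sum over layers of ||(dW)^T dW - I||_F^2 for a list of 2-D weight updates dW."""
    loss = torch.zeros((), dtype=delta_ws[0].dtype, device=delta_ws[0].device)
    for dw in delta_ws:
        gram = dw.T @ dw                                    # column-wise Gram matrix
        eye = torch.eye(gram.shape[0], dtype=dw.dtype, device=dw.device)
        loss = loss + ((gram - eye) ** 2).sum()             # squared Frobenius norm
    return loss

# Total objective (task_loss, delta_ws, and ortho_lambda come from your fine-tuning loop):
# loss = task_loss + ortho_lambda * ortho_penalty(delta_ws)
```
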
---

## Checkpoint Structure

This repository contains fine-tuned checkpoints for **ViT-B-32, ViT-B-16, and ViT-L-14** on all 8 tasks, covering the following fine-tuning modes:

| Directory | Mode | Description |
|---|---|---|
| `standard_1e-05_{model}/` | `standard` | Non-linear full fine-tuning (baseline) |
| `linear_1e-05_{model}/` | `linear` | TTA: tangent-space fine-tuning (baseline) |
| `linear-2_1e-05_{model}/` | `linear-2` | ATT-FT: attention-only fine-tuning (baseline) |
| `linear_ortho_1e-05_lambda1.0_{model}/` | `linear_ortho` | TTA + OrthoReg |
| `ViT-B-32/`, `ViT-B-16/`, `ViT-L-14/` | - | Pre-trained CLIP base model weights |

Each mode directory is organized by dataset:

```
{mode}_{lr}_{model}/
├── head_CarsVal.pt           # linear classification head
├── head_DTDVal.pt
├── head_EuroSATVal.pt
├── head_GTSRBVal.pt
├── head_MNISTVal.pt
├── head_RESISC45Val.pt
├── head_SUN397Val.pt
├── head_SVHNVal.pt
├── CarsVal/
│   ├── {mode}_finetuned.pt   # fine-tuned model weights (task vector + θ₀)
│   └── {mode}_zeroshot.pt    # zero-shot reference weights
├── DTDVal/
...
└── SVHNVal/
```

All checkpoints use **seed=1993** and **lr=1e-5** to match the paper's reported results.

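As an illustration of how these files relate to the task-vector formulation, the sketch below loads a fine-tuned/zero-shot checkpoint pair and forms τ = θ_finetuned − θ₀. It assumes the `.pt` files can be read with `torch.load` and expose parameter tensors (directly or via `state_dict()`); the paths are examples following the layout above, and the variable names are illustrative.

```python
import torch

# Example paths following the layout above (adjust to your local copy).
root = "checkpoints_1993/linear_ortho_1e-05_lambda1.0_ViT-B-32/CarsVal"
finetuned = torch.load(f"{root}/linear_ortho_finetuned.pt", map_location="cpu")
zeroshot = torch.load(f"{root}/linear_ortho_zeroshot.pt", map_location="cpu")

# If the files store whole modules rather than raw state dicts, take their state_dict().
ft_sd = finetuned.state_dict() if hasattr(finetuned, "state_dict") else finetuned
zs_sd = zeroshot.state_dict() if hasattr(zeroshot, "state_dict") else zeroshot

# Task vector: tau = theta_finetuned - theta_0, one tensor per parameter.
task_vector = {name: ft_sd[name] - zs_sd[name] for name in ft_sd}
```
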
---

## Usage

### Step 1: Clone this repository

```bash
git lfs install
git clone https://huggingface.co/gezi2333/OrthoReg-checkpoints
```

Place the cloned folder as `OrthoReg/checkpoints_1993/` inside your code directory:

```bash
mv OrthoReg-checkpoints/* OrthoReg/checkpoints_1993/
```

### Step 2: Install the codebase

```bash
git clone https://github.com/RL-MIND/OrthoReg
cd OrthoReg
conda env create
conda activate tangent-arithmetic
export PYTHONPATH="$PYTHONPATH:$PWD"
```

### Step 3: Run evaluation

Evaluate single-task accuracy:

```bash
python src/eval_single_task.py \
    --model ViT-B-32 \
    --finetuning-mode linear_ortho \
    --ortho-lambda 1.0 \
    --lr 1e-5 \
    --seed 1993 \
    --data-location /path/to/datasets/
```

Evaluate task addition:

```bash
python src/eval_task_addition.py \
    --model ViT-B-32 \
    --finetuning-mode linear_ortho \
    --ortho-lambda 1.0 \
    --lr 1e-5 \
    --seed 1993 \
    --data-location /path/to/datasets/
```
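
Conceptually, task addition evaluates the merged model θ₀ + α·Σ_t τ_t, while task negation evaluates θ₀ − α·τ_t for a single task. Below is a minimal sketch of that merge step, assuming task vectors are stored as parameter-name-to-tensor dictionaries as in the earlier example; the function name, the default scaling coefficient, and the interface are illustrative, not the script's actual implementation.

```python
def apply_task_vectors(zeroshot_sd, task_vectors, alpha=0.3):
    """Return theta_0 + alpha * sum_t tau_t as a new state dict.

    For task negation, pass a single task vector and a negative alpha.
    The scaling coefficient alpha is typically tuned on validation data.
    """
    merged = {name: w.clone() for name, w in zeroshot_sd.items()}
    for tau in task_vectors:
        for name, delta in tau.items():
            merged[name] = merged[name] + alpha * delta
    return merged
```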

Evaluate task negation:

```bash
python src/eval_task_negation.py \
    --model ViT-B-32 \
    --finetuning-mode linear_ortho \
    --ortho-lambda 1.0 \
    --lr 1e-5 \
    --seed 1993 \
    --data-location /path/to/datasets/
```

> Run `eval_single_task` with `--finetuning-mode none --ortho-lambda 0` first to generate `zeroshot_accuracies.json`, which is required as the reference for normalized accuracy.
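
As a rough illustration of the metric mentioned in the note, normalized accuracy divides each task's accuracy by a per-task reference value and averages over tasks. The sketch below assumes `zeroshot_accuracies.json` maps dataset names to those reference accuracies; both the field names and the helper are hypothetical, not the evaluation scripts' internals.

```python
import json

def normalized_accuracy(task_accuracies, reference_path="zeroshot_accuracies.json"):
    """Average of (per-task accuracy / per-task reference accuracy); names are illustrative."""
    with open(reference_path) as f:
        reference = json.load(f)
    ratios = [task_accuracies[task] / reference[task] for task in task_accuracies]
    return sum(ratios) / len(ratios)
```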

---

## Datasets

We evaluate on 8 image classification benchmarks: **Cars · DTD · EuroSAT · GTSRB · MNIST · RESISC45 · SUN397 · SVHN**

For dataset preparation, follow the instructions in the [TTA repository](https://github.com/gortizji/tangent_task_arithmetic#datasets).

---

## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{liu2026orthoreg,
  title     = {Understanding and Enforcing Weight Disentanglement in Task Arithmetic},
  author    = {Liu, Shangge and Yin, Yuehan and Wang, Lei and Fan, Qi and
               Shi, Yinghuan and Li, Wenbin and Gao, Yang and Tao, Dacheng},
  booktitle = {CVPR},
  year      = {2026}
}
```

---

## Acknowledgements

This codebase is built on top of [Task Arithmetic](https://github.com/mlfoundations/task_vectors), [Tangent Task Arithmetic](https://github.com/gortizji/tangent_task_arithmetic), and [Attention-Only Fine-tuning](https://github.com/kyrie-23/linear_task_arithmetic). We thank the authors for releasing their code.