---
base_model:
- openai/clip-vit-base-patch32
- openai/clip-vit-base-patch16
- openai/clip-vit-large-patch14
license: mit
pipeline_tag: image-classification
tags:
- clip
- task-arithmetic
- model-merging
- fine-tuning
- orthogonal-regularization
---
# Understanding and Enforcing Weight Disentanglement in Task Arithmetic
[CVPR 2026] Official checkpoints for the paper **"Understanding and Enforcing Weight Disentanglement in Task Arithmetic"**.
**Authors**: [Shangge Liu](https://huggingface.co/gezi2333), Yuehan Yin, Lei Wang, Qi Fan, Yinghuan Shi, Wenbin Li, Yang Gao, and Dacheng Tao.
[[Paper](https://arxiv.org/abs/2604.17078)] [[Code](https://github.com/RL-MIND/OrthoReg)]
---
## 🎯 Abstract
Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet it lacks a fundamental theoretical explanation for its success. The existing concept of "weight disentanglement" describes the ideal outcome of non-interfering task composition but does not reveal its underlying cause. Crucially, it remains underexplored which intrinsic properties of the pre-trained model ($\theta_0$) or the task vectors ($\tau_t$) enable this disentanglement. In this paper, we introduce Task-Feature Specialization (TFS), a model's ability to allocate distinct internal features to different tasks, as the fundamental principle. We first prove that TFS is a sufficient condition for weight disentanglement. More importantly, we find that TFS also gives rise to an observable geometric consequence: weight vector orthogonality. This positions TFS as the common cause of both the desired functional outcome (disentanglement) and a measurable geometric property (orthogonality). This relationship provides the key insight behind our method: since the abstract TFS property is intractable to enforce directly, we can instead promote weight disentanglement by shaping its concrete geometric consequence, orthogonality. We therefore propose OrthoReg, a simple and effective regularization method that actively enforces an internal orthogonal structure on the weight updates ($\Delta W$) that constitute $\tau_t$ during fine-tuning, and we prove theoretically that OrthoReg promotes disentanglement. Extensive experiments demonstrate that OrthoReg consistently and significantly enhances the performance of various task arithmetic methods.
### ✨ Key Contributions
- **Theory**: We identify TFS as a sufficient condition for weight disentanglement and weight vector orthogonality (WVO) as its geometric consequence, providing the first principled explanation for task arithmetic.
- **Method (OrthoReg)**: A simple regularization term, added to the fine-tuning loss, that enforces column-wise orthogonality on ΔW; we prove that it promotes weight disentanglement.
- **Connection to TTA**: We show that OrthoReg and Tangent Task Arithmetic (TTA) share the same underlying mechanism (i.e., inter-task vector orthogonality), but that OrthoReg achieves it more efficiently.
- **Experiments**: Consistent and significant improvements when OrthoReg is combined with Non-linear FT, TTA, ATT-FT, and LoRA-ATT across ViT-B-32, ViT-B-16, and ViT-L-14.
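
Inter-task vector orthogonality can be checked directly on saved weights. The sketch below is an illustrative diagnostic, not code from the OrthoReg repository: it assumes each task vector is a dict mapping parameter names to tensors (for example, the difference of two state dicts), flattens it, and reports pairwise cosine similarities between tasks. Values close to 0 indicate nearly orthogonal task vectors. The task names and the 2x2 "layer" in the toy example are hypothetical.

```python
import torch
import torch.nn.functional as F


def flatten_task_vector(tau: dict) -> torch.Tensor:
    """Concatenate all tensors of a task vector into one 1-D tensor (sorted key order)."""
    return torch.cat([tau[name].flatten().float() for name in sorted(tau)])


def task_vector_cosine(tau_a: dict, tau_b: dict) -> float:
    """Cosine similarity between two task vectors with identical keys and shapes."""
    a, b = flatten_task_vector(tau_a), flatten_task_vector(tau_b)
    return F.cosine_similarity(a, b, dim=0).item()


def cosine_matrix(task_vectors: dict) -> dict:
    """Pairwise cosine similarities, keyed by (task_a, task_b)."""
    return {
        (a, b): task_vector_cosine(task_vectors[a], task_vectors[b])
        for a in task_vectors
        for b in task_vectors
    }


# Toy example: two hypothetical task vectors touching different weight entries.
taus = {
    "TaskA": {"layer.weight": torch.tensor([[1.0, 0.0], [0.0, 0.0]])},
    "TaskB": {"layer.weight": torch.tensor([[0.0, 2.0], [0.0, 0.0]])},
}
sims = cosine_matrix(taus)
```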
---
### The OrthoReg Loss
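The total loss adds an orthogonality penalty, weighted by $\lambda$, to the standard task objective. Here $\Delta\theta$ collects the per-layer weight updates $\Delta W^{(l)}$, and $\lambda$ corresponds to `--ortho-lambda` (set to 1.0 for the released checkpoints):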
$$\mathcal{L} = \mathcal{L}_{\text{task}}(\theta_0 + \Delta\theta) + \lambda \cdot \mathcal{L}_{\text{ortho}}(\Delta\theta)$$
$$\mathcal{L}_{\text{ortho}}(\Delta\theta) = \sum_l \left\|(\Delta W^{(l)})^\top \Delta W^{(l)} - I\right\|_F^2$$
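
A minimal PyTorch sketch of $\mathcal{L}_{\text{ortho}}$ and the total loss, under stated assumptions: $\Delta W^{(l)}$ is taken in PyTorch's stored `(out_features, in_features)` layout, so the Gram matrix is formed over its columns, and every 2-D parameter is treated as a layer. These choices and the helper names `orthoreg_penalty` and `weight_updates` are illustrative assumptions, not the repository's implementation; the toy `nn.Sequential` merely stands in for the CLIP image encoder.

```python
import torch
from torch import nn


def orthoreg_penalty(delta_weights):
    """Sum over layers of ||dW^T dW - I||_F^2 (expects at least one matrix)."""
    terms = []
    for dW in delta_weights:
        gram = dW.T @ dW  # Gram matrix of the columns of dW
        eye = torch.eye(gram.shape[0], dtype=dW.dtype, device=dW.device)
        terms.append(torch.linalg.matrix_norm(gram - eye, ord="fro") ** 2)
    return torch.stack(terms).sum()


def weight_updates(model, theta0):
    """Delta W = W - W_0 for every 2-D parameter (assumption: only matrices are regularized)."""
    return [p - theta0[name] for name, p in model.named_parameters() if p.ndim == 2]


# Toy usage: a small MLP standing in for the image encoder.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
theta0 = {name: p.detach().clone() for name, p in model.named_parameters()}  # frozen copy of theta_0
x, y = torch.randn(32, 8), torch.randint(0, 4, (32,))
lam = 1.0  # plays the role of --ortho-lambda

task_loss = nn.functional.cross_entropy(model(x), y)
loss = task_loss + lam * orthoreg_penalty(weight_updates(model, theta0))
loss.backward()
```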
---
## Checkpoint Structure
This repository contains fine-tuned checkpoints for **ViT-B-32, ViT-B-16, and ViT-L-14** on all 8 tasks, covering the following fine-tuning modes:
| Directory | Mode | Description |
|---|---|---|
| `standard_1e-05_{model}/` | `standard` | Non-linear full fine-tuning (baseline) |
| `linear_1e-05_{model}/` | `linear` | TTA: tangent-space fine-tuning (baseline) |
| `linear-2_1e-05_{model}/` | `linear-2` | ATT-FT: attention-only fine-tuning (baseline) |
| `linear_ortho_1e-05_lambda1.0_{model}/` | `linear_ortho` | TTA + OrthoReg |
| `ViT-B-32/`, `ViT-B-16/`, `ViT-L-14/` | N/A | Pre-trained CLIP base model weights |
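
The `linear` and `linear_ortho` modes fine-tune in the tangent space of the pre-trained model, as in TTA: the network is replaced by its first-order Taylor expansion around $\theta_0$, $f_{\text{lin}}(x;\theta) = f(x;\theta_0) + (\theta-\theta_0)^\top \nabla_\theta f(x;\theta_0)$. The wrapper below is a hypothetical, simplified sketch built on `torch.func.jvp` and `torch.func.functional_call`, not the repository's class. It trains only the displacement $\theta-\theta_0$ and, for brevity, keeps $\theta_0$ as plain tensors that `.to(device)` will not move.

```python
import torch
from torch import nn
from torch.func import functional_call, jvp


class LinearizedModel(nn.Module):
    """f_lin(x; theta) = f(x; theta_0) + J_theta f(x; theta_0) (theta - theta_0)."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model
        self.names = [name for name, _ in model.named_parameters()]
        # Frozen expansion point theta_0 (plain tensors, not registered parameters).
        self.theta0 = tuple(p.detach().clone() for _, p in model.named_parameters())
        for p in model.parameters():
            p.requires_grad_(False)
        # Trainable displacement theta - theta_0, initialised at zero; optimise only these.
        self.delta = nn.ParameterList(nn.Parameter(torch.zeros_like(p)) for p in self.theta0)

    def forward(self, x):
        def f(*params):
            # Run the original network with the given parameter values.
            return functional_call(self.model, dict(zip(self.names, params)), (x,))

        # One forward-mode pass returns f(x; theta_0) and the JVP J (theta - theta_0).
        out0, jvp_out = jvp(f, self.theta0, tuple(self.delta))
        return out0 + jvp_out


# Toy usage: for a single linear layer the linearization is exact.
model = nn.Linear(3, 2)
lin = LinearizedModel(model)
x = torch.randn(5, 3)
out = lin(x)
```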
Each mode directory is organized by dataset:
```
{mode}_{lr}_{model}/
├── head_CarsVal.pt # linear classification head
├── head_DTDVal.pt
├── head_EuroSATVal.pt
├── head_GTSRBVal.pt
├── head_MNISTVal.pt
├── head_RESISC45Val.pt
├── head_SUN397Val.pt
├── head_SVHNVal.pt
├── CarsVal/
│   ├── {mode}_finetuned.pt # fine-tuned model weights (task vector + θ₀)
│   └── {mode}_zeroshot.pt # zero-shot reference weights
├── DTDVal/
...
└── SVHNVal/
```
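
Each `{mode}_finetuned.pt` / `{mode}_zeroshot.pt` pair encodes a task vector $\tau_t = \theta_t - \theta_0$. The sketch below is hedged on the file format: the loader assumes a checkpoint may be either a plain state dict or a pickled `nn.Module` (full pickles need the OrthoReg source on `PYTHONPATH` and `weights_only=False`), and the commented path is a hypothetical instance of the layout above. Parameter names, and for the `linear*` modes the set of stored tensors, may depend on the fine-tuning mode, so inspect the keys before subtracting.

```python
import torch
from torch import nn


def load_state_dict(path: str) -> dict:
    """Load a checkpoint that may be a pickled module or a plain state dict (assumption)."""
    obj = torch.load(path, map_location="cpu", weights_only=False)
    return obj.state_dict() if isinstance(obj, nn.Module) else obj


def task_vector(finetuned: dict, zeroshot: dict) -> dict:
    """tau_t = theta_t - theta_0 over shared floating-point entries (integer buffers are skipped)."""
    return {
        name: finetuned[name] - theta0
        for name, theta0 in zeroshot.items()
        if name in finetuned and theta0.is_floating_point()
    }


# Hypothetical usage, following the directory layout above:
# root = "checkpoints_1993/linear_ortho_1e-05_lambda1.0_ViT-B-32/CarsVal"
# tau_cars = task_vector(load_state_dict(f"{root}/linear_ortho_finetuned.pt"),
#                        load_state_dict(f"{root}/linear_ortho_zeroshot.pt"))
```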
All checkpoints use **seed=1993** and **lr=1e-5** to match the paper's reported results.
---
## Usage
### Step 1: Clone this repository
```bash
git lfs install
git clone https://huggingface.co/gezi2333/OrthoReg-checkpoints
```
After cloning the codebase in Step 2, move the checkpoints into `OrthoReg/checkpoints_1993/`:
```bash
mkdir -p OrthoReg/checkpoints_1993
mv OrthoReg-checkpoints/* OrthoReg/checkpoints_1993/
```
### Step 2: Install the codebase
```bash
git clone https://github.com/RL-MIND/OrthoReg
cd OrthoReg
conda env create
conda activate tangent-arithmetic
export PYTHONPATH="$PYTHONPATH:$PWD"
```
### Step 3: Run evaluation
Evaluate single-task accuracy:
```bash
python src/eval_single_task.py \
--model ViT-B-32 \
--finetuning-mode linear_ortho \
--ortho-lambda 1.0 \
--lr 1e-5 \
--seed 1993 \
--data-location /path/to/datasets/
```
Evaluate task addition:
```bash
python src/eval_task_addition.py \
--model ViT-B-32 \
--finetuning-mode linear_ortho \
--ortho-lambda 1.0 \
--lr 1e-5 \
--seed 1993 \
--data-location /path/to/datasets/
```
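
Task addition merges several task vectors into the pre-trained weights with a single scaling coefficient, $\theta = \theta_0 + \alpha \sum_t \tau_t$, where $\alpha$ is typically tuned on held-out validation data. A minimal sketch on parameter dicts follows; the function name is illustrative, and the evaluation script's own implementation and $\alpha$ grid may differ. In TTA, the same arithmetic is applied to the linearized model's parameters for the tangent-space modes.

```python
import torch


def add_task_vectors(theta0: dict, task_vectors: list, alpha: float) -> dict:
    """theta = theta_0 + alpha * sum_t tau_t; entries absent from every task vector are copied."""
    merged = {name: p.clone() for name, p in theta0.items()}
    for tau in task_vectors:
        for name, delta in tau.items():
            merged[name] += alpha * delta
    return merged


# Toy example: two orthogonal task vectors on a 2-element "layer".
theta0 = {"w": torch.tensor([1.0, 1.0])}
taus = [{"w": torch.tensor([1.0, 0.0])}, {"w": torch.tensor([0.0, 2.0])}]
merged = add_task_vectors(theta0, taus, alpha=0.5)
```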
Evaluate task negation:
```bash
python src/eval_task_negation.py \
--model ViT-B-32 \
--finetuning-mode linear_ortho \
--ortho-lambda 1.0 \
--lr 1e-5 \
--seed 1993 \
--data-location /path/to/datasets/
```
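
Task negation subtracts a scaled task vector, $\theta = \theta_0 - \alpha\,\tau_t$, to reduce performance on task $t$ while leaving other behaviour largely intact. The sketch below is illustrative only: `target_acc` and `control_acc` are hypothetical user-supplied evaluation callables, and the rule of keeping control accuracy at or above 95% of the zero-shot model's (`keep=0.95`) is an assumed selection criterion in the spirit of the original task-arithmetic evaluation, not necessarily what `eval_task_negation.py` does.

```python
import torch


def negate(theta0: dict, tau: dict, alpha: float) -> dict:
    """theta = theta_0 - alpha * tau_t; entries not in tau are copied unchanged."""
    return {k: v - alpha * tau[k] if k in tau else v.clone() for k, v in theta0.items()}


def pick_alpha(theta0, tau, alphas, target_acc, control_acc, control_zeroshot, keep=0.95):
    """Hypothetical rule: among alphas whose control accuracy stays >= keep * zero-shot
    control accuracy, return (alpha, target accuracy) with the lowest target accuracy."""
    best = None
    for alpha in alphas:
        params = negate(theta0, tau, alpha)
        if control_acc(params) < keep * control_zeroshot:
            continue  # forgetting went too far on the control task
        acc = target_acc(params)
        if best is None or acc < best[1]:
            best = (alpha, acc)
    return best


# Toy usage with fake accuracy functions on a one-parameter "model".
theta0 = {"w": torch.tensor([1.0])}
tau = {"w": torch.tensor([1.0])}
best = pick_alpha(
    theta0, tau, alphas=[0.0, 0.05, 0.2, 0.5],
    target_acc=lambda p: float(p["w"]),               # toy: drops as alpha grows
    control_acc=lambda p: 0.5 + 0.5 * float(p["w"]),  # toy: drops more slowly
    control_zeroshot=1.0,
)
```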
> First run `src/eval_single_task.py` with `--finetuning-mode none --ortho-lambda 0` to generate `zeroshot_accuracies.json`; the other evaluations require it as the reference for normalized accuracy.
---
## 📦 Datasets
We evaluate on 8 image classification benchmarks: **Cars · DTD · EuroSAT · GTSRB · MNIST · RESISC45 · SUN397 · SVHN**
For dataset preparation, follow the instructions in the [TTA repository](https://github.com/gortizji/tangent_task_arithmetic#datasets).
---
## Citation
If you find this work useful, please cite:
```bibtex
@inproceedings{liu2026orthoreg,
title = {Understanding and Enforcing Weight Disentanglement in Task Arithmetic},
author = {Liu, Shangge and Yin, Yuehan and Wang, Lei and Fan, Qi and
Shi, Yinghuan and Li, Wenbin and Gao, Yang and Tao, Dacheng},
booktitle = {CVPR},
year = {2026}
}
```
---
## Acknowledgements
This codebase is built on top of [Task Arithmetic](https://github.com/mlfoundations/task_vectors), [Tangent Task Arithmetic](https://github.com/gortizji/tangent_task_arithmetic), and [Attention-Only Fine-tuning](https://github.com/kyrie-23/linear_task_arithmetic). We thank the authors for releasing their code.