---
base_model:
- openai/clip-vit-base-patch32
- openai/clip-vit-base-patch16
- openai/clip-vit-large-patch14
license: mit
pipeline_tag: image-classification
tags:
- clip
- task-arithmetic
- model-merging
- fine-tuning
- orthogonal-regularization
---

# Understanding and Enforcing Weight Disentanglement in Task Arithmetic

[CVPR 2026] Official checkpoints for the paper **"Understanding and Enforcing Weight Disentanglement in Task Arithmetic"**.

**Authors**: [Shangge Liu](https://huggingface.co/gezi2333), Yuehan Yin, Lei Wang, Qi Fan, Yinghuan Shi, Wenbin Li, Yang Gao, and Dacheng Tao.

[[Paper](https://arxiv.org/abs/2604.17078)] [[Code](https://github.com/RL-MIND/OrthoReg)]

---

## Abstract

Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet it lacks a fundamental theoretical explanation for its success. The existing concept of "weight disentanglement" describes the ideal outcome of non-interfering task composition but does not reveal its underlying cause. Crucially, which intrinsic properties of the pre-trained model ($\theta_0$) or the task vectors ($\tau_t$) enable this disentanglement remains underexplored. In this paper, we introduce Task-Feature Specialization (TFS), a model's ability to allocate distinct internal features to different tasks, as the fundamental principle. We first prove that TFS is a sufficient condition for weight disentanglement. More importantly, we find that TFS also gives rise to an observable geometric consequence: weight vector orthogonality (WVO). This positions TFS as the common cause of both the desired functional outcome (disentanglement) and a measurable geometric property (orthogonality). This relationship provides the key insight for our method: since the abstract TFS property is intractable to enforce directly, we can instead promote weight disentanglement by shaping its concrete geometric consequence, orthogonality. We therefore propose OrthoReg, a simple and effective regularization method that actively enforces an internal orthogonal structure on the weight updates ($\Delta W$) that constitute $\tau_t$ during fine-tuning, and we theoretically prove that OrthoReg promotes disentanglement. Extensive experiments demonstrate that OrthoReg consistently and significantly enhances the performance of various task arithmetic methods.

### Key Contributions

- **Theory**: We identify TFS as a sufficient condition for weight disentanglement and WVO as its geometric consequence, providing the first principled explanation for task arithmetic.
- **Method (OrthoReg)**: A simple regularization term added to the fine-tuning loss that enforces column-wise orthogonality on ΔW, for which we prove theoretical efficacy.
- **Connection to TTA**: We show that OrthoReg and Tangent Task Arithmetic (TTA) share the same underlying mechanism (i.e., inter-task vector orthogonality), but OrthoReg achieves it more efficiently.
- **Experiments**: Consistent and significant improvements over Non-linear FT, TTA, ATT-FT, and LoRA-ATT across ViT-B-32, ViT-B-16, and ViT-L-14.

---

### The OrthoReg Loss

The total loss adds a regularization term to the standard task objective:

$$\mathcal{L} = \mathcal{L}_{\text{task}}(\theta_0 + \Delta\theta) + \lambda \cdot \mathcal{L}_{\text{ortho}}(\Delta\theta)$$

$$\mathcal{L}_{\text{ortho}}(\Delta\theta) = \sum_l \left\|(\Delta W^{(l)})^\top \Delta W^{(l)} - I\right\|_F^2$$

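For concreteness, here is a minimal PyTorch sketch of the regularizer above, assuming the per-layer updates ΔW are collected as a list of 2-D tensors; the function name and interface are illustrative, not the repository's API.

```python
import torch

def ortho_penalty(delta_ws):
    """Sum over layers of ||(dW)^T dW - I||_F^2 for a list of 2-D weight updates dW."""
    loss = torch.zeros((), dtype=delta_ws[0].dtype, device=delta_ws[0].device)
    for dw in delta_ws:
        gram = dw.T @ dw                                    # column-wise Gram matrix
        eye = torch.eye(gram.shape[0], dtype=dw.dtype, device=dw.device)
        loss = loss + ((gram - eye) ** 2).sum()             # squared Frobenius norm
    return loss

# Total objective (task_loss, delta_ws, and ortho_lambda come from your fine-tuning loop):
# loss = task_loss + ortho_lambda * ortho_penalty(delta_ws)
```
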
---

## Checkpoint Structure

This repository contains fine-tuned checkpoints for **ViT-B-32, ViT-B-16, and ViT-L-14** on all 8 tasks, covering the following fine-tuning modes:

| Directory | Mode | Description |
|---|---|---|
| `standard_1e-05_{model}/` | `standard` | Non-linear full fine-tuning (baseline) |
| `linear_1e-05_{model}/` | `linear` | TTA: tangent-space fine-tuning (baseline) |
| `linear-2_1e-05_{model}/` | `linear-2` | ATT-FT: attention-only fine-tuning (baseline) |
| `linear_ortho_1e-05_lambda1.0_{model}/` | `linear_ortho` | TTA + OrthoReg |
| `ViT-B-32/`, `ViT-B-16/`, `ViT-L-14/` | - | Pre-trained CLIP base model weights |

Each mode directory is organized by dataset:

```
{mode}_{lr}_{model}/
├── head_CarsVal.pt           # linear classification head
├── head_DTDVal.pt
├── head_EuroSATVal.pt
├── head_GTSRBVal.pt
├── head_MNISTVal.pt
├── head_RESISC45Val.pt
├── head_SUN397Val.pt
├── head_SVHNVal.pt
├── CarsVal/
│   ├── {mode}_finetuned.pt   # fine-tuned model weights (task vector + θ₀)
│   └── {mode}_zeroshot.pt    # zero-shot reference weights
├── DTDVal/
...
└── SVHNVal/
```

All checkpoints use **seed=1993** and **lr=1e-5** to match the paper's reported results.

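As an illustration of how these files relate to the task-vector formulation, the sketch below loads a fine-tuned/zero-shot checkpoint pair and forms τ = θ_finetuned − θ₀. It assumes the `.pt` files can be read with `torch.load` and expose parameter tensors (directly or via `state_dict()`); the paths are examples following the layout above, and the variable names are illustrative.

```python
import torch

# Example paths following the layout above (adjust to your local copy).
root = "checkpoints_1993/linear_ortho_1e-05_lambda1.0_ViT-B-32/CarsVal"
finetuned = torch.load(f"{root}/linear_ortho_finetuned.pt", map_location="cpu")
zeroshot = torch.load(f"{root}/linear_ortho_zeroshot.pt", map_location="cpu")

# If the files store whole modules rather than raw state dicts, take their state_dict().
ft_sd = finetuned.state_dict() if hasattr(finetuned, "state_dict") else finetuned
zs_sd = zeroshot.state_dict() if hasattr(zeroshot, "state_dict") else zeroshot

# Task vector: tau = theta_finetuned - theta_0, one tensor per parameter.
task_vector = {name: ft_sd[name] - zs_sd[name] for name in ft_sd}
```
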
---

## Usage

### Step 1: Clone this repository

```bash
git lfs install
git clone https://huggingface.co/gezi2333/OrthoReg-checkpoints
```

Place the cloned folder as `OrthoReg/checkpoints_1993/` inside your code directory:

```bash
mv OrthoReg-checkpoints/* OrthoReg/checkpoints_1993/
```

### Step 2: Install the codebase

```bash
git clone https://github.com/RL-MIND/OrthoReg
cd OrthoReg
conda env create
conda activate tangent-arithmetic
export PYTHONPATH="$PYTHONPATH:$PWD"
```

### Step 3: Run evaluation

Evaluate single-task accuracy:

```bash
python src/eval_single_task.py \
    --model ViT-B-32 \
    --finetuning-mode linear_ortho \
    --ortho-lambda 1.0 \
    --lr 1e-5 \
    --seed 1993 \
    --data-location /path/to/datasets/
```

Evaluate task addition:

```bash
python src/eval_task_addition.py \
    --model ViT-B-32 \
    --finetuning-mode linear_ortho \
    --ortho-lambda 1.0 \
    --lr 1e-5 \
    --seed 1993 \
    --data-location /path/to/datasets/
```
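
Conceptually, task addition evaluates the merged model θ₀ + α·Σ_t τ_t, while task negation evaluates θ₀ − α·τ_t for a single task. Below is a minimal sketch of that merge step, assuming task vectors are stored as parameter-name-to-tensor dictionaries as in the earlier example; the function name, the default scaling coefficient, and the interface are illustrative, not the script's actual implementation.

```python
def apply_task_vectors(zeroshot_sd, task_vectors, alpha=0.3):
    """Return theta_0 + alpha * sum_t tau_t as a new state dict.

    For task negation, pass a single task vector and a negative alpha.
    The scaling coefficient alpha is typically tuned on validation data.
    """
    merged = {name: w.clone() for name, w in zeroshot_sd.items()}
    for tau in task_vectors:
        for name, delta in tau.items():
            merged[name] = merged[name] + alpha * delta
    return merged
```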

Evaluate task negation:

```bash
python src/eval_task_negation.py \
    --model ViT-B-32 \
    --finetuning-mode linear_ortho \
    --ortho-lambda 1.0 \
    --lr 1e-5 \
    --seed 1993 \
    --data-location /path/to/datasets/
```

> Run `eval_single_task` with `--finetuning-mode none --ortho-lambda 0` first to generate `zeroshot_accuracies.json`, which is required as the reference for normalized accuracy.
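
As a rough illustration of the metric mentioned in the note, normalized accuracy divides each task's accuracy by a per-task reference value and averages over tasks. The sketch below assumes `zeroshot_accuracies.json` maps dataset names to those reference accuracies; both the field names and the helper are hypothetical, not the evaluation scripts' internals.

```python
import json

def normalized_accuracy(task_accuracies, reference_path="zeroshot_accuracies.json"):
    """Average of (per-task accuracy / per-task reference accuracy); names are illustrative."""
    with open(reference_path) as f:
        reference = json.load(f)
    ratios = [task_accuracies[task] / reference[task] for task in task_accuracies]
    return sum(ratios) / len(ratios)
```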

---

## Datasets

We evaluate on 8 image classification benchmarks: **Cars · DTD · EuroSAT · GTSRB · MNIST · RESISC45 · SUN397 · SVHN**

For dataset preparation, follow the instructions in the [TTA repository](https://github.com/gortizji/tangent_task_arithmetic#datasets).

---

## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{liu2026orthoreg,
  title     = {Understanding and Enforcing Weight Disentanglement in Task Arithmetic},
  author    = {Liu, Shangge and Yin, Yuehan and Wang, Lei and Fan, Qi and
               Shi, Yinghuan and Li, Wenbin and Gao, Yang and Tao, Dacheng},
  booktitle = {CVPR},
  year      = {2026}
}
```

---

## Acknowledgements

This codebase is built on top of [Task Arithmetic](https://github.com/mlfoundations/task_vectors), [Tangent Task Arithmetic](https://github.com/gortizji/tangent_task_arithmetic), and [Attention-Only Fine-tuning](https://github.com/kyrie-23/linear_task_arithmetic). We thank the authors for releasing their code.