ExPLoRA / README.md

Update README.md

87c085f verified 17 days ago

4.41 kB

	---
	license: apache-2.0
	tags:
	- vision
	- image-classification
	- remote-sensing
	- lora
	- peft
	- domain-adaptation
	- vision-transformer
	- continual-learning
	datasets:
	- fmow
	- sentinel-2
	pipeline_tag: image-classification
	---

	# ExPLoRA: Parameter-Efficient Extended Pre-Training

	[Paper](https://arxiv.org/abs/2406.10973) \| [Code](https://github.com/samar-khanna/ExPLoRA) \| [Website](https://samar-khanna.github.io/ExPLoRA/) \| [Video](https://slideslive.com/39039614)

	This repository contains pre-trained checkpoints from the ICML 2025 paper:
	_"ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts"_

	## Overview

	ExPLoRA is a parameter-efficient method for adapting pre-trained Vision Transformers (ViT) to new domains using LoRA-based extended pre-training. Instead of training the full architecture, ExPLoRA freezes most of the backbone and trains low-rank adapters and a small subset of ViT blocks during self-supervised pre-training on target domain data.

	<p align="center">
	<img src="https://samar-khanna.github.io/ExPLoRA/static/images/explora_arch.svg" width="600" style="background-color: white; padding: 10px; border-radius: 8px;"/>
	</p>

	---

	## 📁 Checkpoints

	> Note: All checkpoints have LoRA adapters already merged into the weights. The full checkpoints retain the separate `q_proj`, `k_proj`, `v_proj` layers (with merged LoRA) alongside the combined `qkv` weights for reference. The encoder-only checkpoints contain just the merged `qkv` weights, ready for downstream use.

	### `explora_dinov2_fmow_rgb/`

	ExPLoRA checkpoints using DINOv2 self-supervised pre-training on fMoW high-resolution RGB satellite imagery.

	\| Description \| ViT-B \| ViT-L \|
	\|-------------\|:-----:\|:-----:\|
	\| DinoV2 teacher encoder & decoder weights + ExPLoRA adapters \| [ViT-B/14](https://huggingface.co/samarkhanna/ExPLoRA/resolve/main/explora_dinov2_fmow_rgb/explora_dinov2_vit_base_fmow_rgb.pth) \| [ViT-L/14](https://huggingface.co/samarkhanna/ExPLoRA/blob/main/explora_dinov2_fmow_rgb/explora_dinov2_vit_large_fmow_rgb.pth) \|
	\| Encoder-only weights \| [ViT-B/14](https://huggingface.co/samarkhanna/ExPLoRA/resolve/main/explora_dinov2_fmow_rgb/explora_dinov2_vit_base_fmow_rgb_encoder_only.pth) \| [ViT-L/14](https://huggingface.co/samarkhanna/ExPLoRA/resolve/main/explora_dinov2_fmow_rgb/explora_dinov2_vit_large_fmow_rgb_encoder_only.pth) \|

	Usage:
	```python
	import torch

	# Load encoder-only checkpoint (recommended for fine-tuning)
	ckpt = torch.load("explora_dinov2_fmow_rgb/explora_dinov2_vit_large_fmow_rgb_encoder_only.pth", map_location="cpu")
	state_dict = ckpt["model"]
	```

	### `explora_mae_multispectral/`

	ExPLoRA checkpoints using MAE self-supervised pre-training on fMoW Sentinel-2 multispectral imagery.

	\| Description \| ViT-L \|
	\|-------------\|:-----:\|
	\| MAE encoder & decoder weights + ExPLoRA adapters \| [ViT-L/16](https://huggingface.co/samarkhanna/ExPLoRA/resolve/main/explora_mae_multispectral/explora_mae_fmow_sentinel.pth) \|
	\| Encoder-only weights \| [ViT-L/16](https://huggingface.co/samarkhanna/ExPLoRA/resolve/main/explora_mae_multispectral/explora_mae_fmow_sentinel_encoder_only.pth) \|

	Usage:
	```python
	import torch

	# Load encoder-only checkpoint (recommended for fine-tuning)
	ckpt = torch.load("explora_mae_multispectral/explora_mae_fmow_sentinel_encoder_only.pth", map_location="cpu")
	state_dict = ckpt["model"]
	```

	---

	## Loading Checkpoints

	These checkpoints are compatible with the [ExPLoRA codebase](https://github.com/samar-khanna/ExPLoRA).

	For fine-tuning, use the `finetune/finetune.py` script:
	```bash
	python finetune/finetune.py \
	--finetune path/to/explora_checkpoint.pth \
	--model vit_large_patch16 \
	--dataset_type rgb \
	...
	```

	Reference scripts are also provided under `scripts/` in the codebase, and you can use these checkpoints there.

	---

	## Citation

	If you find these checkpoints useful, please cite our paper:

	```bibtex
	@inproceedings{khanna2025explora,
	title={Ex{PL}o{RA}: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts},
	author={Samar Khanna and Medhanie Irgau and David B. Lobell and Stefano Ermon},
	booktitle={Forty-second International Conference on Machine Learning},
	year={2025},
	url={https://openreview.net/forum?id=OtxLhobhwb}
	}
	```

	## License

	Apache 2.0