# LOCA: Location-Aware Self-Supervised Vision Transformers for Semantic Segmentation

JAX implementation and pretrained models for LOCA. For details, see the paper on [arXiv](https://arxiv.org/abs/2212.02400).

## Training

Like other projects in Scenic, all model, training, and dataset parameters are specified in configuration files.
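
For orientation, here is a minimal sketch of the `ml_collections`-style config pattern that Scenic projects follow. The field names and values below are illustrative assumptions, not the actual contents of the LOCA configs; see `scenic/projects/loca/configs/` for those.

```python
import ml_collections


def get_config() -> ml_collections.ConfigDict:
  """Illustrative Scenic-style config; field names are assumptions."""
  config = ml_collections.ConfigDict()
  config.dataset_name = 'imagenet2012'   # assumed field name and value
  config.batch_size = 1024               # illustrative
  config.num_training_epochs = 100       # matches the 100-epoch example above
  config.model = ml_collections.ConfigDict()
  config.model.patch_size = 16           # illustrative, ViT-B/16-style
  return config
```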

An example command line for training ViT-Base/16 on the ImageNet-1k dataset for 100 epochs with the `loca_imnet1k_base16.py` config file:

```sh
$ python -m scenic.projects.loca.main \
  --config=scenic/projects/loca/configs/loca_imnet1k_base16.py \
  --workdir=loca_base/
```

The resulting checkpoint should reach 46.2 mIoU after finetuning on the ADE20k dataset with the linear decoder from Segmenter.
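
As a rough illustration of what such a linear decoder does, here is a hedged Flax sketch: one dense layer maps each ViT patch token to class logits, which are reshaped to a grid and bilinearly upsampled to pixel resolution. This is a sketch of the Segmenter-style idea, not the exact code used to produce the number above; the module name and shape conventions are assumptions.

```python
import flax.linen as nn
import jax
import jax.numpy as jnp


class LinearDecoder(nn.Module):
  """Segmenter-style linear head: per-patch logits + bilinear upsampling."""
  num_classes: int          # e.g. 150 for ADE20k
  patch_size: int = 16      # assumed to match the ViT-/16 encoders

  @nn.compact
  def __call__(self, patch_tokens: jnp.ndarray) -> jnp.ndarray:
    # patch_tokens: [batch, h*w, dim] patch embeddings from the ViT encoder,
    # assumed to come from a square grid of patches.
    b, n, _ = patch_tokens.shape
    h = w = int(n ** 0.5)
    logits = nn.Dense(self.num_classes)(patch_tokens)   # [b, h*w, classes]
    logits = logits.reshape(b, h, w, self.num_classes)
    # Upsample the coarse logit grid back to pixel resolution.
    return jax.image.resize(
        logits,
        (b, h * self.patch_size, w * self.patch_size, self.num_classes),
        method='bilinear')
```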

## Model Zoo

| arch | data | ADE20k mIoU | download |
|:---:|:---:|:---:|:---:|
| ViT-S/16 | ImageNet-1k | 44.8 | checkpoint |
| ViT-B/16 | ImageNet-1k | 48.0 | checkpoint |
| ViT-B/16 | ImageNet-21k | 48.5 | checkpoint |
| ViT-L/16 | ImageNet-21k | 52.3 | checkpoint |
| ViT-H/16 | ImageNet-21k | 54.3 | checkpoint |
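
Scenic models typically store their weights as Flax checkpoints, so a downloaded file can usually be inspected along these lines. This is a sketch under that assumption, and the path below is a placeholder:

```python
from flax.training import checkpoints

# With target=None, restore_checkpoint returns the raw nested dict of arrays,
# which is convenient for inspecting parameter shapes before loading them
# into a model definition.
state = checkpoints.restore_checkpoint('path/to/loca_checkpoint', target=None)
print(list(state.keys()))
```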

## Citation

If you use LOCA, please cite it with the following BibTeX entry.

```bibtex
@article{caron2022location,
    title={Location-Aware Self-Supervised Vision Transformers for Semantic Segmentation},
    author={Caron, Mathilde and Houlsby, Neil and Schmid, Cordelia},
    journal={arXiv:2212.02400},
    year={2022}
}
```