---
datasets:
  - movi-c
  - movi-e
  - youtube-vis-2021
  - coco
language:
  - en
library_name: pytorch
license: mit
pipeline_tag: image-segmentation
metrics:
  - fg-ari
  - mbo
tags:
  - video-object-centric-learning
  - object-discovery
  - slot-attention
  - unsupervised-segmentation
  - video-understanding
  - pytorch
---

SSync: Selective Synergistic Learning for Video Object-Centric Learning

ECCV 2026 · Paper · Code · Project Page

Authors: WonJun Moon (KAIST), Jae-Pil Heo (Sungkyunkwan University)

Model Description

SSync is a selective mutual-distillation framework for video object-centric learning (VOCL). Slot-based VOCL methods are guided by two spatial maps — the encoder's attention map (sharp boundaries, noisy interiors) and the decoder's object map (coherent interiors, blurry boundaries). Rather than forcing dense agreement across all spatio-temporal patches, SSync selectively distills only the most reliable cues from each map:

- Encoder → Decoder: boundary refinement via crisp attention boundaries
- Decoder → Encoder: interior denoising via coherent object maps

This selective distillation is realized through a linear-complexity pseudo-labeling scheme, which eliminates the need for quadratic spatial comparisons. A transitive pseudo-label merging step then consolidates redundant slots based on their spatio-temporal activation consistency, making SSync robust to the choice of slot count.
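To make the two ideas in this paragraph concrete, here is a minimal sketch in plain Python. It is not the SSync implementation: the function names (`pseudo_labels`, `cosine`, `merge_groups`), the use of cosine similarity as the "activation consistency" measure, and the threshold `tau` are all assumptions chosen for illustration. Labels are assigned with a per-patch argmax, which costs O(K·N) for K slots and N spatio-temporal patches, and redundant slots are merged transitively with a union-find structure.

```python
from itertools import combinations


def pseudo_labels(slot_maps):
    """Assign each patch to its highest-scoring slot.

    slot_maps[k][n] is the score of slot k at flattened spatio-temporal
    patch n. The cost is O(K * N), i.e. linear in the number of patches.
    """
    num_slots, num_patches = len(slot_maps), len(slot_maps[0])
    return [
        max(range(num_slots), key=lambda k: slot_maps[k][n])
        for n in range(num_patches)
    ]


def cosine(a, b):
    """Cosine similarity between two activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm > 0 else 0.0


def merge_groups(slot_maps, tau=0.85):
    """Return one representative slot id per slot (hypothetical criterion).

    Slots whose activations have cosine similarity >= tau are linked, and
    links are closed transitively with union-find.
    """
    parent = list(range(len(slot_maps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(slot_maps)), 2):
        if cosine(slot_maps[i], slot_maps[j]) >= tau:
            ri, rj = find(i), find(j)
            parent[max(ri, rj)] = min(ri, rj)  # smallest id is the root
    return [find(i) for i in range(len(slot_maps))]


# Toy input: 4 slots over 4 patches. Slots 0, 1, 2 cover the same object;
# slot 0 and slot 2 are only linked through slot 1 (transitivity).
slot_maps = [
    [0.9, 0.0, 0.0, 0.0],
    [0.8, 0.4, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
    [0.1, 0.1, 0.9, 0.9],
]
labels = pseudo_labels(slot_maps)           # [0, 2, 3, 3]
groups = merge_groups(slot_maps, tau=0.85)  # [0, 0, 0, 3]
merged = [groups[k] for k in labels]        # [0, 0, 3, 3]
print(labels, groups, merged)
```

In this toy case slots 0 and 2 fall below the threshold when compared directly (cosine ≈ 0.71) but still end up in one group because each is similar to slot 1. The pairwise loop is quadratic in the number of slots only, which is typically small compared with the number of patches.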

Evaluation Results

Object discovery on VOCL benchmarks (averaged over 3 runs):

Dataset	FG-ARI ↑	mBO ↑
MOVi-C (336×336)	79.4	39.5
MOVi-E (336×336)	84.0	34.8
YouTube-VIS 2021 (518×518)	42.6	38.7
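For readers unfamiliar with the two metrics, the sketch below shows how they are commonly defined: FG-ARI is the adjusted Rand index computed over foreground pixels only, and mBO averages, over ground-truth objects, the best IoU achieved by any predicted segment. This is a plain-Python illustration on flattened label lists, not the evaluation code used for the table. The function names, the `background=0` convention, and the toy labels are assumptions, and the table values appear to be these scores scaled by 100.

```python
from collections import Counter
from math import comb


def fg_ari(true, pred, background=0):
    """Adjusted Rand index restricted to pixels whose true label is foreground."""
    pairs = [(t, p) for t, p in zip(true, pred) if t != background]
    n = len(pairs)
    sum_ij = sum(comb(c, 2) for c in Counter(pairs).values())
    sum_a = sum(comb(c, 2) for c in Counter(t for t, _ in pairs).values())
    sum_b = sum(comb(c, 2) for c in Counter(p for _, p in pairs).values())
    expected = sum_a * sum_b / comb(n, 2) if n > 1 else 0.0
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def mbo(true, pred, background=0):
    """Mean over ground-truth objects of the best IoU with any predicted segment."""
    best = []
    for obj in sorted(set(true) - {background}):
        gt = {i for i, t in enumerate(true) if t == obj}
        ious = []
        for seg in set(pred):
            pm = {i for i, p in enumerate(pred) if p == seg}
            ious.append(len(gt & pm) / len(gt | pm))
        best.append(max(ious))
    return sum(best) / len(best)


# Toy example: 8 pixels, background label 0, two ground-truth objects.
true = [0, 0, 1, 1, 1, 2, 2, 2]
pred = [5, 5, 5, 7, 7, 9, 9, 9]
print(round(fg_ari(true, pred), 4))  # 0.7059
print(round(mbo(true, pred), 4))     # 0.8333
```

Predicted segment ids do not need to match ground-truth ids: both metrics depend only on how pixels are grouped, not on the label values.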

Training Data

Dataset	Size
YouTube-VIS 2021	26.43 GB
MOVi-C	7.43 GB
MOVi-E	8.26 GB

See data/README.md for download instructions.

Citation

@inproceedings{moon2026ssync,
  title     = {Selective Synergistic Learning for Video Object-Centric Learning},
  author    = {Moon, WonJun and Heo, Jae-Pil},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}

Acknowledgements

Built upon VideoSAUR, SlotContrast, SRL, and SlotCurri.