metadata
datasets:
  - movi-c
  - movi-e
  - youtube-vis-2021
  - coco
language:
  - en
library_name: pytorch
license: mit
pipeline_tag: image-segmentation
metrics:
  - fg-ari
  - mbo
tags:
  - video-object-centric-learning
  - object-discovery
  - slot-attention
  - unsupervised-segmentation
  - video-understanding
  - pytorch

SSync: Selective Synergistic Learning for Video Object-Centric Learning

ECCV 2026 · Paper · Code · Project Page

Authors: WonJun Moon (KAIST), Jae-Pil Heo (Sungkyunkwan University)

Model Description

SSync is a selective mutual-distillation framework for video object-centric learning (VOCL). Slot-based VOCL methods are guided by two spatial maps: the encoder's attention map (sharp boundaries, noisy interiors) and the decoder's object map (coherent interiors, blurry boundaries). Rather than forcing dense agreement across all spatio-temporal patches, SSync distills only the most reliable cues from each map:

  • Encoder → Decoder: boundary refinement via crisp attention boundaries
  • Decoder → Encoder: interior denoising via coherent object maps
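
One way to picture the selective split is as a pair of patch masks. The sketch below is illustrative only and is not taken from the SSync code. It assumes hard per-patch slot labels on an H×W grid and treats a patch as a boundary patch when any 4-neighbour carries a different label. The function name `split_boundary_interior` and the neighbourhood rule are assumptions made for this example.

```python
def split_boundary_interior(labels):
    """Split a grid of per-patch slot labels into boundary and interior masks.

    labels: list of rows, each a list of int slot ids (hypothetical input format).
    Returns (boundary, interior) as boolean grids with the same shape.
    """
    h, w = len(labels), len(labels[0])
    boundary = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Assumed rule: a patch lies on a boundary if any 4-neighbour differs.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    boundary[y][x] = True
                    break
    interior = [[not b for b in row] for row in boundary]
    return boundary, interior
```

Under this reading, an encoder-to-decoder loss would be restricted to the `boundary` mask and a decoder-to-encoder loss to the `interior` mask. How SSync actually scores reliability is described in the paper.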

The selective distillation is realized through a linear-complexity pseudo-labeling scheme, which eliminates quadratic spatial comparisons. A transitive pseudo-label merging step further consolidates redundant slots based on spatio-temporal activation consistency, making SSync robust to the choice of slot count.
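
The two ideas in this paragraph can be sketched in plain Python. This is a hedged illustration, not the authors' implementation. It assumes that pseudo-labels come from a per-patch argmax over slots, which costs one pass over the patches and needs no patch-to-patch comparison, and that two slots count as redundant when the Jaccard overlap of their active patch sets exceeds a threshold. The names `pseudo_labels` and `merge_slots` and the thresholds `0.3` and `0.5` are hypothetical.

```python
def pseudo_labels(weights):
    """weights[k][n] is the activation of slot k at patch n.

    One argmax per patch, so the cost is O(N * K) for N patches and K slots.
    """
    num_slots, num_patches = len(weights), len(weights[0])
    return [max(range(num_slots), key=lambda k: weights[k][n])
            for n in range(num_patches)]


def merge_slots(weights, active_thr=0.3, merge_thr=0.5):
    """Group redundant slots transitively; returns a representative id per slot."""
    num_slots = len(weights)
    # Assumed notion of "active": activation above active_thr at that patch.
    active = [{n for n, w in enumerate(row) if w > active_thr} for row in weights]
    parent = list(range(num_slots))

    def find(k):
        # Union-find lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a in range(num_slots):
        for b in range(a + 1, num_slots):
            union = active[a] | active[b]
            if union and len(active[a] & active[b]) / len(union) > merge_thr:
                ra, rb = find(a), find(b)
                parent[max(ra, rb)] = min(ra, rb)  # keep the smaller id as root
    return [find(k) for k in range(num_slots)]
```

Because the grouping is transitive, slots A and C end up together whenever both overlap strongly with a slot B, even if A and C do not pass the threshold directly. Merged labels follow by mapping each pseudo-label through the returned representatives, for example `[roots[k] for k in pseudo_labels(weights)]`.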

Evaluation Results

Object discovery on VOCL benchmarks (averaged over 3 runs):

| Dataset | FG-ARI ↑ | mBO ↑ |
|---|---|---|
| MOVi-C (336×336) | 79.4 | 39.5 |
| MOVi-E (336×336) | 84.0 | 34.8 |
| YouTube-VIS 2021 (518×518) | 42.6 | 38.7 |
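
For readers unfamiliar with the two metrics, the following is a minimal stdlib sketch of their usual definitions: FG-ARI is the adjusted Rand index restricted to ground-truth foreground pixels, and mBO is the mean, over ground-truth masks, of the best IoU with any predicted mask. It is not the evaluation code behind the numbers above. The input formats (flat id lists, masks as sets of pixel indices) and the `background=0` convention are assumptions, and codebases differ in details such as how video frames are aggregated.

```python
from collections import Counter
from math import comb


def fg_ari(true_ids, pred_ids, background=0):
    """Adjusted Rand index over pixels whose ground-truth id is not background."""
    pairs = [(t, p) for t, p in zip(true_ids, pred_ids) if t != background]
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(t for t, _ in pairs)
    cols = Counter(p for _, p in pairs)
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2) if n > 1 else 0.0
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_joint - expected) / (max_index - expected)


def mbo(true_masks, pred_masks):
    """Mean over ground-truth masks of the best IoU with any predicted mask."""
    best = []
    for gt in true_masks:
        ious = [len(gt & pr) / len(gt | pr) for pr in pred_masks if gt | pr]
        best.append(max(ious, default=0.0))
    return sum(best) / len(best)
```

Both functions return fractions rather than percentages, and `mbo` expects at least one ground-truth mask.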

Training Data

| Dataset | Size |
|---|---|
| YouTube-VIS 2021 | 26.43 GB |
| MOVi-C | 7.43 GB |
| MOVi-E | 8.26 GB |

See data/README.md for download instructions.

Citation

@inproceedings{moon2026ssync,
  title     = {Selective Synergistic Learning for Video Object-Centric Learning},
  author    = {Moon, WonJun and Heo, Jae-Pil},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}

Acknowledgements

Built upon VideoSAUR, SlotContrast, SRL, and SlotCurri.