---
datasets:
- movi-c
- movi-e
- youtube-vis-2021
- coco
language:
- en
library_name: pytorch
license: mit
pipeline_tag: image-segmentation
metrics:
- fg-ari
- mbo
tags:
- video-object-centric-learning
- object-discovery
- slot-attention
- unsupervised-segmentation
- video-understanding
- pytorch
---
# SSync: Selective Synergistic Learning for Video Object-Centric Learning

**ECCV 2026** · [Paper](https://arxiv.org/abs/2606.15527v1) · [Code](https://github.com/wjun0830/SSync) · [Project Page](https://wjun0830.github.io/SSync)

**Authors:** [WonJun Moon](https://github.com/wjun0830) (KAIST), Jae-Pil Heo (Sungkyunkwan University)
## Model Description
SSync is a selective mutual-distillation framework for video object-centric learning (VOCL). Slot-based VOCL methods are guided by two spatial maps: the encoder's **attention map** (sharp boundaries, noisy interiors) and the decoder's **object map** (coherent interiors, blurry boundaries). Rather than forcing dense agreement across all spatio-temporal patches, SSync distills only the most reliable cues from each map into the other:
- **Encoder → Decoder:** boundary refinement via crisp attention boundaries
- **Decoder → Encoder:** interior denoising via coherent object maps
This selective distillation is realized through a **linear-complexity pseudo-labeling** scheme, which avoids quadratic spatial comparisons. A **transitive pseudo-label merging** step then consolidates redundant slots based on spatio-temporal activation consistency, which makes SSync robust to the configured number of slots.
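
The two ideas in the paragraph above can be illustrated with a minimal, dependency-free sketch. It is not the SSync implementation: the function names, the argmax labeling rule, and the merge criterion (IoU of thresholded activations) are all assumptions chosen for illustration, since this card only states that merging is based on spatio-temporal activation consistency. The sketch shows (1) per-patch pseudo-labels computed in time linear in the number of patches, and (2) a union-find merge in which slots linked through a chain of similar slots end up in the same group.

```python
from itertools import combinations

def pseudo_labels(maps):
    """maps[k][p] is the activation of slot k at spatio-temporal patch p.
    Assigns each patch to its strongest slot: O(K * P), no patch-pair comparisons."""
    num_slots, num_patches = len(maps), len(maps[0])
    return [max(range(num_slots), key=lambda k: maps[k][p]) for p in range(num_patches)]

def activation_iou(a, b, thresh=0.5):
    """IoU of two slots' binarized activation patterns (hypothetical consistency measure)."""
    on_a = {p for p, v in enumerate(a) if v > thresh}
    on_b = {p for p, v in enumerate(b) if v > thresh}
    union = on_a | on_b
    return len(on_a & on_b) / len(union) if union else 0.0

def merge_slots(maps, iou_thresh=0.5):
    """Union-find over slots. Returns a group id per slot; if A~B and B~C,
    then A, B, and C share a group even when A and C are not directly similar."""
    parent = list(range(len(maps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(maps)), 2):
        if activation_iou(maps[i], maps[j]) >= iou_thresh:
            parent[find(j)] = find(i)
    return [find(k) for k in range(len(maps))]

def merged_pseudo_labels(maps, iou_thresh=0.5):
    """Per-patch labels after redundant slots are collapsed into their group id."""
    group = merge_slots(maps, iou_thresh)
    return [group[k] for k in pseudo_labels(maps)]

# Toy example: slots 0 and 1 fire on the same patches, slot 2 on different ones.
toy_maps = [
    [0.9, 0.8, 0.6, 0.1, 0.0, 0.0],
    [0.7, 0.9, 0.7, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.1, 0.9, 0.9, 0.9],
]
print(merged_pseudo_labels(toy_maps))  # [0, 0, 0, 2, 2, 2]
```

In this toy setting the labeling pass touches each patch once per slot, and the only pairwise work is over slots, whose count is small compared with the number of patches.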
## Evaluation Results
Object discovery on VOCL benchmarks (averaged over 3 runs):
| Dataset | FG-ARI ↑ | mBO ↑ |
|---|---|---|
| MOVi-C (336×336) | **79.4** | **39.5** |
| MOVi-E (336×336) | **84.0** | 34.8 |
| YouTube-VIS 2021 (518×518) | 42.6 | **38.7** |
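
For readers unfamiliar with the two metrics, the following sketch computes them on flattened per-pixel label lists using their standard definitions: FG-ARI is the adjusted Rand index restricted to ground-truth foreground pixels, and mBO averages, over ground-truth objects, the best IoU achieved by any predicted segment. The function names, the `background=0` convention, and the flat-list input format are assumptions for illustration; the evaluation code in the SSync repository may differ in details such as per-frame versus per-video aggregation. The table values are presumably these quantities scaled by 100.

```python
from collections import Counter
from math import comb

def fg_ari(true_ids, pred_ids, background=0):
    """Adjusted Rand index over pixels whose ground-truth id is not background."""
    pairs = [(t, p) for t, p in zip(true_ids, pred_ids) if t != background]
    n = len(pairs)
    if n < 2:
        return 1.0  # convention chosen here: nothing to disagree about
    # Count pixel pairs that fall in the same (true, pred) cell, true segment, pred segment.
    sum_cells = sum(comb(c, 2) for c in Counter(pairs).values())
    sum_true = sum(comb(c, 2) for c in Counter(t for t, _ in pairs).values())
    sum_pred = sum(comb(c, 2) for c in Counter(p for _, p in pairs).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single segment on both sides
    return (sum_cells - expected) / (max_index - expected)

def mbo(true_ids, pred_ids, background=0):
    """Mean over ground-truth objects of the best IoU with any predicted segment."""
    true_sets, pred_sets = {}, {}
    for i, (t, p) in enumerate(zip(true_ids, pred_ids)):
        if t != background:
            true_sets.setdefault(t, set()).add(i)
        pred_sets.setdefault(p, set()).add(i)
    best = [
        max(len(gt & seg) / len(gt | seg) for seg in pred_sets.values())
        for gt in true_sets.values()
    ]
    return sum(best) / len(best) if best else 0.0

# Toy example: 8 pixels, ground-truth objects 1 and 2, background 0.
true_ids = [0, 0, 1, 1, 1, 2, 2, 2]
pred_ids = [5, 5, 7, 7, 8, 8, 8, 8]
print(round(fg_ari(true_ids, pred_ids), 3))  # 0.324
print(round(mbo(true_ids, pred_ids), 3))     # 0.708
```

Both functions are invariant to how predicted segments are numbered, which matters for unsupervised object discovery because slot indices carry no fixed meaning.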
## Training Data
| Dataset | Size |
|---|---|
| YouTube-VIS 2021 | 26.43 GB |
| MOVi-C | 7.43 GB |
| MOVi-E | 8.26 GB |
See [data/README.md](https://github.com/wjun0830/SSync/blob/main/data/README.md) for download instructions.
## Citation
```bibtex
@inproceedings{moon2026ssync,
title = {Selective Synergistic Learning for Video Object-Centric Learning},
author = {Moon, WonJun and Heo, Jae-Pil},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2026}
}
```
## Acknowledgements
Built upon [VideoSAUR](https://github.com/martius-lab/videosaur), [SlotContrast](https://github.com/martius-lab/slotcontrast), [SRL](https://github.com/hynnsk/SRL), and [SlotCurri](https://github.com/wjun0830/SlotCurri).