Model weights for:
"Bidirectional Cross-Attention Fusion of High-Res RGB and Low-Res HSI for Multimodal Automated Waste Sorting"
Jonas V. Funk, Lukas Roming, Andreas Michel, Paul Bäcker, Georg Maier, Thomas Längle, Markus Klute
We present Bidirectional Cross-Attention Fusion (BCAF), which aligns high-resolution RGB with low-resolution HSI at their native grids via localized, bidirectional cross-attention, avoiding spatial pre-upsampling or early spectral collapse.
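
To make the idea concrete, here is a minimal pure-Python sketch of localized, bidirectional cross-attention between two grids of different resolution. It is not the authors' implementation. The function names `attend` and `bidirectional_local_fusion`, the 3x3 HSI neighbourhood, the residual update, the shared feature dimension, and the absence of learned projections or multiple heads are all illustrative assumptions. The point it shows is that both grids keep their native size: neither modality is upsampled or collapsed before fusion.

```python
import math
import random

def attend(queries, context):
    """Scaled dot-product attention; `context` serves as both keys and values."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) / total
                    for j in range(dim)])
    return out

def bidirectional_local_fusion(rgb, hsi, ratio):
    """rgb: H x W grid of feature vectors; hsi: (H/ratio) x (W/ratio) grid.

    Assumption: both grids already share the same feature dimension.
    """
    h_hsi, w_hsi = len(hsi), len(hsi[0])
    new_rgb = [[None] * len(rgb[0]) for _ in rgb]
    new_hsi = [[None] * w_hsi for _ in hsi]
    for i in range(h_hsi):
        for j in range(w_hsi):
            # RGB positions covered by HSI cell (i, j)
            cells = [(i * ratio + di, j * ratio + dj)
                     for di in range(ratio) for dj in range(ratio)]
            window = [rgb[r][c] for r, c in cells]
            # 3x3 HSI neighbourhood around (i, j), clipped at the borders
            hood = [hsi[a][b]
                    for a in range(max(0, i - 1), min(h_hsi, i + 2))
                    for b in range(max(0, j - 1), min(w_hsi, j + 2))]
            # Direction 1: the HSI token queries the RGB window it covers
            update = attend([hsi[i][j]], window)[0]
            new_hsi[i][j] = [x + u for x, u in zip(hsi[i][j], update)]
            # Direction 2: each RGB token in the window queries nearby HSI tokens
            for (r, c), upd in zip(cells, attend(window, hood)):
                new_rgb[r][c] = [x + u for x, u in zip(rgb[r][c], upd)]
    return new_rgb, new_hsi

# Toy input: an 8x8 RGB feature grid and a 2x2 HSI feature grid (ratio 4)
random.seed(0)
DIM = 4
rgb = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(8)] for _ in range(8)]
hsi = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(2)] for _ in range(2)]
fused_rgb, fused_hsi = bidirectional_local_fusion(rgb, hsi, ratio=4)
print(len(fused_rgb), len(fused_rgb[0]), len(fused_hsi), len(fused_hsi[0]))
```

Restricting each query to a local window keeps the cost proportional to the number of tokens rather than to the product of the two grid sizes, which is one plausible reason to prefer localized over global cross-attention when the RGB grid is large.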
See instructions on GitHub.

| File | mIoU ↑ | Img./s ↑ |
|---|---|---|
| BCAF_SpectralWaste_rgb1024_hsi5_best.pth | 76.4 ± 0.4% | 31 |
| BCAF_SpectralWaste_rgb512_hsi5_best.pth | 75.4 ± 0.2% | 55 |
| BCAF_SpectralWaste_rgb256_hsi5_best.pth | 71.1 ± 0.4% | 54 |
| logitfusion_SpectralWaste_rgb1024_hsi5_best.pth | 72.6 ± 0.8% | 39 |

| File | mIoU ↑ | Img./s ↑ |
|---|---|---|
| swin_t_SpectralWaste_rgb_256_best.pth | 65.8 ± 1.2% | 141 |
| swin_t_SpectralWaste_rgb_512_best.pth | 71.1 ± 0.6% | 135 |
| swin_t_SpectralWaste_rgb_1024_best.pth | 71.6 ± 0.3% | 60 |
| swin_t_SpectralWaste_rgb_2048_best.pth | 68.4 ± 0.8% | 15 |

| File | mIoU ↑ | Img./s ↑ |
|---|---|---|
| swin_t_SpectralWaste_hsi_1_best.pth | 60.9 ± 0.2% | 141 |
| adapted_swin_t_SpectralWaste_hsi_3_best.pth | 59.7 ± 0.7% | 114 |
| adapted_swin_t_SpectralWaste_hsi_5_best.pth | 60.3 ± 0.9% | 119 |
| adapted_swin_t_SpectralWaste_hsi_7_best.pth | 59.0 ± 1.5% | 91 |
| adapted_swin_t_SpectralWaste_hsi_10_best.pth | 57.8 ± 1.2% | 68 |

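
To compare checkpoints at a glance, the reported numbers can be turned into mIoU gaps and throughput ratios relative to the strongest BCAF checkpoint. The short script below is only a convenience sketch: the values are copied from the tables above, while the variable names and the choice of which checkpoints to compare are our own.

```python
# (mIoU in %, images per second) as reported in the tables above
reported = {
    "BCAF_SpectralWaste_rgb1024_hsi5_best.pth": (76.4, 31),
    "logitfusion_SpectralWaste_rgb1024_hsi5_best.pth": (72.6, 39),
    "swin_t_SpectralWaste_rgb_1024_best.pth": (71.6, 60),
    "swin_t_SpectralWaste_hsi_1_best.pth": (60.9, 141),
}

reference = "BCAF_SpectralWaste_rgb1024_hsi5_best.pth"
ref_miou, ref_speed = reported[reference]

gaps = {}
for name, (miou, speed) in reported.items():
    if name == reference:
        continue
    # Positive gap: the reference checkpoint scores higher.
    # Speed ratio above 1: the compared checkpoint is faster.
    gaps[name] = (ref_miou - miou, speed / ref_speed)
    print(f"{name}: {gaps[name][0]:+.1f} mIoU points, {gaps[name][1]:.2f}x throughput")
```

The gaps ignore the reported standard deviations, so small differences between checkpoints should be read with the ± values in the tables in mind.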
@article{funk2026bcaf,
title={Bidirectional Cross-Attention Fusion of High-Res RGB and Low-Res HSI
for Multimodal Automated Waste Sorting},
author={Jonas V. Funk and Lukas Roming and Andreas Michel and Paul B{\"a}cker
and Georg Maier and Thomas L{\"a}ngle and Markus Klute},
year={2026},
eprint={2603.13941},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.13941}
}