Observed a fun fact: English-only models tend to perform worse on the closed datasets, while multilingual models do better on them.
Is it because the baseline, MTEB, is an English-only benchmark, while RTEB is multilingual? It's natural that multilingual models would perform better on a multilingual benchmark than on an English one.