Cannot reproduce QuoraRetrieval NDCG@10 score

#14

by jcli0606 - opened Jul 17, 2024

Jul 17, 2024

Thanks.
I want to reproduce the MTEB retrieval result for QuoraRetrieval, but I get an NDCG@10 score of 80.73.
I have confirmed that the query embeddings use the prompt and the document embeddings do not.

I can reproduce the NDCG@10 scores on other datasets, for example SCIDOCS, ArguAna, etc.

lukemerrick

Jul 17, 2024 · edited Jul 17, 2024

QuoraRetrieval is a duplicate-question retrieval task, i.e. it matches queries to other queries instead of queries to documents. As such, we follow the common practice of using the query prefix for both queries and documents when embedding this dataset (this was not our brilliant idea by any means; the practice goes back at least to the E5 paper, see its Appendix B).

I do not believe this was properly documented anywhere, though, even in our tech report. My apologies for the oversight!
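For anyone else trying to reproduce the number, here is a minimal sketch of what "use the query prefix on both sides" means in practice. It is not our evaluation code; `QUERY_PREFIX` is a placeholder for whatever prefix your model expects, and `SYMMETRIC_TASKS` and `prepare_texts` are hypothetical names used only for illustration.

```python
# Hypothetical placeholder: substitute the actual query prefix your model expects.
QUERY_PREFIX = "<query prefix> "

# Hypothetical set of tasks treated as symmetric (query-to-query matching).
SYMMETRIC_TASKS = {"QuoraRetrieval"}


def prepare_texts(task_name, queries, corpus, query_prefix=QUERY_PREFIX):
    """Return (query_texts, corpus_texts) with prefixes applied.

    Queries always get the query prefix. Corpus texts get the same prefix
    only when the task is symmetric, so both sides are embedded identically.
    """
    query_texts = [query_prefix + q for q in queries]
    if task_name in SYMMETRIC_TASKS:
        # Duplicate-question task: "documents" are really questions too.
        corpus_texts = [query_prefix + d for d in corpus]
    else:
        # Asymmetric task (e.g. SCIDOCS, ArguAna): documents stay unprefixed.
        corpus_texts = list(corpus)
    return query_texts, corpus_texts


if __name__ == "__main__":
    qs, ds = prepare_texts(
        "QuoraRetrieval",
        ["How do I learn Python?"],
        ["What is the best way to learn Python?"],
    )
    print(qs)
    print(ds)
```

The prefixed texts are then passed to the embedding model as usual; only the corpus side changes between symmetric and asymmetric tasks.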

lukemerrick

Jul 17, 2024

You should see if this symmetrical embedding improves your organization's Stella models' scores on QuoraRetrieval, too, if you haven't yet!

(And good luck with the write-up for that one; we're looking forward to reading it when it's ready!)

jcli0606

Jul 18, 2024 · edited Jul 18, 2024

Thanks, got it.

jcli0606 changed discussion status to closed Jul 18, 2024
