---
language:
  - en
base_model:
  - google/electra-base-discriminator
pipeline_tag: text-ranking
---

monoELECTRA is a highly effective cross-encoder reranker built on google/electra-base-discriminator and trained on MS MARCO passage data for 300K steps with a batch size of 16. It uses hard negatives drawn from strong first-stage retrievers and the Localized Contrastive Estimation (LCE) loss with large group sizes (up to 31 negatives per positive). This setup consistently outperforms standard monoBERT and hinge- and cross-entropy-loss baselines, especially in the top-k candidate pool, where near-duplicate passages matter. If you want a compact, supervised reranker tuned to squeeze every last bit of signal from hard negatives, this is a good choice.
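A minimal reranking sketch using the Hugging Face `transformers` API, assuming the checkpoint loads as a sequence-classification model. The model id placeholder, the helper names, and the head-shape handling (single-logit vs. two-class head) are assumptions for illustration, not confirmed details of this release:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def scores_from_logits(logits: torch.Tensor) -> torch.Tensor:
    """Turn classifier-head logits into relevance scores.

    Assumption: the head is either a single-logit relevance score or a
    two-class head, in which case we score by the log-probability of
    the last ("relevant") class.
    """
    if logits.shape[-1] == 1:
        return logits.squeeze(-1)
    return torch.log_softmax(logits, dim=-1)[:, -1]


def rerank(query: str, passages: list[str], model, tokenizer) -> list[tuple[str, float]]:
    """Score each (query, passage) pair with the cross-encoder and
    return passages sorted most-relevant first."""
    inputs = tokenizer(
        [query] * len(passages),  # query paired with every candidate
        passages,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    scores = scores_from_logits(logits).tolist()
    return sorted(zip(passages, scores), key=lambda t: t[1], reverse=True)


def demo(model_id: str) -> None:
    # Pass this repository's model id; not called here to avoid a download.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()
    for passage, score in rerank(
        "what causes rain?",
        ["Rain forms when water vapor condenses.", "ELECTRA is a pretraining method."],
        model,
        tokenizer,
    ):
        print(f"{score:+.3f}  {passage}")
```

In a retrieve-then-rerank pipeline, you would feed `rerank` the top-k candidates from a first-stage retriever (e.g. BM25) rather than the whole corpus.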
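The LCE objective described above amounts to group-wise cross-entropy: each training group holds one positive passage and its hard negatives, and the positive's score is pushed up against the whole group. A small sketch, assuming (purely for illustration) that the positive sits at column 0 of each score group:

```python
import torch
import torch.nn.functional as F


def lce_loss(scores: torch.Tensor) -> torch.Tensor:
    """Localized Contrastive Estimation over groups of candidates.

    `scores` has shape (batch, group_size): per-group relevance scores
    from the cross-encoder, with the positive passage in column 0 and
    hard negatives in the remaining columns (an assumed layout). The
    loss is cross-entropy of the positive against its whole group.
    """
    targets = torch.zeros(scores.shape[0], dtype=torch.long)  # positive index
    return F.cross_entropy(scores, targets)
```

With a group size of 32 (one positive plus 31 hard negatives, as in the description above), uninformative uniform scores give a loss of log 32, and the loss falls toward zero as the positive's score dominates the group.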

If you use the monoELECTRA model, please cite the following relevant paper:

Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking

    @inproceedings{squeezemonoelectra2022,
      author = {Pradeep, Ronak and Liu, Yuqi and Zhang, Xinyu and Li, Yilin and Yates, Andrew and Lin, Jimmy},
      title = {Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking},
      year = {2022},
      publisher = {Springer-Verlag},
      address = {Berlin, Heidelberg},
      booktitle = {Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10--14, 2022, Proceedings, Part I},
      pages = {655--670},
      numpages = {16},
      location = {Stavanger, Norway}
    }