---
library_name: transformers
license: mit
datasets:
- BeIR/msmarco
language:
- en
base_model:
- chandar-lab/NeoBERT
---

# Model Card for NeoBERT-RetroMAE-pretrain

This model is equivalent to [Shitao/RetroMAE_MSMARCO](https://huggingface.co/Shitao/RetroMAE_MSMARCO) but trained on the NeoBERT architecture. The training objective was a LexMAE-style pretraining that additionally incorporates the Bag-of-Words loss from DupMAE, making it a LexMAE/DupMAE hybrid.

This model only underwent masked language modeling pretraining and did not receive any contrastive training, so it will need to be fine-tuned on a downstream task to achieve useful performance. It was trained with SPLADE training in mind, but may also be appropriate for dense embedding or ColBERT downstream training.

## How to Get Started with the Model

Please also reference the original NeoBERT model card: [chandar-lab/NeoBERT](https://huggingface.co/chandar-lab/NeoBERT)

Ensure you have the following dependencies installed:

```bash
pip install transformers torch xformers==0.0.28.post3
```

If you would like to use sequence packing (un-padding), you will also need to install flash-attention:

```bash
pip install transformers torch xformers==0.0.28.post3 flash_attn
```

Use the code below to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("drexalt/NeoBERT-RetroMAE-pretrain", trust_remote_code=True)
```

## Evaluation

NeoBERT-RetroMAE-pretrain and Shitao/RetroMAE_MSMARCO were both trained on the MSMARCO collection, so the NanoBEIR MSMARCO results can be seen as in-distribution and the other subsets as out-of-distribution. Both models were evaluated with a maximum sequence length of 512 tokens, and their outputs went through a SPLADE activation before the top 512 tokens were selected. Results are similar across top_k values.
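For reference, the SPLADE activation and top-k selection used in this evaluation can be sketched as follows. This is a minimal illustration of the standard SPLADE-family aggregation; the function names and masking details are assumptions, not code taken from this repository:

```python
import torch

def splade_activation(logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Aggregate MLM logits into one sparse lexical vector per sequence.

    logits: (batch, seq_len, vocab_size) MLM output logits
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    Returns: (batch, vocab_size) non-negative sparse representations.
    """
    # log-saturation of positive logits, as in the SPLADE family
    weights = torch.log1p(torch.relu(logits))
    # zero out padding positions, then max-pool over the sequence dimension
    weights = weights * attention_mask.unsqueeze(-1)
    return weights.max(dim=1).values

def top_k_sparse(rep: torch.Tensor, k: int = 512) -> torch.Tensor:
    """Keep only the k highest-weighted vocabulary entries; zero the rest."""
    topk = rep.topk(k, dim=-1)
    out = torch.zeros_like(rep)
    return out.scatter_(-1, topk.indices, topk.values)
```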
This method of evaluation was mentioned in the LexMAE paper as a way of choosing a pretraining checkpoint, but it is not present in the LexMAE codebase. A different manner of evaluation may be more appropriate for dense downstream models.

### Evaluation on NanoBEIR

| Model                     | MSMARCO nDCG@10 | MSMARCO MRR@10 | SciFact nDCG@10 | ClimateFEVER nDCG@10 | NFCorpus nDCG@10 |
|:--------------------------|:----------------|:---------------|:----------------|:---------------------|:-----------------|
| NeoBERT-RetroMAE-pretrain | **0.3210**      | **0.2415**     | **0.3648**      | **0.1378**           | 0.0747           |
| Shitao/RetroMAE_MSMARCO   | 0.1980          | 0.1374         | 0.3236          | 0.0942               | **0.1150**       |
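As a closing note on the objective described above, the Bag-of-Words component borrowed from DupMAE can be sketched roughly as follows. This is a hedged illustration of a generic BoW cross-entropy over the vocabulary; the pooling choice and function name are assumptions and do not come from this repository's training code:

```python
import torch

def bag_of_words_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Illustrative DupMAE-style BoW loss: pool token logits into one score
    per vocabulary entry, then train the model to assign high probability
    to the tokens that actually appear in the input."""
    # (batch, seq_len, vocab) -> (batch, vocab): one score per vocab entry
    doc_scores = logits.max(dim=1).values
    log_probs = torch.log_softmax(doc_scores, dim=-1)
    # cross-entropy against the tokens present in the input
    token_log_probs = log_probs.gather(-1, input_ids)  # (batch, seq_len)
    return -token_log_probs.mean()
```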