---
library_name: transformers
license: mit
datasets:
- BeIR/msmarco
language:
- en
base_model:
- chandar-lab/NeoBERT
---

# Model Card for NeoBERT-RetroMAE-pretrain

This model is equivalent to [Shitao/RetroMAE_MSMARCO](https://huggingface.co/Shitao/RetroMAE_MSMARCO) but trained on the NeoBERT architecture. The training objective was a LexMAE-style pretraining that additionally incorporates the Bag-of-Words loss from DupMAE, making it a LexMAE/DupMAE hybrid.

This model only underwent masked language modeling pretraining and did not receive any contrastive training, so it will need to be fine-tuned on a downstream task to achieve useful performance. It was trained with SPLADE training in mind, but may also be appropriate for dense embedding or ColBERT downstream training.

## How to Get Started with the Model

Please also reference the original NeoBERT model card: [chandar-lab/NeoBERT](https://huggingface.co/chandar-lab/NeoBERT)

Ensure you have the following dependencies installed:

```bash
pip install transformers torch xformers==0.0.28.post3
```

If you would like to use sequence packing (un-padding), you will also need to install flash-attention:

```bash
pip install transformers torch xformers==0.0.28.post3 flash_attn
```

Use the code below to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("drexalt/NeoBERT-RetroMAE-pretrain", trust_remote_code=True)
```

## Evaluation

NeoBERT-RetroMAE-pretrain and Shitao/RetroMAE_MSMARCO were both trained on the MSMARCO collection, so the NanoBEIR MSMARCO results can be seen as in-distribution and the other subsets as out-of-distribution. Both models were evaluated with a maximum sequence length of 512 tokens, and their outputs went through a SPLADE activation before the top 512 tokens were selected. Results are similar across top_k values.
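For reference, the SPLADE activation and top-k selection used in this evaluation can be sketched as follows. This is a minimal illustration of the standard SPLADE-family aggregation; the function names and masking details are assumptions, not code taken from this repository:

```python
import torch

def splade_activation(logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Aggregate MLM logits into one sparse lexical vector per sequence.

    logits: (batch, seq_len, vocab_size) MLM output logits
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    Returns: (batch, vocab_size) non-negative sparse representations.
    """
    # log-saturation of positive logits, as in the SPLADE family
    weights = torch.log1p(torch.relu(logits))
    # zero out padding positions, then max-pool over the sequence dimension
    weights = weights * attention_mask.unsqueeze(-1)
    return weights.max(dim=1).values

def top_k_sparse(rep: torch.Tensor, k: int = 512) -> torch.Tensor:
    """Keep only the k highest-weighted vocabulary entries; zero the rest."""
    topk = rep.topk(k, dim=-1)
    out = torch.zeros_like(rep)
    return out.scatter_(-1, topk.indices, topk.values)
```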
This method of evaluation was mentioned in the LexMAE paper as a way of choosing a pretraining checkpoint, but it is not present in the LexMAE codebase. A different manner of evaluation may be more appropriate for dense downstream models.

### Evaluation on NanoBEIR

| Model                     | MSMARCO nDCG@10 | MSMARCO MRR@10 | SciFact nDCG@10 | ClimateFEVER nDCG@10 | NFCorpus nDCG@10 |
|:--------------------------|:----------------|:---------------|:----------------|:---------------------|:-----------------|
| NeoBERT-RetroMAE-pretrain | **0.3210**      | **0.2415**     | **0.3648**      | **0.1378**           | 0.0747           |
| Shitao/RetroMAE_MSMARCO   | 0.1980          | 0.1374         | 0.3236          | 0.0942               | **0.1150**       |
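As a closing note on the objective described above, the Bag-of-Words component borrowed from DupMAE can be sketched roughly as follows. This is a hedged illustration of a generic BoW cross-entropy over the vocabulary; the pooling choice and function name are assumptions and do not come from this repository's training code:

```python
import torch

def bag_of_words_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Illustrative DupMAE-style BoW loss: pool token logits into one score
    per vocabulary entry, then train the model to assign high probability
    to the tokens that actually appear in the input."""
    # (batch, seq_len, vocab) -> (batch, vocab): one score per vocab entry
    doc_scores = logits.max(dim=1).values
    log_probs = torch.log_softmax(doc_scores, dim=-1)
    # cross-entropy against the tokens present in the input
    token_log_probs = log_probs.gather(-1, input_ids)  # (batch, seq_len)
    return -token_log_probs.mean()
```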