---
library_name: transformers
license: mit
datasets:
- BeIR/msmarco
language:
- en
base_model:
- chandar-lab/NeoBERT
---
# Model Card for NeoBERT-RetroMAE-pretrain
This model is equivalent to [Shitao/RetroMAE_MSMARCO](https://huggingface.co/Shitao/RetroMAE_MSMARCO) but trained on the NeoBERT architecture. The training objective
was LexMAE-style pretraining that incorporates the Bag-of-Words loss from DupMAE, making it a LexMAE/DupMAE hybrid. The model only underwent masked language modeling pretraining and did not see any contrastive training, so it will need to be fine-tuned on a downstream task to achieve useful performance. It was trained with SPLADE training in mind, but it may also be appropriate for dense-embedding or ColBERT downstream training.
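The Bag-of-Words objective from DupMAE can be sketched roughly as follows: project the `[CLS]` hidden state to vocabulary logits and push that distribution toward the set of tokens present in the input. This is a minimal illustration under that assumption, not the actual training code; `cls_logits` and `bag_of_words_loss` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def bag_of_words_loss(cls_logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """DupMAE-style BoW loss sketch: maximize the log-probability that the
    [CLS] vocabulary distribution assigns to tokens appearing in the input."""
    # cls_logits: (batch, vocab), projected from the [CLS] hidden state
    log_probs = F.log_softmax(cls_logits, dim=-1)
    losses = []
    for b in range(input_ids.size(0)):
        # each distinct input token counts once, regardless of frequency
        unique_tokens = input_ids[b].unique()
        losses.append(-log_probs[b, unique_tokens].mean())
    return torch.stack(losses).mean()

# toy example: batch of 2, vocab of 100, 8 input tokens per example
logits = torch.randn(2, 100)
ids = torch.randint(0, 100, (2, 8))
loss = bag_of_words_loss(logits, ids)
```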
## How to Get Started with the Model
Please reference the original NeoBERT model card as well: ["chandar-lab/NeoBERT"](https://huggingface.co/chandar-lab/NeoBERT)
Ensure you have the following dependencies installed:
```bash
pip install transformers torch xformers==0.0.28.post3
```
If you would like to use sequence packing (un-padding), you will need to also install flash-attention:
```bash
pip install transformers torch xformers==0.0.28.post3 flash_attn
```
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("drexalt/NeoBERT-RetroMAE-pretrain", trust_remote_code=True)
```
## Evaluation
Both NeoBERT-RetroMAE-pretrain and Shitao/RetroMAE_MSMARCO were trained on the MSMARCO collection, so the NanoBEIR MSMARCO results can be seen as in-distribution and the other subsets as out-of-distribution (OOD).
Both models were evaluated with a maximum sequence length of 512 tokens; the MLM logits went through a SPLADE activation before the top 512 tokens were selected. Results are similar across top-k values. This method of evaluation is mentioned in the LexMAE paper for choosing a pretraining checkpoint, but it is not present in the LexMAE codebase. A different evaluation method may be more appropriate for dense downstream models.
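The SPLADE activation and top-k selection described above can be sketched as follows. This is a minimal illustration, not the exact evaluation code; `splade_pool` is a hypothetical helper, and `mlm_logits` stands in for the model's masked-LM output.

```python
import torch

def splade_pool(mlm_logits: torch.Tensor, attention_mask: torch.Tensor,
                top_k: int = 512) -> torch.Tensor:
    """SPLADE activation: log-saturated ReLU over MLM logits,
    max-pooled over the sequence, then pruned to the top_k terms."""
    # (batch, seq_len, vocab) -> log(1 + ReLU(logits))
    weights = torch.log1p(torch.relu(mlm_logits))
    # zero out padding positions before pooling
    weights = weights * attention_mask.unsqueeze(-1)
    # max-pool over the sequence: one weight per vocabulary term
    reps = weights.max(dim=1).values  # (batch, vocab)
    # keep only the top_k highest-weighted terms, zero the rest
    topk = reps.topk(k=min(top_k, reps.size(-1)), dim=-1)
    return torch.zeros_like(reps).scatter_(-1, topk.indices, topk.values)

# toy example: batch of 1, sequence of 3 tokens, vocabulary of 10
logits = torch.randn(1, 3, 10)
mask = torch.ones(1, 3)
rep = splade_pool(logits, mask, top_k=4)
```

The resulting sparse vocabulary-sized vectors can then be scored against each other with a dot product, as in standard SPLADE retrieval.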
### Evaluation on NanoBEIR:
| Model | MSMARCO nDCG@10 | MSMARCO MRR@10 | SciFact nDCG@10 | ClimateFEVER nDCG@10 | NFCorpus nDCG@10 |
|:--------------------------|:----------------|:---------------|:----------------|:---------------------|:-----------------|
| NeoBERT-RetroMAE-pretrain | **0.3210** | **0.2415** | **0.3648** | **0.1378** | 0.0747 |
| Shitao/RetroMAE_MSMARCO | 0.1980 | 0.1374 | 0.3236 | 0.0942 | **0.1150** |