---
library_name: transformers
license: mit
datasets:
- BeIR/msmarco
language:
- en
base_model:
- chandar-lab/NeoBERT
---

# Model Card for NeoBERT-RetroMAE-pretrain

This model is equivalent to [Shitao/RetroMAE_MSMARCO](https://huggingface.co/Shitao/RetroMAE_MSMARCO) but trained on the NeoBERT architecture. The training objective was LexMAE-style training that also incorporates the bag-of-words loss from DupMAE, making it a LexMAE/DupMAE hybrid. The model only underwent masked language modeling pretraining and has not seen any contrastive training, so it will need to be trained on a downstream task to achieve useful performance. It was trained with SPLADE training in mind, but may also be appropriate for dense embedding or ColBERT downstream training.
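The DupMAE-style bag-of-words component can be sketched roughly as follows. This is an illustrative reimplementation under stated assumptions, not the actual training code: the function name `bow_loss` and its signature are hypothetical, and the idea is that a vocabulary-sized projection of the [CLS] hidden state is penalized for assigning low probability to the tokens appearing in the input.

```python
import torch
import torch.nn.functional as F

def bow_loss(cls_vocab_logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Bag-of-words loss sketch: push the [CLS] vocabulary distribution
    toward the tokens of the input passage.

    cls_vocab_logits: (batch, vocab_size) projection of the [CLS] hidden state
    input_ids:        (batch, seq_len) token ids of the input passage
    """
    log_probs = F.log_softmax(cls_vocab_logits, dim=-1)  # (batch, vocab_size)
    # Negative log-likelihood of each input token under the [CLS] distribution,
    # averaged over positions (duplicates are counted once per occurrence here).
    return -log_probs.gather(-1, input_ids).mean()
```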

## How to Get Started with the Model

Please reference the original NeoBERT model card as well: [chandar-lab/NeoBERT](https://huggingface.co/chandar-lab/NeoBERT)

Ensure you have the following dependencies installed:

```bash
pip install transformers torch xformers==0.0.28.post3
```

If you would like to use sequence packing (un-padding), you will also need to install flash-attention:

```bash
pip install transformers torch xformers==0.0.28.post3 flash_attn
```

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("drexalt/NeoBERT-RetroMAE-pretrain", trust_remote_code=True)
```

## Evaluation

NeoBERT-RetroMAE-pretrain and Shitao/RetroMAE_MSMARCO were both trained on the MS MARCO collection, so the NanoBEIR MS MARCO results can be seen as in-distribution and the other subsets as out-of-distribution.

Both models were evaluated with a maximum sequence length of 512 tokens, and their outputs were passed through a SPLADE activation before selecting the top 512 tokens. Results are similar across top_k values. This method of evaluation was mentioned in the LexMAE paper for choosing a pretraining checkpoint, but it is not present in the LexMAE codebase. A different manner of evaluation may be more appropriate for dense downstream models.
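The SPLADE activation used here can be sketched as follows. This is a minimal illustration, not the exact evaluation code; `splade_activation` and its signature are assumptions. MLM logits are ReLU'd, log-saturated, max-pooled over the sequence, and truncated to the top-k vocabulary terms:

```python
import torch

def splade_activation(mlm_logits: torch.Tensor,
                      attention_mask: torch.Tensor,
                      top_k: int = 512) -> torch.Tensor:
    """Turn MLM logits into a sparse lexical vector (SPLADE-style).

    mlm_logits:     (batch, seq_len, vocab_size)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    # SPLADE activation: log(1 + ReLU(logits)), then max-pool over the sequence
    scores = torch.log1p(torch.relu(mlm_logits))
    scores = scores * attention_mask.unsqueeze(-1)  # zero out padding positions
    rep = scores.max(dim=1).values                  # (batch, vocab_size)
    # Keep only the top_k highest-weight vocabulary terms
    top = rep.topk(min(top_k, rep.size(-1)), dim=-1)
    return torch.zeros_like(rep).scatter_(-1, top.indices, top.values)
```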

### Evaluation on NanoBEIR

| Model | MS MARCO nDCG@10 | MS MARCO MRR@10 | SciFact nDCG@10 | ClimateFEVER nDCG@10 | NFCorpus nDCG@10 |
|:--------------------------|:-----------------|:----------------|:----------------|:---------------------|:-----------------|
| NeoBERT-RetroMAE-pretrain | **0.3210** | **0.2415** | **0.3648** | **0.1378** | 0.0747 |
| Shitao/RetroMAE_MSMARCO | 0.1980 | 0.1374 | 0.3236 | 0.0942 | **0.1150** |