Monarch Mixer-BERT
This is the 80M-parameter checkpoint for M2-BERT-base from the paper Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture.
Check out our GitHub for instructions on how to download and fine-tune it!
How to use
You can load this model using Hugging Face AutoModel:
from transformers import AutoModelForMaskedLM
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-80M', trust_remote_code=True)
This model uses the Hugging Face bert-base-uncased tokenizer:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
You can use this model with a pipeline for masked language modeling:
from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-80M', trust_remote_code=True)
unmasker = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
unmasker('Every morning, I enjoy a cup of [MASK] to start my day.')
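The pipeline returns its top predictions for the masked token. If you prefer to work with model outputs directly, here is a minimal sketch of filling the mask by hand; it assumes the custom model accepts the standard tokenizer outputs and, like other Hugging Face masked-LM models, returns an object with a logits attribute:

import torch
from transformers import AutoModelForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-80M', trust_remote_code=True)
mlm.eval()

text = 'Every morning, I enjoy a cup of [MASK] to start my day.'
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    # Assumes the standard masked-LM output convention with a .logits field
    logits = mlm(**inputs).logits

# Locate the [MASK] position and take the top-5 token predictions there
mask_pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))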
Remote Code
This model requires trust_remote_code=True to be passed to the from_pretrained method. This is because we use custom PyTorch code (see our GitHub). You should consider passing a revision argument that specifies the exact git commit of the code, for example:
mlm = AutoModelForMaskedLM.from_pretrained(
'alycialee/m2-bert-80M',
trust_remote_code=True,
revision='d8a0938',
)
Configuration
Note that use_flash_mm is false by default; using FlashMM is not currently supported.
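If you want to check this setting yourself, you can load the checkpoint's configuration; a minimal sketch, assuming the custom config exposes use_flash_mm as an attribute:

from transformers import AutoConfig

config = AutoConfig.from_pretrained('alycialee/m2-bert-80M', trust_remote_code=True)
print(config.use_flash_mm)  # expected to be False; FlashMM is not currently supported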
Paper: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture (arXiv:2310.12109)