Monarch Mixer-BERT
This is the 80M-parameter checkpoint for M2-BERT-base from the paper Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture.
Check out our GitHub for instructions on how to download and fine-tune it!
How to use
You can load this model using Hugging Face AutoModel:
from transformers import AutoModelForMaskedLM
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-80M', trust_remote_code=True)
This model uses the Hugging Face bert-base-uncased tokenizer:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
You can use this model with a pipeline for masked language modeling:
from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-80M', trust_remote_code=True)
unmasker = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
unmasker('Every morning, I enjoy a cup of [MASK] to start my day.')
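The pipeline returns its top predictions for the masked token. If you prefer to work with model outputs directly, here is a minimal sketch of filling the mask by hand; it assumes the custom model accepts the standard tokenizer outputs and, like other Hugging Face masked-LM models, returns an object with a logits attribute:

import torch
from transformers import AutoModelForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-80M', trust_remote_code=True)
mlm.eval()

text = 'Every morning, I enjoy a cup of [MASK] to start my day.'
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    # Assumes the standard masked-LM output convention with a .logits field
    logits = mlm(**inputs).logits

# Locate the [MASK] position and take the top-5 token predictions there
mask_pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))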
Remote Code
This model requires trust_remote_code=True to be passed to the from_pretrained method. This is because we use custom PyTorch code (see our GitHub). You should consider passing a revision argument that specifies the exact git commit of the code, for example:
mlm = AutoModelForMaskedLM.from_pretrained(
'alycialee/m2-bert-80M',
trust_remote_code=True,
revision='d8a0938',
)
Configuration
Note that use_flash_mm is false by default; using FlashMM is not currently supported.
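If you want to check this setting yourself, you can load the checkpoint's configuration; a minimal sketch, assuming the custom config exposes use_flash_mm as an attribute:

from transformers import AutoConfig

config = AutoConfig.from_pretrained('alycialee/m2-bert-80M', trust_remote_code=True)
print(config.use_flash_mm)  # expected to be False; FlashMM is not currently supported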
Paper: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture (arXiv:2310.12109)