BidirLM-270M-Base

BidirLM-270M-Base is the intermediate MNTP-adapted checkpoint of the BidirLM family. It is obtained by converting Gemma3-270M from causal to bidirectional attention and training with Masked Next Token Prediction (MNTP) on 30B tokens from a multi-domain corpus (FineWeb-Edu, FineWeb2-HQ, FineMath, Stack V2); the resulting weights are then merged 50/50 with the original Gemma3-270M weights.
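The 50/50 merge described above is a plain linear interpolation of the two sets of weights. A minimal sketch, using scalar floats in place of real tensors so it stays self-contained (the function name `merge_state_dicts` is illustrative, not part of the released code):

```python
def merge_state_dicts(base, adapted, alpha=0.5):
    """Linearly interpolate two state dicts: alpha * adapted + (1 - alpha) * base."""
    return {k: alpha * adapted[k] + (1 - alpha) * base[k] for k in base}

# Toy "checkpoints": in practice these would be tensors keyed by parameter name.
base = {"w": 1.0, "b": 0.0}
adapted = {"w": 3.0, "b": 2.0}
print(merge_state_dicts(base, adapted))  # {'w': 2.0, 'b': 1.0}
```

With `alpha=0.5` this is exactly the 50/50 average of the MNTP-adapted weights and the original Gemma3-270M weights.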

For general-purpose embeddings and downstream fine-tuning, use BidirLM/BidirLM-270M-Embedding, which adds contrastive training on top of this checkpoint.

Usage

from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-270M-Base", trust_remote_code=True)

# Base encoder
model = AutoModel.from_pretrained("BidirLM/BidirLM-270M-Base", trust_remote_code=True)

# Masked language model
mlm = AutoModelForMaskedLM.from_pretrained("BidirLM/BidirLM-270M-Base", trust_remote_code=True)
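Since this base checkpoint has no contrastive head, a common way to turn the encoder's hidden states into sentence vectors is attention-masked mean pooling. A hedged sketch (the pooling recipe is an illustration, not the card's prescribed method) using dummy tensors in place of real model outputs so it runs standalone:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over the sequence dimension.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Dummy batch standing in for model(**inputs).last_hidden_state:
# 2 sequences, 4 tokens, hidden size 8.
hidden = torch.ones(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
emb = mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 8])
```

With the real model, `hidden` would come from `model(**tokenizer(texts, return_tensors="pt", padding=True)).last_hidden_state` and `mask` from the tokenizer's `attention_mask`.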

Requirements

transformers>=4.57.6,<5.0.0

This model requires trust_remote_code=True.
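The version pin above can be checked programmatically. A minimal sketch that compares dotted version strings numerically without importing transformers, so it runs anywhere (`satisfies_pin` is a hypothetical helper, not part of any library):

```python
def satisfies_pin(version, lower="4.57.6", upper="5.0.0"):
    """Return True if lower <= version < upper, comparing numeric components."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(lower) <= parse(version) < parse(upper)

print(satisfies_pin("4.57.6"))  # True  (lower bound is inclusive)
print(satisfies_pin("5.0.0"))   # False (upper bound is exclusive)
```

In practice, compare against `transformers.__version__` at runtime before loading the model.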

Citation

@misc{boizard2026bidirlmtextomnimodalbidirectional,
      title={BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs}, 
      author={Nicolas Boizard and Théo Deschamps-Berger and Hippolyte Gisserot-Boukhlef and Céline Hudelot and Pierre Colombo},
      year={2026},
      eprint={2604.02045},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.02045}, 
}
Model size: 0.3B params · Tensor type: BF16 (Safetensors)