BidirLM-270M-Base

BidirLM-270M-Base is the intermediate MNTP-adapted checkpoint of the BidirLM family. It is obtained by converting Gemma3-270M from causal to bidirectional attention and training with Masked Next Token Prediction (MNTP) on 30B tokens from a multi-domain corpus (FineWeb-Edu, FineWeb2-HQ, FineMath, Stack V2); the resulting weights are then merged 50/50 with the original Gemma3-270M weights.
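The 50/50 merge described above is a plain linear interpolation of the two sets of weights. A minimal sketch, using scalar floats in place of real tensors so it stays self-contained (the function name `merge_state_dicts` is illustrative, not part of the released code):

```python
def merge_state_dicts(base, adapted, alpha=0.5):
    """Linearly interpolate two state dicts: alpha * adapted + (1 - alpha) * base."""
    return {k: alpha * adapted[k] + (1 - alpha) * base[k] for k in base}

# Toy "checkpoints": in practice these would be tensors keyed by parameter name.
base = {"w": 1.0, "b": 0.0}
adapted = {"w": 3.0, "b": 2.0}
print(merge_state_dicts(base, adapted))  # {'w': 2.0, 'b': 1.0}
```

With `alpha=0.5` this is exactly the 50/50 average of the MNTP-adapted weights and the original Gemma3-270M weights.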

For general-purpose embeddings and downstream fine-tuning, use BidirLM/BidirLM-270M-Embedding, which adds contrastive training on top of this checkpoint.

Usage

from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-270M-Base", trust_remote_code=True)

# Base encoder
model = AutoModel.from_pretrained("BidirLM/BidirLM-270M-Base", trust_remote_code=True)

# Masked language model
mlm = AutoModelForMaskedLM.from_pretrained("BidirLM/BidirLM-270M-Base", trust_remote_code=True)
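Since this base checkpoint has no contrastive head, a common way to turn the encoder's hidden states into sentence vectors is attention-masked mean pooling. A hedged sketch (the pooling recipe is an illustration, not the card's prescribed method) using dummy tensors in place of real model outputs so it runs standalone:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over the sequence dimension.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Dummy batch standing in for model(**inputs).last_hidden_state:
# 2 sequences, 4 tokens, hidden size 8.
hidden = torch.ones(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
emb = mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 8])
```

With the real model, `hidden` would come from `model(**tokenizer(texts, return_tensors="pt", padding=True)).last_hidden_state` and `mask` from the tokenizer's `attention_mask`.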

Requirements

transformers>=4.57.6,<5.0.0

This model requires trust_remote_code=True.
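The version pin above can be checked programmatically. A minimal sketch that compares dotted version strings numerically without importing transformers, so it runs anywhere (`satisfies_pin` is a hypothetical helper, not part of any library):

```python
def satisfies_pin(version, lower="4.57.6", upper="5.0.0"):
    """Return True if lower <= version < upper, comparing numeric components."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(lower) <= parse(version) < parse(upper)

print(satisfies_pin("4.57.6"))  # True  (lower bound is inclusive)
print(satisfies_pin("5.0.0"))   # False (upper bound is exclusive)
```

In practice, compare against `transformers.__version__` at runtime before loading the model.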

Citation

@misc{boizard2026bidirlmtextomnimodalbidirectional,
      title={BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs}, 
      author={Nicolas Boizard and Théo Deschamps-Berger and Hippolyte Gisserot-Boukhlef and Céline Hudelot and Pierre Colombo},
      year={2026},
      eprint={2604.02045},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.02045}, 
}
Model size: 0.3B params · Tensor type: BF16 (Safetensors)