BidirLM-0.6B-Base is the intermediate MNTP-adapted checkpoint of the BidirLM family. It is obtained by converting Qwen3-0.6B-Base from causal to bidirectional attention, training it with Masked Next Token Prediction (MNTP) on 30B tokens from a multi-domain corpus (FineWeb-Edu, FineWeb2-HQ, FineMath, Stack V2), and then merging the resulting weights 50/50 with the original Qwen3-0.6B-Base checkpoint.

For general-purpose embeddings and downstream fine-tuning, use BidirLM/BidirLM-0.6B-Embedding, which adds contrastive training on top of this checkpoint.
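The 50/50 merge described above amounts to a parameter-wise average of the two checkpoints. A minimal sketch in plain Python (the dict-of-lists "state dicts" here are hypothetical toy stand-ins for real weight tensors, not the actual model weights):

```python
def merge_checkpoints(base, adapted, alpha=0.5):
    """Parameter-wise linear interpolation of two state dicts.

    alpha=0.5 gives the 50/50 merge described above. The dict-of-lists
    representation is a toy stand-in for real tensors.
    """
    return {
        name: [(1 - alpha) * b + alpha * a for b, a in zip(base[name], adapted[name])]
        for name in base
    }

# Toy usage with two-parameter "checkpoints":
base = {"w": [1.0, 2.0], "b": [0.0]}
adapted = {"w": [3.0, 4.0], "b": [1.0]}
merged = merge_checkpoints(base, adapted)  # {"w": [2.0, 3.0], "b": [0.5]}
```

In practice the same interpolation would be applied to each tensor in the two models' state dicts.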
```python
from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-0.6B-Base", trust_remote_code=True)

# Base bidirectional encoder (hidden states / embeddings)
model = AutoModel.from_pretrained("BidirLM/BidirLM-0.6B-Base", trust_remote_code=True)

# Masked language model head (for MNTP-style mask prediction)
mlm = AutoModelForMaskedLM.from_pretrained("BidirLM/BidirLM-0.6B-Base", trust_remote_code=True)
```
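To turn the base encoder's token-level outputs into a single sentence vector, a common choice is attention-masked mean pooling over the last hidden states (the pooling strategy is an assumption here; the dedicated embedding checkpoint may use a different one). A self-contained NumPy sketch:

```python
import numpy as np

def mean_pool(last_hidden, attention_mask):
    """Average token vectors, ignoring padding positions.

    last_hidden: (seq_len, hidden_dim) array of encoder outputs.
    attention_mask: (seq_len,) array with 1 for real tokens, 0 for padding.
    """
    mask = np.asarray(attention_mask, dtype=np.float32)[:, None]
    hidden = np.asarray(last_hidden, dtype=np.float32)
    return (hidden * mask).sum(axis=0) / mask.sum()

# Toy example: two real tokens, one padding token that is ignored.
hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
vec = mean_pool(hidden, [1, 1, 0])  # -> array([2., 3.])
```

With the real model, `last_hidden` would come from `model(**inputs).last_hidden_state` and `attention_mask` from the tokenizer output.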
```
transformers>=4.57.6,<5.0.0
```

This model requires `trust_remote_code=True`.
```bibtex
@misc{boizard2026bidirlmtextomnimodalbidirectional,
  title={BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs},
  author={Nicolas Boizard and Théo Deschamps-Berger and Hippolyte Gisserot-Boukhlef and Céline Hudelot and Pierre Colombo},
  year={2026},
  eprint={2604.02045},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.02045},
}
```