---
tags:
- mteb
- sentence-transformers
- transformers
- embedding
- bidirectional
- multilingual
pipeline_tag: sentence-similarity
license: apache-2.0
base_model: BidirLM/BidirLM-1B-Base
language:
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- bs
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- ga
- gl
- gu
- ha
- he
- hi
- hr
- ht
- hu
- hy
- id
- ig
- is
- it
- ja
- jv
- ka
- kk
- kn
- ko
- ky
- lt
- lv
- mg
- mk
- ml
- mr
- ms
- mt
- my
- nb
- ne
- nl
- nso
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sn
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- uk
- ur
- vi
- wo
- xh
- yo
- zh
- zu
---

# BidirLM-1B

BidirLM is a family of 5 frontier bidirectional encoders adapted from causal decoder LLMs, including a 2.5B omnimodal variant. Unlike contrastive-only models, BidirLM relies on a prior masking phase (MNTP, masked next token prediction) that enables state-of-the-art results on task-specific fine-tuning (NER, classification, NLI) while achieving frontier performance on embedding benchmarks (MTEB) against open-source alternatives.

![Multilingual model performance by size on XTREME-Benchmark Augmented and MTEB Multilingual V2](final_results.png)

| Model | Base LLM | Parameters | Embedding Dim | Max Tokens | MTEB Multilingual V2 (Mean Task) |
|---|---|---|---|---|---|
| BidirLM-270M | Gemma3-270M | 268M | 640 | 512 | 55.5 |
| BidirLM-0.6B | Qwen3-0.6B | 596M | 1024 | 512 | 59.6 |
| **BidirLM-1B** | **Gemma3-1B** | **1001M** | **1152** | **512** (\*) | **62.1** |
| BidirLM-1.7B | Qwen3-1.7B | 1721M | 2048 | 512 | 62.9 |
| BidirLM-Omni-2.5B | Qwen3-1.7B | 2.5B | 2048 | 512 | 63.1 |

(\*) Although MTEB evaluation used a maximum length of 512 tokens, the underlying architecture supports a context length of up to 32,768 tokens (inherited from Gemma3). Longer sequences can be used by adjusting `model.max_seq_length` in Sentence Transformers or by passing `max_length` when calling the tokenizer.
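The footnote's advice can be wrapped in a small guard. Below is a minimal sketch, assuming a Sentence Transformers model object; `set_max_seq_length` and `GEMMA3_MAX_CONTEXT` are hypothetical names, not part of Sentence Transformers or this repository. The helper only checks the requested value against the 32,768-token limit stated above before assigning `max_seq_length`. A stand-in object is used so the snippet runs without downloading the model.

```python
from types import SimpleNamespace

# Architectural limit stated in this model card (inherited from Gemma3).
GEMMA3_MAX_CONTEXT = 32_768


def set_max_seq_length(model, max_tokens: int, hard_limit: int = GEMMA3_MAX_CONTEXT):
    """Hypothetical helper: set `model.max_seq_length` after a bounds check.

    `model` is expected to be a sentence_transformers.SentenceTransformer,
    whose `max_seq_length` attribute controls where inputs are truncated.
    """
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    if max_tokens > hard_limit:
        raise ValueError(f"max_tokens={max_tokens} exceeds the {hard_limit}-token limit")
    model.max_seq_length = max_tokens
    return model


# Real usage (downloads the model):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("BidirLM/BidirLM-1B", trust_remote_code=True)
#   set_max_seq_length(model, 4096)

# Offline stand-in for illustration: any object with a `max_seq_length` attribute.
stub = SimpleNamespace(max_seq_length=512)
set_max_seq_length(stub, 4096)
print(stub.max_seq_length)  # 4096
```

When using the tokenizer directly, the equivalent is passing `truncation=True, max_length=...` in the tokenizer call. The MTEB scores above were measured at 512 tokens, so they do not describe quality at longer lengths.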
## Supported Tasks

**General embeddings** (via Sentence Transformers): retrieval, semantic textual similarity (STS), clustering, classification, pair classification, reranking, bitext mining, and multilabel classification.

**Downstream fine-tuning** (via Transformers): sequence classification (e.g., MNLI, XNLI, PAWS-X, MathShepherd), token classification (e.g., PAN-X, POS), information retrieval (e.g., MIRACL, CodeSearchNet), and sequence regression (e.g., Seahorse).

## Usage

### Sentence Transformers

Use Sentence Transformers to compute embeddings for any text representation task.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BidirLM/BidirLM-1B", trust_remote_code=True)

queries = [
    "What is the capital of France?",
    "How does photosynthesis work?",
]
documents = [
    "Paris is the capital and largest city of France, situated on the river Seine.",
    "Photosynthesis is the process by which plants convert sunlight, water, and CO2 into glucose and oxygen.",
]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)  # 2 x 2 matrix: rows are queries, columns are documents
```

### Fine-tuning for Downstream Tasks

BidirLM can be fine-tuned directly for downstream tasks:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-1B", trust_remote_code=True)

# Sequence classification (e.g., NLI: entailment, neutral, contradiction)
seq_model = AutoModelForSequenceClassification.from_pretrained(
    "BidirLM/BidirLM-1B",
    trust_remote_code=True,
    num_labels=3,
)

# Token classification (e.g., NER on PAN-X, whose 7 BIO tags are O, B-/I-PER, B-/I-ORG, B-/I-LOC)
tok_model = AutoModelForTokenClassification.from_pretrained(
    "BidirLM/BidirLM-1B",
    trust_remote_code=True,
    num_labels=7,
)

# Sanity check: encode a premise/hypothesis pair. The classification head is not
# trained for your label set yet, so these logits are meaningless until fine-tuning.
inputs = tokenizer("A man is playing a guitar.", "Someone is making music.", return_tensors="pt")
logits = seq_model(**inputs).logits  # shape: (1, 3)

# Fine-tune either model with the Hugging Face Trainer or a custom PyTorch training loop.
```

## Evaluation

Please refer to the [mteb repository](https://github.com/embeddings-benchmark/mteb) for instructions on reproducing our scores.
The evaluation prompts used for each task are also available in [mteb_v2_eval_prompts.json](mteb_v2_eval_prompts.json).

## Supported Languages

BidirLM-1B supports over 140 languages, inherited from the Gemma3 base model, and is further strengthened by contrastive training on 87 languages.

## Requirements

This model requires `trust_remote_code=True` because it uses a custom bidirectional architecture.

```
transformers>=4.57.6,<5.0.0
sentence-transformers>=5.0.0
```

## FAQ

### 1. What pooling strategy does this model use?

The model uses **mean pooling**, which Sentence Transformers applies automatically.

### 2. Do I need `trust_remote_code=True`?

Yes. BidirLM uses a custom bidirectional architecture (`BidirLMModel`) that requires loading custom code from the repository.

### 3. Why are my reproduced results slightly different from those reported in the model card?

Different versions of `transformers` and `pytorch` can cause small but non-zero differences in performance. This model was trained and evaluated with `transformers==4.57.6` and `pytorch==2.6.0`.

### 4. What is the relationship between BidirLM-1B and BidirLM-1B-Base?

[BidirLM/BidirLM-1B-Base](https://huggingface.co/BidirLM/BidirLM-1B-Base) is the intermediate MNTP-adapted checkpoint (bidirectional pretraining stage). BidirLM-1B is the final, contrastively fine-tuned version, optimized for both sentence embeddings and downstream fine-tuning.

### 5. How is BidirLM different from other embedding models?

Most embedding models (BGE-M3, KaLM, EmbeddingGemma, Qwen3-Embedding) rely on contrastive-only training, which optimizes embeddings at the expense of downstream fine-tuning ability. BidirLM adds a prior MNTP phase, advancing the Pareto frontier on both MTEB and XTREME simultaneously.
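FAQ 1 notes that BidirLM uses mean pooling, which Sentence Transformers applies for you. For readers working with raw hidden states from `transformers`, here is a minimal NumPy sketch of attention-mask-aware mean pooling followed by cosine similarity (the default `model.similarity` function in Sentence Transformers unless a model configures another). The function names, array shapes, and toy values are illustrative assumptions, not BidirLM outputs; it assumes a 0/1 attention mask with 0 on padding positions.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding.

    hidden_states: (batch, seq_len, dim) last-layer token embeddings.
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding.
    Returns: (batch, dim) sentence embeddings.
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                 # (batch, 1), avoids division by zero
    return summed / counts


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, dim) and b (m, dim)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy batch: 2 sequences, 3 positions, 2-dim vectors.
hidden = np.array([
    [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]],
    [[0.0, 2.0], [0.0, 4.0], [9.0, 9.0]],  # last position is padding
])
mask = np.array([[1, 1, 1], [1, 1, 0]])

emb = mean_pool(hidden, mask)
sims = cosine_similarity(emb, emb)
print(emb)   # expected values: [[2, 0], [0, 3]]; the padded [9, 9] is ignored
print(sims)  # expected values: [[1, 0], [0, 1]]
```

Masking before averaging matters because padding vectors would otherwise pull shorter sequences toward arbitrary values, as the `[9, 9]` position shows.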
## Citation

```bibtex
@misc{boizard2026bidirlmtextomnimodalbidirectional,
      title={BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs},
      author={Nicolas Boizard and Théo Deschamps-Berger and Hippolyte Gisserot-Boukhlef and Céline Hudelot and Pierre Colombo},
      year={2026},
      eprint={2604.02045},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.02045},
}
```