---
tags:
- mteb
- sentence-transformers
- transformers
- embedding
- bidirectional
- multilingual
pipeline_tag: sentence-similarity
license: apache-2.0
base_model: BidirLM/BidirLM-1B-Base
language:
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- bs
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- ga
- gl
- gu
- ha
- he
- hi
- hr
- ht
- hu
- hy
- id
- ig
- is
- it
- ja
- jv
- ka
- kk
- kn
- ko
- ky
- lt
- lv
- mg
- mk
- ml
- mr
- ms
- mt
- my
- nb
- ne
- nl
- nso
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sn
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- uk
- ur
- vi
- wo
- xh
- yo
- zh
- zu
---
# BidirLM-1B
BidirLM is a family of five frontier bidirectional encoders, including an omnimodal variant at 2.5B, adapted from causal decoder LLMs. Unlike contrastive-only models, BidirLM adds a prior masked next-token prediction (MNTP) phase that enables state-of-the-art results on task-specific fine-tuning (NER, classification, NLI) while achieving frontier performance against open-source alternatives on embedding benchmarks (MTEB).
![Multilingual model performance by size on XTREME-Benchmark Augmented and MTEB Multilingual V2](final_results.png)
| Model | Base LLM | Parameters | Embedding Dim | Max Tokens | MTEB Multi. V2 (Mean Task) |
|---|---|---|---|---|---|
| BidirLM-270M | Gemma3-270M | 268M | 640 | 512 | 55.5 |
| BidirLM-0.6B | Qwen3-0.6B | 596M | 1024 | 512 | 59.6 |
| **BidirLM-1B** | **Gemma3-1B** | **1001M** | **1152** | **512** (\*) | **62.1** |
| BidirLM-1.7B | Qwen3-1.7B | 1721M | 2048 | 512 | 62.9 |
| BidirLM-Omni-2.5B | Qwen3-1.7B | 2.5B | 2048 | 512 | 63.1 |
(\*) While evaluated on MTEB with a maximum length of 512 tokens, the underlying architecture supports contexts up to 32,768 tokens (Gemma3). Longer sequences can be used by adjusting `model.max_seq_length` in Sentence Transformers or `max_length` in the tokenizer.
## Supported Tasks
**General embeddings** (via Sentence Transformers): retrieval, semantic similarity (STS), clustering, classification, pair classification, reranking, bitext mining, multilabel classification
**Downstream fine-tuning** (via Transformers): sequence classification (e.g. MNLI, XNLI, PAWS-X, MathShepherd), token classification (e.g. PAN-X, POS), information retrieval (e.g. MIRACL, CodeSearchNet), sequence regression (e.g. Seahorse)
## Usage
### Sentence Transformers
Use Sentence Transformers to compute embeddings for any text representation task.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BidirLM/BidirLM-1B", trust_remote_code=True)

queries = [
    "What is the capital of France?",
    "How does photosynthesis work?",
]
documents = [
    "Paris is the capital and largest city of France, situated on the river Seine.",
    "Photosynthesis is the process by which plants convert sunlight, water, and CO2 into glucose and oxygen.",
]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Cosine similarity by default; shape: (len(queries), len(documents))
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
```
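`model.similarity` defaults to cosine similarity in Sentence Transformers. As a dependency-free sketch of what that call computes (the vectors below are made up for illustration, not real model outputs):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for model.encode output.
query = [0.2, 0.1, 0.9]
relevant_doc = [0.25, 0.05, 0.85]
unrelated_doc = [0.9, 0.3, 0.05]

print(cosine_similarity(query, relevant_doc))   # close to 1.0
print(cosine_similarity(query, unrelated_doc))  # noticeably lower
```

Because scores are cosine-based, they fall in [-1, 1] and can be ranked directly to retrieve the most relevant documents per query.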
### Fine-tuning for Downstream Tasks
BidirLM can be fine-tuned directly for downstream tasks through the standard Transformers auto classes:
```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
)

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-1B", trust_remote_code=True)

# Sequence classification (e.g., NLI: entailment, neutral, contradiction)
seq_model = AutoModelForSequenceClassification.from_pretrained(
    "BidirLM/BidirLM-1B",
    trust_remote_code=True,
    num_labels=3,
)

# Token classification (e.g., NER)
tok_model = AutoModelForTokenClassification.from_pretrained(
    "BidirLM/BidirLM-1B",
    trust_remote_code=True,
    num_labels=7,
)

# Fine-tune with the HuggingFace Trainer as usual.
```
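When fine-tuning for NLI-style sequence classification, it also helps to pass label mappings so that predictions decode to readable labels. A minimal sketch, assuming the conventional three-class NLI scheme (this label order is a common convention, not something fixed by BidirLM):

```python
# Three-class NLI label scheme matching num_labels=3 above.
id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
label2id = {v: k for k, v in id2label.items()}

# Both dicts can be passed to from_pretrained alongside num_labels, e.g.:
#   AutoModelForSequenceClassification.from_pretrained(
#       "BidirLM/BidirLM-1B", trust_remote_code=True,
#       num_labels=3, id2label=id2label, label2id=label2id,
#   )

# Decoding a toy logit vector the way a classification head's output is read:
logits = [0.1, 2.3, -0.7]  # made-up scores for one example
pred = max(range(len(logits)), key=logits.__getitem__)
print(id2label[pred])  # "neutral"
```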
## Evaluation
Please refer to the [mteb repository](https://github.com/embeddings-benchmark/mteb) for instructions on reproducing our scores. The evaluation prompts used for each task are available in [mteb_v2_eval_prompts.json](mteb_v2_eval_prompts.json).
## Supported Languages
The model supports over 140 languages, inherited from the Gemma3 base model and reinforced through contrastive training on 87 languages.
## Requirements
This model requires `trust_remote_code=True` as it uses a custom bidirectional architecture.
```text
transformers>=4.57.6,<5.0.0
sentence-transformers>=5.0.0
```
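A matching install command, using the pins above (adjust as needed for your environment):

```shell
pip install "transformers>=4.57.6,<5.0.0" "sentence-transformers>=5.0.0"
```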
## FAQ
### 1. What pooling strategy does this model use?
The model uses **mean pooling**. This is handled automatically when using Sentence Transformers.
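For readers using the raw Transformers model rather than Sentence Transformers, mean pooling averages the token embeddings while ignoring padding positions. A dependency-free sketch of the operation (toy numbers, not real model outputs):

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors where attention_mask == 1, skipping padding."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask == 1:
            count += 1
            for i, v in enumerate(vec):
                sums[i] += v
    return [s / count for s in sums]

# Three tokens, the last one padding (mask 0): it must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

With the real model, `token_embeddings` would be the last hidden state and `attention_mask` the tokenizer's mask; Sentence Transformers applies the equivalent (vectorized) operation internally.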
### 2. Do I need `trust_remote_code=True`?
Yes. BidirLM uses a custom bidirectional architecture (`BidirLMModel`) that requires loading custom code from the repository.
### 3. Why are my reproduced results slightly different from those reported in the model card?
Different versions of `transformers` and PyTorch can cause negligible but non-zero performance differences. This model was trained and evaluated with `transformers==4.57.6` and `torch==2.6.0`.
### 4. What is the relationship between BidirLM-1B and BidirLM-1B-Base?
[BidirLM/BidirLM-1B-Base](https://huggingface.co/BidirLM/BidirLM-1B-Base) is the intermediate MNTP-adapted checkpoint (bidirectional pretraining stage). BidirLM-1B is the final contrastive fine-tuned version optimized for both sentence embeddings and downstream fine-tuning.
### 5. How is BidirLM different from other embedding models?
Most embedding models (BGE-M3, KaLM, EmbedGemma, Qwen3-Embedding) use contrastive-only training, which optimizes embeddings but sacrifices fine-tuning ability. BidirLM restores a prior MNTP phase, advancing the Pareto frontier on both MTEB and XTREME simultaneously.
## Citation
```bibtex
@misc{boizard2026bidirlmtextomnimodalbidirectional,
title={BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs},
author={Nicolas Boizard and Théo Deschamps-Berger and Hippolyte Gisserot-Boukhlef and Céline Hudelot and Pierre Colombo},
year={2026},
eprint={2604.02045},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.02045},
}
```