---
tags:
  - mteb
  - sentence-transformers
  - transformers
  - embedding
  - bidirectional
  - multilingual
pipeline_tag: sentence-similarity
license: apache-2.0
base_model: BidirLM/BidirLM-1B-Base
language:
  - multilingual
  - af
  - am
  - ar
  - az
  - be
  - bg
  - bn
  - bs
  - ca
  - ceb
  - cs
  - cy
  - da
  - de
  - el
  - en
  - es
  - et
  - eu
  - fa
  - fi
  - fr
  - ga
  - gl
  - gu
  - ha
  - he
  - hi
  - hr
  - ht
  - hu
  - hy
  - id
  - ig
  - is
  - it
  - ja
  - jv
  - ka
  - kk
  - kn
  - ko
  - ky
  - lt
  - lv
  - mg
  - mk
  - ml
  - mr
  - ms
  - mt
  - my
  - nb
  - ne
  - nl
  - nso
  - ny
  - pa
  - pl
  - ps
  - pt
  - ro
  - ru
  - sd
  - si
  - sk
  - sl
  - sn
  - so
  - sq
  - sr
  - su
  - sv
  - sw
  - ta
  - te
  - th
  - tl
  - tr
  - uk
  - ur
  - vi
  - wo
  - xh
  - yo
  - zh
  - zu
---

# BidirLM-1B

BidirLM is a family of five bidirectional encoders adapted from causal decoder LLMs, including an omnimodal variant at 2.5B parameters. Unlike contrastive-only models, BidirLM relies on a prior masking phase (MNTP). This phase enables state-of-the-art results on task-specific fine-tuning (NER, classification, NLI) while achieving frontier performance among open-source models on embedding benchmarks (MTEB).

![Multilingual model performance by size on XTREME-Benchmark Augmented and MTEB Multilingual V2](final_results.png)

| Model | Base LLM | Parameters | Embedding Dim | Max Tokens | MTEB Multi. V2 (Mean Task) |
|---|---|---|---|---|---|
| BidirLM-270M | Gemma3-270M | 268M | 640 | 512 | 55.5 |
| BidirLM-0.6B | Qwen3-0.6B | 596M | 1024 | 512 | 59.6 |
| **BidirLM-1B** | **Gemma3-1B** | **1001M** | **1152** | **512** (\*) | **62.1** |
| BidirLM-1.7B | Qwen3-1.7B | 1721M | 2048 | 512 | 62.9 |
| BidirLM-Omni-2.5B | Qwen3-1.7B | 2.5B | 2048 | 512 | 63.1 |

(\*) Although the MTEB evaluation used a maximum length of 512 tokens, the underlying Gemma3 architecture supports contexts of up to 32,768 tokens. To process longer sequences, adjust `model.max_seq_length` in Sentence Transformers or `max_length` in the tokenizer.
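
As a minimal sketch of the adjustment described above, the helper below sets `max_seq_length` on a loaded Sentence Transformers model and rejects values beyond the 32,768-token limit stated in this card. The helper name `set_max_seq_length` and the constant `GEMMA3_MAX_CONTEXT` are illustrative assumptions, not part of the BidirLM repository.

```python
# Hypothetical helper: not part of the BidirLM repository.
GEMMA3_MAX_CONTEXT = 32_768  # upper bound stated in this model card


def set_max_seq_length(model, max_tokens):
    """Set `model.max_seq_length`, refusing values outside 1..32,768."""
    if not 1 <= max_tokens <= GEMMA3_MAX_CONTEXT:
        raise ValueError(
            f"max_tokens must be in [1, {GEMMA3_MAX_CONTEXT}], got {max_tokens}"
        )
    # Sentence Transformers truncates inputs longer than this many tokens.
    model.max_seq_length = max_tokens
    return model
```

With a model loaded as in the Usage section, this would be called as `set_max_seq_length(model, 2048)`. Memory use grows with sequence length, and the scores reported above were obtained at 512 tokens only.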

## Supported Tasks

**General embeddings** (via Sentence Transformers): retrieval, semantic similarity (STS), clustering, classification, pair classification, reranking, bitext mining, multilabel classification

**Downstream fine-tuning** (via Transformers): sequence classification (e.g. MNLI, XNLI, PAWS-X, MathShepherd), token classification (e.g. PAN-X, POS), information retrieval (e.g. MIRACL, CodeSearchNet), sequence regression (e.g. Seahorse)

## Usage

### Sentence Transformers

Use Sentence Transformers to compute embeddings for any text representation task.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BidirLM/BidirLM-1B", trust_remote_code=True)

queries = [
    "What is the capital of France?",
    "How does photosynthesis work?",
]
documents = [
    "Paris is the capital and largest city of France, situated on the river Seine.",
    "Photosynthesis is the process by which plants convert sunlight, water, and CO2 into glucose and oxygen.",
]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
```
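
`model.similarity` returns a query-by-document score matrix. Sentence Transformers defaults to cosine similarity unless the model configuration overrides it; assuming that default applies here, the dependency-free sketch below shows the same computation on toy vectors and how to pick the best document for each query. The vectors and function names are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(queries, documents):
    """Rows index queries, columns index documents."""
    return [[cosine_similarity(q, d) for d in documents] for q in queries]


def best_match(scores):
    """Index of the highest-scoring document for each query."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy 3-dimensional vectors (real BidirLM-1B embeddings have 1152 dimensions).
query_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.6]]

scores = similarity_matrix(query_vecs, doc_vecs)
print(best_match(scores))  # prints [0, 1]
```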

### Fine-tuning for Downstream Tasks

BidirLM can be directly fine-tuned for downstream tasks:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-1B", trust_remote_code=True)

# Sequence classification (e.g., NLI: entailment, neutral, contradiction)
seq_model = AutoModelForSequenceClassification.from_pretrained(
    "BidirLM/BidirLM-1B",
    trust_remote_code=True,
    num_labels=3,
)

# Token classification (e.g., NER)
tok_model = AutoModelForTokenClassification.from_pretrained(
    "BidirLM/BidirLM-1B",
    trust_remote_code=True,
    num_labels=7,
)

# Fine-tune with HuggingFace Trainer
```
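
Token classification needs one label per subword token, whereas datasets such as PAN-X annotate whole words. A common convention, sketched below, labels the first subword of each word and assigns `-100` to special tokens and continuation subwords so that the loss ignores them. The function name is hypothetical, and `word_ids` is assumed to have the shape returned by a fast tokenizer's `word_ids()` method (`None` for special tokens); this is not taken from the BidirLM training code.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids, word_labels):
    """Expand word-level labels to token-level labels.

    word_ids: one entry per token, giving the index of the source word,
              or None for special tokens.
    word_labels: one integer label per word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation subword
        previous = word_id
    return aligned


# Three words, the second split into two subwords, wrapped in special tokens.
print(align_labels([None, 0, 1, 1, 2, None], [3, 0, 5]))
# prints [-100, 3, 0, -100, 5, -100]
```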

## Evaluation

To reproduce our scores, follow the instructions in the [mteb repository](https://github.com/embeddings-benchmark/mteb). The evaluation prompts used for each task are available in [mteb_v2_eval_prompts.json](mteb_v2_eval_prompts.json).

## Supported Languages

The model supports more than 140 languages, inherited from the Gemma3 base model and reinforced through contrastive training on 87 languages.

## Requirements

This model requires `trust_remote_code=True` as it uses a custom bidirectional architecture.

```
transformers>=4.57.6,<5.0.0
sentence-transformers>=5.0.0
```
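
One way to catch an incompatible environment before loading the model is to compare installed versions against the bounds above. The sketch below uses only the standard library; `parse_version` and `in_range` are illustrative helpers that handle plain `X.Y.Z` release strings and ignore pre-release suffixes, so they are not a full PEP 440 implementation.

```python
import re


def parse_version(text):
    """Turn '4.57.6' into (4, 57, 6); missing parts default to 0."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def in_range(version, minimum, maximum=None):
    """True if minimum <= version, and version < maximum when one is given."""
    parsed = parse_version(version)
    if parsed < parse_version(minimum):
        return False
    return maximum is None or parsed < parse_version(maximum)


print(in_range("4.57.6", "4.57.6", "5.0.0"))  # True: lower bound is inclusive
print(in_range("5.1.0", "4.57.6", "5.0.0"))   # False: upper bound is exclusive
print(in_range("5.0.0", "5.0.0"))             # True: no upper bound given
```

The installed version strings can be read with `importlib.metadata.version("transformers")` and `importlib.metadata.version("sentence-transformers")`.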

## FAQ

### 1. What pooling strategy does this model use?

The model uses **mean pooling**. This is handled automatically when using Sentence Transformers.
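
For readers pooling token states by hand, mean pooling averages the hidden states of the non-padding tokens, using the attention mask to exclude padding. The following dependency-free sketch shows that operation on toy values; it illustrates the general technique and is not the repository's pooling code.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1.

    token_embeddings: list of per-token vectors for one sequence.
    attention_mask: list of 0/1 flags, 1 for real tokens, 0 for padding.
    """
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [total / count for total in totals]


# Two real tokens followed by one padding token (toy 2-dimensional states).
states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(states, mask))  # prints [2.0, 3.0]
```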

### 2. Do I need `trust_remote_code=True`?

Yes. BidirLM uses a custom bidirectional architecture (`BidirLMModel`) that requires loading custom code from the repository.

### 3. Why are my reproduced results slightly different from those reported in the model card?

Different versions of `transformers` and `pytorch` can cause small, non-zero differences in scores. This model was trained and evaluated with `transformers==4.57.6` and `pytorch==2.6.0`.

### 4. What is the relationship between BidirLM-1B and BidirLM-1B-Base?

[BidirLM/BidirLM-1B-Base](https://huggingface.co/BidirLM/BidirLM-1B-Base) is the intermediate MNTP-adapted checkpoint (bidirectional pretraining stage). BidirLM-1B is the final contrastive fine-tuned version optimized for both sentence embeddings and downstream fine-tuning.

### 5. How is BidirLM different from other embedding models?

Most embedding models (BGE-M3, KaLM, EmbedGemma, Qwen3-Embedding) use contrastive-only training, which optimizes embeddings but sacrifices fine-tuning ability. BidirLM reinstates an MNTP phase before contrastive training, advancing the Pareto frontier on MTEB and XTREME simultaneously.

## Citation

```bibtex
@misc{boizard2026bidirlmtextomnimodalbidirectional,
      title={BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs}, 
      author={Nicolas Boizard and Théo Deschamps-Berger and Hippolyte Gisserot-Boukhlef and Céline Hudelot and Pierre Colombo},
      year={2026},
      eprint={2604.02045},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.02045}, 
}
```