---
tags:
- mteb
- sentence-transformers
- transformers
- embedding
- bidirectional
- multilingual
pipeline_tag: sentence-similarity
license: apache-2.0
base_model: BidirLM/BidirLM-1B-Base
language:
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- bs
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- ga
- gl
- gu
- ha
- he
- hi
- hr
- ht
- hu
- hy
- id
- ig
- is
- it
- ja
- jv
- ka
- kk
- kn
- ko
- ky
- lt
- lv
- mg
- mk
- ml
- mr
- ms
- mt
- my
- nb
- ne
- nl
- nso
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sn
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- uk
- ur
- vi
- wo
- xh
- yo
- zh
- zu
---
# BidirLM-1B
BidirLM is a family of five frontier bidirectional encoders, including a 2.5B omnimodal variant, adapted from causal decoder LLMs. Unlike contrastive-only models, BidirLM relies on a prior masked next-token prediction (MNTP) phase that enables state-of-the-art results on task-specific fine-tuning (NER, classification, NLI) while achieving frontier performance against open-source alternatives on embedding benchmarks (MTEB).

| Model | Base LLM | Parameters | Embedding Dim | Max Tokens | MTEB Multi. V2 (Mean Task) |
|---|---|---|---|---|---|
| BidirLM-270M | Gemma3-270M | 268M | 640 | 512 | 55.5 |
| BidirLM-0.6B | Qwen3-0.6B | 596M | 1024 | 512 | 59.6 |
| **BidirLM-1B** | **Gemma3-1B** | **1001M** | **1152** | **512** (\*) | **62.1** |
| BidirLM-1.7B | Qwen3-1.7B | 1721M | 2048 | 512 | 62.9 |
| BidirLM-Omni-2.5B | Qwen3-1.7B | 2.5B | 2048 | 512 | 63.1 |
(\*) While evaluated on MTEB with a maximum length of 512 tokens, the underlying Gemma3 architecture supports a context length of up to 32,768 tokens. Longer sequences can be used by adjusting `model.max_seq_length` in Sentence Transformers or `max_length` in the tokenizer, for example:
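A minimal sketch, assuming you want a longer encoding window than the 512 tokens used for evaluation (the value 4096 below is purely illustrative):
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BidirLM/BidirLM-1B", trust_remote_code=True)

# Raise the encoding window beyond the 512 tokens used for MTEB evaluation.
# 4096 is an illustrative value; the Gemma3 backbone supports up to 32,768 tokens.
model.max_seq_length = 4096
```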
## Supported Tasks
- **General embeddings** (via Sentence Transformers): retrieval, semantic similarity (STS), clustering, classification, pair classification, reranking, bitext mining, multilabel classification
- **Downstream fine-tuning** (via Transformers): sequence classification (e.g. MNLI, XNLI, PAWS-X, MathShepherd), token classification (e.g. PAN-X, POS), information retrieval (e.g. MIRACL, CodeSearchNet), sequence regression (e.g. Seahorse)
## Usage
### Sentence Transformers
Use Sentence Transformers to compute embeddings for any text representation task.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BidirLM/BidirLM-1B", trust_remote_code=True)

queries = [
    "What is the capital of France?",
    "How does photosynthesis work?",
]
documents = [
    "Paris is the capital and largest city of France, situated on the river Seine.",
    "Photosynthesis is the process by which plants convert sunlight, water, and CO2 into glucose and oxygen.",
]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
```
### Fine-tuning for Downstream Tasks
BidirLM can be directly fine-tuned for downstream tasks:
```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
)

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-1B", trust_remote_code=True)

# Sequence classification (e.g., NLI: entailment, neutral, contradiction)
seq_model = AutoModelForSequenceClassification.from_pretrained(
    "BidirLM/BidirLM-1B",
    trust_remote_code=True,
    num_labels=3,
)

# Token classification (e.g., NER)
tok_model = AutoModelForTokenClassification.from_pretrained(
    "BidirLM/BidirLM-1B",
    trust_remote_code=True,
    num_labels=7,
)

# Fine-tune with the Hugging Face Trainer (see the sketch below)
```
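A minimal fine-tuning sketch with the Hugging Face `Trainer` could look as follows. The dataset (XNLI, English split), label count, and hyperparameters are illustrative placeholders, not the settings used to produce the reported results:
```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-1B", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "BidirLM/BidirLM-1B", trust_remote_code=True, num_labels=3
)

# XNLI (English) as an illustrative NLI dataset; any sequence-pair dataset works the same way.
dataset = load_dataset("xnli", "en")

def preprocess(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True, max_length=512)

tokenized = dataset.map(preprocess, batched=True)

args = TrainingArguments(
    output_dir="bidirlm-1b-xnli",    # placeholder output directory
    per_device_train_batch_size=16,  # illustrative hyperparameters
    learning_rate=2e-5,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    processing_class=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```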
## Evaluation
Please follow the instructions in the [mteb repository](https://github.com/embeddings-benchmark/mteb) to reproduce our scores. The evaluation prompts used for each task are available in [mteb_v2_eval_prompts.json](mteb_v2_eval_prompts.json).
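As a rough sketch, assuming the standard `mteb` Python interface (the task selection and output folder below are illustrative, not the exact configuration used for the reported scores):
```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BidirLM/BidirLM-1B", trust_remote_code=True)

# Illustrative task subset; the reported numbers cover the full MTEB(Multilingual, v2) benchmark.
tasks = mteb.get_tasks(tasks=["STS22"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results/BidirLM-1B")
```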
## Supported Languages
BidirLM-1B supports over 140 languages, inherited from the Gemma3 base model and reinforced through contrastive training covering 87 languages.
## Requirements
This model requires `trust_remote_code=True` as it uses a custom bidirectional architecture.
```
transformers>=4.57.6,<5.0.0
sentence-transformers>=5.0.0
```
## FAQ
### 1. What pooling strategy does this model use?
The model uses **mean pooling**. This is handled automatically when using Sentence Transformers.
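If you work with plain Transformers instead, masked mean pooling would look roughly like the sketch below (assuming the custom model returns `last_hidden_state` like a standard encoder; Sentence Transformers handles this for you):
```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BidirLM/BidirLM-1B", trust_remote_code=True)
model = AutoModel.from_pretrained("BidirLM/BidirLM-1B", trust_remote_code=True)

sentences = ["What is the capital of France?", "Paris is the capital of France."]
inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_dim)

# Mean pooling: average token embeddings, ignoring padding positions.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
```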
### 2. Do I need `trust_remote_code=True`?
Yes. BidirLM uses a custom bidirectional architecture (`BidirLMModel`) that requires loading custom code from the repository.
### 3. Why are my reproduced results slightly different from those reported in the model card?
Different versions of `transformers` and `pytorch` could cause negligible but non-zero performance differences. This model was trained and evaluated with `transformers==4.57.6` and `pytorch==2.6.0`.
### 4. What is the relationship between BidirLM-1B and BidirLM-1B-Base?
[BidirLM/BidirLM-1B-Base](https://huggingface.co/BidirLM/BidirLM-1B-Base) is the intermediate MNTP-adapted checkpoint (bidirectional pretraining stage). BidirLM-1B is the final contrastive fine-tuned version optimized for both sentence embeddings and downstream fine-tuning.
### 5. How is BidirLM different from other embedding models?
Most embedding models (BGE-M3, KaLM, EmbeddingGemma, Qwen3-Embedding) use contrastive-only training, which optimizes embeddings but sacrifices fine-tuning ability. BidirLM restores a prior MNTP phase, advancing the Pareto frontier on both MTEB (embeddings) and XTREME (downstream fine-tuning) simultaneously.
## Citation
```bibtex
@misc{boizard2026bidirlmtextomnimodalbidirectional,
      title={BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs},
      author={Nicolas Boizard and Théo Deschamps-Berger and Hippolyte Gisserot-Boukhlef and Céline Hudelot and Pierre Colombo},
      year={2026},
      eprint={2604.02045},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.02045},
}
```