---
library_name: transformers
tags:
- modernbert
- fill-mask
- multilingual
license: apache-2.0
base_model: jhu-clsp/mmBERT-base
---
# mmBERT-base
Transformers v5 compatible checkpoint of [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base).
| | |
|---|---|
| **Parameters** | 307M |
| **Hidden size** | 768 |
| **Layers** | 22 |
| **Attention heads** | 12 |
| **Max seq length** | 8,192 |
| **RoPE theta** | 160,000 (both global & local) |
## Usage (transformers v5)
```python
from transformers import ModernBertModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("datalama/mmBERT-base")
model = ModernBertModel.from_pretrained("datalama/mmBERT-base")

# Korean input: "Artificial intelligence technology is advancing rapidly."
inputs = tokenizer("์ธ๊ณต์ง€๋Šฅ ๊ธฐ์ˆ ์€ ๋น ๋ฅด๊ฒŒ ๋ฐœ์ „ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.", return_tensors="pt")
outputs = model(**inputs)

# [CLS] embedding (768-dim)
cls_embedding = outputs.last_hidden_state[:, 0, :]
```
For masked language modeling:
```python
from transformers import ModernBertForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("datalama/mmBERT-base")
model = ModernBertForMaskedLM.from_pretrained("datalama/mmBERT-base")
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
outputs = model(**inputs)

# Decode the top prediction at the [MASK] position
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(outputs.logits[0, mask_pos].argmax(dim=-1)))
```
## Migration Details
This checkpoint was migrated from [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) with the following changes:
**1. Weight format**: `pytorch_model.bin` โ†’ `model.safetensors`
- Tied weights (`model.embeddings.tok_embeddings.weight` โ†” `decoder.weight`) were cloned to separate tensors before saving
- All 138 tensors verified bitwise equal after conversion
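The tied-weight issue is easy to illustrate in miniature. Below is a hedged NumPy stand-in for that migration step (the real conversion operated on the model's state dict, not these toy arrays): safetensors refuses to serialize tensors that share storage, so the tied pair has to be cloned into independent buffers first, after which the values are still bitwise identical.

```python
import numpy as np

# Toy stand-in for the tied pair (not the real embedding/decoder matrices)
tok_embeddings = np.random.rand(8, 4).astype(np.float32)
decoder = tok_embeddings                # tied: both names point at one buffer
assert decoder is tok_embeddings

decoder = decoder.copy()                # clone to a separate tensor before saving
assert decoder is not tok_embeddings    # independent storage now...
assert np.array_equal(decoder, tok_embeddings)  # ...but still bitwise equal
```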
**2. Config**: Added explicit `rope_parameters` for transformers v5
```json
{
  "global_rope_theta": 160000,
  "local_rope_theta": 160000,
  "rope_parameters": {
    "full_attention": {"rope_type": "default", "rope_theta": 160000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 160000.0}
  }
}
```
The original flat fields (`global_rope_theta`, `local_rope_theta`) are preserved for backward compatibility.
In transformers v5, `ModernBertConfig` defaults `sliding_attention.rope_theta` to 10,000, but mmBERT uses 160,000 for both attention types, so the explicit `rope_parameters` block is required.
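To see why the wrong default is not harmless, here is a small sketch of the standard RoPE inverse-frequency computation (NumPy only, no transformers dependency; `head_dim=64` is assumed from 768 hidden / 12 heads). With theta 10,000 instead of 160,000, the slowest rotation frequency is off by roughly 15x, so token positions would be encoded at noticeably different rates than the model was trained with.

```python
import numpy as np

def rope_inv_freq(theta: float, head_dim: int = 64) -> np.ndarray:
    # Standard RoPE inverse frequencies: theta^(-2i/d) for i in 0..d/2-1
    return theta ** (-np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)

default = rope_inv_freq(10_000.0)    # the v5 sliding-attention default
mmbert = rope_inv_freq(160_000.0)    # what this checkpoint declares

# The highest frequency is identical (theta^0 == 1), but the lowest
# differs by roughly 15x between the two theta values.
assert default[0] == mmbert[0] == 1.0
print(default[-1] / mmbert[-1])
```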
## Verification
Cross-environment verification was performed between transformers v4 (original) and v5 (this checkpoint):
| Check | Result |
|---|---|
| **RoPE config** | `rope_parameters` present, theta=160,000 for both attention types |
| **Weight integrity** | 138 tensors bitwise equal (jhu-clsp `.bin` vs datalama `.safetensors`) |
| **Inference output** | v4 vs v5 max diff across 4 multilingual sentences: **7.63e-06** |
| **Fine-tuning readiness** | Tokenizer roundtrip, forward+backward pass, gradient propagation โ€” all OK |
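The inference-output row boils down to a max-absolute-difference comparison. Below is a hedged sketch of that check with random placeholder arrays; in the real verification, the two arrays were the final hidden states produced by the v4 and v5 environments for the four multilingual test sentences.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholders for the v4 (original .bin) and v5 (this .safetensors) outputs
v4_hidden = rng.standard_normal((4, 768)).astype(np.float32)
v5_hidden = v4_hidden + np.float32(5e-6) * rng.standard_normal((4, 768)).astype(np.float32)

max_diff = float(np.abs(v4_hidden - v5_hidden).max())
assert max_diff < 1e-4  # the card's observed value was 7.63e-06
```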
## Credit
Original model by [JHU CLSP](https://huggingface.co/jhu-clsp). See the [original model card](https://huggingface.co/jhu-clsp/mmBERT-base) for training details and benchmarks.