---
library_name: transformers
tags:
- modernbert
- fill-mask
- multilingual
license: apache-2.0
base_model: jhu-clsp/mmBERT-base
---
# mmBERT-base
A Transformers v5-compatible checkpoint of [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base).

| Property | Value |
|---|---|
| **Parameters** | 307M |
| **Hidden size** | 768 |
| **Layers** | 22 |
| **Attention heads** | 12 |
| **Max seq length** | 8,192 |
| **RoPE theta** | 160,000 (both global & local) |
## Usage (transformers v5)
```python
from transformers import ModernBertModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("datalama/mmBERT-base")
model = ModernBertModel.from_pretrained("datalama/mmBERT-base")
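# Korean example sentence: "AI technology is developing rapidly."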
inputs = tokenizer("인공지능 기술은 빠르게 발전하고 있습니다.", return_tensors="pt")
outputs = model(**inputs)
# [CLS] embedding (768-dim)
cls_embedding = outputs.last_hidden_state[:, 0, :]
```
For masked language modeling:
```python
from transformers import ModernBertForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("datalama/mmBERT-base")
model = ModernBertForMaskedLM.from_pretrained("datalama/mmBERT-base")
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
outputs = model(**inputs)
# Decode the prediction at the [MASK] position
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```
## Migration Details
This checkpoint was migrated from [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) with the following changes:
**1. Weight format**: `pytorch_model.bin` → `model.safetensors`
- Tied weights (`model.embeddings.tok_embeddings.weight` ↔ `decoder.weight`) were cloned to separate tensors before saving
- All 138 tensors were verified to be bitwise equal after conversion (a conversion sketch follows below)
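For reference, a minimal sketch of this kind of `.bin` → `.safetensors` conversion (not the exact script used for this checkpoint), assuming the original weights load as a plain PyTorch state dict:

```python
import torch
from safetensors.torch import load_file, save_file

# Load the original PyTorch weights; CPU is enough for a pure format conversion.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# safetensors refuses to serialize tensors that share storage, so clone tied
# weights (e.g. token embeddings tied to the MLM decoder) into separate tensors.
state_dict = {name: tensor.clone().contiguous() for name, tensor in state_dict.items()}
save_file(state_dict, "model.safetensors")

# Verify the round trip: every tensor should come back bitwise identical.
reloaded = load_file("model.safetensors")
assert all(torch.equal(state_dict[name], reloaded[name]) for name in state_dict)
```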
**2. Config**: Added explicit `rope_parameters` for transformers v5
```json
{
"global_rope_theta": 160000,
"local_rope_theta": 160000,
"rope_parameters": {
"full_attention": {"rope_type": "default", "rope_theta": 160000.0},
"sliding_attention": {"rope_type": "default", "rope_theta": 160000.0}
}
}
```
The original flat fields (`global_rope_theta`, `local_rope_theta`) are preserved for backward compatibility.
In transformers v5, `ModernBertConfig` defaults `sliding_attention.rope_theta` to 10,000 — but mmBERT uses 160,000 for both, so explicit `rope_parameters` are required.
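As a quick sanity check (a minimal sketch, assuming the `rope_parameters` field above survives in the serialized `config.json`), the theta values can be confirmed after loading:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("datalama/mmBERT-base")
rope = config.to_dict()["rope_parameters"]

# Both global (full) and local (sliding-window) attention should use theta = 160,000.
assert rope["full_attention"]["rope_theta"] == 160000.0
assert rope["sliding_attention"]["rope_theta"] == 160000.0
```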
## Verification
Cross-environment verification was performed between transformers v4 (original) and v5 (this checkpoint):

| Check | Result |
|---|---|
| **RoPE config** | `rope_parameters` present, theta=160,000 for both attention types |
| **Weight integrity** | 138 tensors bitwise equal (jhu-clsp `.bin` vs datalama `.safetensors`) |
| **Inference output** | v4 vs v5 max diff across 4 multilingual sentences: **7.63e-06** |
| **Fine-tuning readiness** | Tokenizer roundtrip, forward+backward pass, gradient propagation — all OK |
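The inference-output check above can be reproduced roughly as follows (an illustrative sketch, not the original verification script; the sentence list and output file name are placeholders). Run the script once under transformers v4 against `jhu-clsp/mmBERT-base` and once under v5 against this checkpoint, then compare the two dumps:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Point this at "jhu-clsp/mmBERT-base" in the v4 environment,
# and at "datalama/mmBERT-base" in the v5 environment.
model_id = "datalama/mmBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

sentences = ["The capital of France is Paris."]  # placeholder for the 4 test sentences
with torch.no_grad():
    hidden = [model(**tokenizer(s, return_tensors="pt")).last_hidden_state for s in sentences]
torch.save(hidden, "hidden_states.pt")

# After both runs, load the two dumps and report the maximum absolute difference:
# max((a - b).abs().max().item() for a, b in zip(v4_hidden, v5_hidden))
```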
## Credit
Original model by [JHU CLSP](https://huggingface.co/jhu-clsp). See the [original model card](https://huggingface.co/jhu-clsp/mmBERT-base) for training details and benchmarks.