datalama committed on
Commit 310ba72 · verified · 1 Parent(s): 680ad1c

Update README: add migration details, verification results, and v5 usage examples

Files changed (1)
  1. README.md +62 -9
README.md CHANGED
@@ -12,22 +12,75 @@ base_model: jhu-clsp/mmBERT-small
 
  Transformers v5 compatible checkpoint of [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small).
 
- ## Changes from original
-
- - **Weights**: Converted from `pytorch_model.bin` to `model.safetensors` (bitwise verified)
- - **Config**: Added explicit `rope_parameters` for transformers v5 compatibility
- - **Parameters**: 140M
-
- ## Usage
-
  ```python
- from transformers import AutoModel, AutoTokenizer
-
- model = AutoModel.from_pretrained("datalama/mmBERT-small")
  tokenizer = AutoTokenizer.from_pretrained("datalama/mmBERT-small")
  ```
 
  ## Credit
 
- Original model by [JHU CLSP](https://huggingface.co/jhu-clsp).
- See the [original model card](https://huggingface.co/jhu-clsp/mmBERT-small) for training details and benchmarks.
 
 
  Transformers v5 compatible checkpoint of [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small).
 
+ | | |
+ |---|---|
+ | **Parameters** | 140M |
+ | **Hidden size** | 384 |
+ | **Layers** | 22 |
+ | **Attention heads** | 6 |
+ | **Max seq length** | 8,192 |
+ | **RoPE theta** | 160,000 (both global & local) |
 
+ ## Usage (transformers v5)
 
+ ```python
+ from transformers import ModernBertModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("datalama/mmBERT-small")
+ model = ModernBertModel.from_pretrained("datalama/mmBERT-small")
+
+ # Korean: "AI technology is advancing rapidly."
+ inputs = tokenizer("인공지능 기술은 빠르게 발전하고 있습니다.", return_tensors="pt")
+ outputs = model(**inputs)
+
+ # [CLS] embedding (384-dim)
+ cls_embedding = outputs.last_hidden_state[:, 0, :]
+ ```
+
+ For masked language modeling:
 
  ```python
+ from transformers import ModernBertForMaskedLM, AutoTokenizer
 
  tokenizer = AutoTokenizer.from_pretrained("datalama/mmBERT-small")
+ model = ModernBertForMaskedLM.from_pretrained("datalama/mmBERT-small")
+
+ inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
+ outputs = model(**inputs)
  ```
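Turning the MLM `outputs.logits` into a predicted token amounts to an argmax at the masked position. A minimal sketch of that step, with a toy three-word vocabulary and dummy logits standing in for a real tokenizer and model output (`vocab`, `logits`, and `mask_index` are illustrative, not part of the checkpoint):

```python
import numpy as np

# Toy stand-ins: in practice, logits come from outputs.logits and the mask
# position from (inputs.input_ids == tokenizer.mask_token_id).
vocab = ["Paris", "London", "Berlin"]   # illustrative 3-token vocabulary
logits = np.array([[0.1, 0.0, 0.2],    # position 0
                   [2.5, 0.3, 0.1]])   # position 1 = the [MASK] slot
mask_index = 1

# Top prediction at the masked position.
predicted_id = int(logits[mask_index].argmax())
print(vocab[predicted_id])  # highest logit at the mask -> "Paris"
```

With a real model, `tokenizer.decode(predicted_id)` replaces the toy vocabulary lookup.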
 
+ ## Migration Details
+
+ This checkpoint was migrated from [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small) with the following changes:
+
+ **1. Weight format**: `pytorch_model.bin` → `model.safetensors`
+ - Tied weights (`model.embeddings.tok_embeddings.weight` ↔ `decoder.weight`) were cloned to separate tensors before saving
+ - All 138 tensors verified bitwise equal after conversion
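The tied-weight step can be illustrated with a minimal sketch. NumPy arrays stand in for the real embedding/decoder tensors, and the names mirror the state-dict keys above: safetensors rejects tensors that share memory, so aliased entries are copied before serialization and then checked for bitwise equality.

```python
import numpy as np

# Two state-dict entries aliasing the same buffer, as with tied
# embedding/decoder weights (shapes are illustrative).
embedding = np.arange(12, dtype=np.float32).reshape(4, 3)
state_dict = {
    "model.embeddings.tok_embeddings.weight": embedding,
    "decoder.weight": embedding,  # tied: same underlying memory
}

# Untie: replace aliases with independent copies before saving.
untied = {name: tensor.copy() for name, tensor in state_dict.items()}

# The copies no longer share memory...
assert not np.shares_memory(untied["decoder.weight"],
                            untied["model.embeddings.tok_embeddings.weight"])
# ...but remain bitwise equal, which is what the conversion verified.
assert all(np.array_equal(state_dict[k], untied[k]) for k in state_dict)
```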
+
+ **2. Config**: Added explicit `rope_parameters` for transformers v5
+ ```json
+ {
+   "global_rope_theta": 160000,
+   "local_rope_theta": 160000,
+   "rope_parameters": {
+     "full_attention": {"rope_type": "default", "rope_theta": 160000.0},
+     "sliding_attention": {"rope_type": "default", "rope_theta": 160000.0}
+   }
+ }
+ ```
+ The original flat fields (`global_rope_theta`, `local_rope_theta`) are preserved for backward compatibility.
+ In transformers v5, `ModernBertConfig` defaults `sliding_attention.rope_theta` to 10,000, but mmBERT uses 160,000 for both attention types, so the explicit `rope_parameters` block is required.
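Mirroring the flat theta fields into the nested v5-style block is plain dictionary manipulation on `config.json`; a sketch with a minimal slice of the config (only the two theta fields, values taken from the JSON above):

```python
import json

# Minimal slice of the original flat config.
config = {"global_rope_theta": 160000, "local_rope_theta": 160000}

# Mirror the flat fields into the nested v5-style rope_parameters block,
# keeping the flat fields in place for backward compatibility.
config["rope_parameters"] = {
    "full_attention": {"rope_type": "default",
                       "rope_theta": float(config["global_rope_theta"])},
    "sliding_attention": {"rope_type": "default",
                          "rope_theta": float(config["local_rope_theta"])},
}

print(json.dumps(config, indent=2))
```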
+
+ ## Verification
+
+ Cross-environment verification was performed between transformers v4 (original) and v5 (this checkpoint):
+
+ | Check | Result |
+ |---|---|
+ | **RoPE config** | `rope_parameters` present, theta=160,000 for both attention types |
+ | **Weight integrity** | 138 tensors bitwise equal (jhu-clsp `.bin` vs datalama `.safetensors`) |
+ | **Inference output** | v4 vs v5 max diff across 4 multilingual sentences: **1.14e-05** |
+ | **Fine-tuning readiness** | Tokenizer roundtrip, forward+backward pass, gradient propagation: all OK |
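The inference-output check boils down to a max-absolute-difference comparison between hidden states. A sketch with dummy arrays standing in for the v4 and v5 model outputs (the shapes, perturbation scale, and 1e-4 tolerance are illustrative assumptions, not the checkpoint's actual test harness):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for last_hidden_state from the two environments (batch, seq, hidden).
v4_hidden = rng.standard_normal((1, 12, 384)).astype(np.float32)
v5_hidden = v4_hidden + 1e-6 * rng.standard_normal((1, 12, 384)).astype(np.float32)

# Elementwise max absolute difference across all positions.
max_diff = float(np.abs(v4_hidden - v5_hidden).max())

# A tolerance around 1e-4 comfortably covers float32 numerical noise.
assert max_diff < 1e-4, f"outputs diverged: {max_diff:.2e}"
print(f"max diff: {max_diff:.2e}")
```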
+
  ## Credit
 
+ Original model by [JHU CLSP](https://huggingface.co/jhu-clsp). See the [original model card](https://huggingface.co/jhu-clsp/mmBERT-small) for training details and benchmarks.