Synthyra
/

DPLM2-650M

@@ -1,69 +1,69 @@
----
-library_name: transformers
-tags: []
----
-# NOTE
-The GitHub with the implementation and requirements can be found [here](https://github.com/Synthyra/FastPLMs.git).
-# DPLM2
-Synthyra DPLM2 checkpoints are HuggingFace AutoModel compatible and include FastPLMs embedding helpers.
-## Supported models
-```python
-model_dict = {
-    "Synthyra/DPLM2-150M": "airkingbd/dplm2_150m",
-    "Synthyra/DPLM2-650M": "airkingbd/dplm2_650m",
-    "Synthyra/DPLM2-3B": "airkingbd/dplm2_3b",
-}
-```
-## Use with transformers
-```python
-import torch
-from transformers import AutoModel, AutoModelForMaskedLM
-model_path = "Synthyra/DPLM2-150M"
-model = AutoModel.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
-tokenizer = model.tokenizer
-batch = tokenizer(["MPRTEIN", "MSEQWENCE"], padding=True, return_tensors="pt")
-with torch.no_grad():
-    hidden = model(**batch).last_hidden_state
-mlm = AutoModelForMaskedLM.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
-with torch.no_grad():
-    logits = mlm(**batch).logits
-```
-## DPLM2 modality types
-DPLM2 infers `type_ids` automatically from `input_ids` and `attention_mask` when they are not provided.
-## Attention backends
-`sdpa` (PyTorch Scaled Dot Product Attention) is the default.
-| Backend | Key | Notes |
-| :--- | :--- | :--- |
-| PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
-| Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install kernels` (pre-built — no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential — use `"sdpa"` if exact numerics matter. |
-| Flex Attention | `"flex"` | Skips padding tokens via block mask — faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
-| Auto | `"auto"` | Picks the best available: `kernels_flash` → `flex` → `sdpa`. |
-Set via config before loading, or change on the model after loading (DPLM2 propagates the change to all attention layers immediately):
-```python
-from transformers import AutoConfig, AutoModel
-# Option 1: set before loading
-config = AutoConfig.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
-config.attn_backend = "flex"
-model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", config=config, trust_remote_code=True)
-# Option 2: set after loading
-model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
-model.attn_backend = "flex"  # propagates to all attention layers in-place
-```
-## Embed datasets
-All DPLM2 models inherit `EmbeddingMixin`, so you can call `model.embed_dataset(...)` directly.

+---
+library_name: transformers
+tags: []
+---
+# NOTE
+The GitHub with the implementation and requirements can be found [here](https://github.com/Synthyra/FastPLMs.git).
+# DPLM2
+Synthyra DPLM2 checkpoints are HuggingFace AutoModel compatible and include FastPLMs embedding helpers.
+## Supported models
+```python
+model_dict = {
+    "Synthyra/DPLM2-150M": "airkingbd/dplm2_150m",
+    "Synthyra/DPLM2-650M": "airkingbd/dplm2_650m",
+    "Synthyra/DPLM2-3B": "airkingbd/dplm2_3b",
+}
+```
+## Use with transformers
+```python
+import torch
+from transformers import AutoModel, AutoModelForMaskedLM
+model_path = "Synthyra/DPLM2-150M"
+model = AutoModel.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
+tokenizer = model.tokenizer
+batch = tokenizer(["MPRTEIN", "MSEQWENCE"], padding=True, return_tensors="pt")
+with torch.no_grad():
+    hidden = model(**batch).last_hidden_state
+mlm = AutoModelForMaskedLM.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
+with torch.no_grad():
+    logits = mlm(**batch).logits
+```
+## DPLM2 modality types
+DPLM2 infers `type_ids` automatically from `input_ids` and `attention_mask` when they are not provided.
+## Attention backends
+`sdpa` (PyTorch Scaled Dot Product Attention) is the default.
+| Backend | Key | Notes |
+| :--- | :--- | :--- |
+| PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
+| Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install kernels` (pre-built — no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential — use `"sdpa"` if exact numerics matter. |
+| Flex Attention | `"flex"` | Skips padding tokens via block mask — faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
+| Auto | `"auto"` | Picks the best available: `kernels_flash` → `flex` → `sdpa`. |
+Set via config before loading, or change on the model after loading (DPLM2 propagates the change to all attention layers immediately):
+```python
+from transformers import AutoConfig, AutoModel
+# Option 1: set before loading
+config = AutoConfig.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
+config.attn_backend = "flex"
+model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", config=config, trust_remote_code=True)
+# Option 2: set after loading
+model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
+model.attn_backend = "flex"  # propagates to all attention layers in-place
+```
+## Embed datasets
+All DPLM2 models inherit `EmbeddingMixin`, so you can call `model.embed_dataset(...)` directly.