lhallee committed
Commit b04f443 · verified · 1 Parent(s): 268f4f1

Upload README.md with huggingface_hub

  1. README.md +69 -69
README.md CHANGED
@@ -1,69 +1,69 @@
- ---
- library_name: transformers
- tags: []
- ---
-
- # NOTE
- The GitHub with the implementation and requirements can be found [here](https://github.com/Synthyra/FastPLMs.git).
-
- # DPLM2
- Synthyra DPLM2 checkpoints are HuggingFace AutoModel compatible and include FastPLMs embedding helpers.
-
- ## Supported models
- ```python
- model_dict = {
-     "Synthyra/DPLM2-150M": "airkingbd/dplm2_150m",
-     "Synthyra/DPLM2-650M": "airkingbd/dplm2_650m",
-     "Synthyra/DPLM2-3B": "airkingbd/dplm2_3b",
- }
- ```
-
- ## Use with transformers
- ```python
- import torch
- from transformers import AutoModel, AutoModelForMaskedLM
-
- model_path = "Synthyra/DPLM2-150M"
- model = AutoModel.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
- tokenizer = model.tokenizer
-
- batch = tokenizer(["MPRTEIN", "MSEQWENCE"], padding=True, return_tensors="pt")
- with torch.no_grad():
-     hidden = model(**batch).last_hidden_state
-
- mlm = AutoModelForMaskedLM.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
- with torch.no_grad():
-     logits = mlm(**batch).logits
- ```
-
- ## DPLM2 modality types
- DPLM2 infers `type_ids` automatically from `input_ids` and `attention_mask` when they are not provided.
-
- ## Attention backends
-
- `sdpa` (PyTorch Scaled Dot Product Attention) is the default.
-
- | Backend | Key | Notes |
- | :--- | :--- | :--- |
- | PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
- | Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install kernels` (pre-built; no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential, so use `"sdpa"` if exact numerics matter. |
- | Flex Attention | `"flex"` | Skips padding tokens via block mask, so it is faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
- | Auto | `"auto"` | Picks the best available: `kernels_flash` → `flex` → `sdpa`. |
-
- Set via config before loading, or change on the model after loading (DPLM2 propagates the change to all attention layers immediately):
-
- ```python
- from transformers import AutoConfig, AutoModel
-
- # Option 1: set before loading
- config = AutoConfig.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
- config.attn_backend = "flex"
- model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", config=config, trust_remote_code=True)
-
- # Option 2: set after loading
- model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
- model.attn_backend = "flex" # propagates to all attention layers in-place
- ```
-
- ## Embed datasets
- All DPLM2 models inherit `EmbeddingMixin`, so you can call `model.embed_dataset(...)` directly.
 
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # NOTE
+ The GitHub with the implementation and requirements can be found [here](https://github.com/Synthyra/FastPLMs.git).
+
+ # DPLM2
+ Synthyra DPLM2 checkpoints are HuggingFace AutoModel compatible and include FastPLMs embedding helpers.
+
+ ## Supported models
+ ```python
+ model_dict = {
+     "Synthyra/DPLM2-150M": "airkingbd/dplm2_150m",
+     "Synthyra/DPLM2-650M": "airkingbd/dplm2_650m",
+     "Synthyra/DPLM2-3B": "airkingbd/dplm2_3b",
+ }
+ ```
+
+ ## Use with transformers
+ ```python
+ import torch
+ from transformers import AutoModel, AutoModelForMaskedLM
+
+ model_path = "Synthyra/DPLM2-150M"
+ model = AutoModel.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
+ tokenizer = model.tokenizer
+
+ batch = tokenizer(["MPRTEIN", "MSEQWENCE"], padding=True, return_tensors="pt")
+ with torch.no_grad():
+     hidden = model(**batch).last_hidden_state
+
+ mlm = AutoModelForMaskedLM.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
+ with torch.no_grad():
+     logits = mlm(**batch).logits
+ ```
+
+ ## DPLM2 modality types
+ DPLM2 infers `type_ids` automatically from `input_ids` and `attention_mask` when they are not provided.
+
+ ## Attention backends
+
+ `sdpa` (PyTorch Scaled Dot Product Attention) is the default.
+
+ | Backend | Key | Notes |
+ | :--- | :--- | :--- |
+ | PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
+ | Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install kernels` (pre-built; no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential, so use `"sdpa"` if exact numerics matter. |
+ | Flex Attention | `"flex"` | Skips padding tokens via block mask, so it is faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
+ | Auto | `"auto"` | Picks the best available: `kernels_flash` → `flex` → `sdpa`. |
+
+ Set via config before loading, or change on the model after loading (DPLM2 propagates the change to all attention layers immediately):
+
+ ```python
+ from transformers import AutoConfig, AutoModel
+
+ # Option 1: set before loading
+ config = AutoConfig.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
+ config.attn_backend = "flex"
+ model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", config=config, trust_remote_code=True)
+
+ # Option 2: set after loading
+ model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
+ model.attn_backend = "flex" # propagates to all attention layers in-place
+ ```
+
+ ## Embed datasets
+ All DPLM2 models inherit `EmbeddingMixin`, so you can call `model.embed_dataset(...)` directly.
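
Note on the `"auto"` row of the attention backend table in this README: it gives a fallback order (`kernels_flash` → `flex` → `sdpa`), but the diff does not show how availability is determined. The following is a minimal sketch of that fallback, assuming the caller already knows which backends can run on the current machine. `resolve_backend` and `PRIORITY` are hypothetical names used only for illustration; they are not part of FastPLMs or transformers.

```python
from typing import Iterable

# Fallback order quoted from the README table: fastest first, safest last.
PRIORITY = ("kernels_flash", "flex", "sdpa")


def resolve_backend(requested: str, available: Iterable[str]) -> str:
    """Map a requested backend key to the concrete key that would be used.

    Hypothetical helper: `available` is whatever the caller has determined
    can run on this machine. How FastPLMs detects that is not shown here.
    """
    usable = set(available)
    if requested == "auto":
        # Walk the priority list and take the first backend that is usable.
        for key in PRIORITY:
            if key in usable:
                return key
        raise RuntimeError("no attention backend available")
    if requested not in PRIORITY:
        raise ValueError(f"unknown attention backend: {requested!r}")
    if requested not in usable:
        raise RuntimeError(f"backend {requested!r} is not available")
    return requested


if __name__ == "__main__":
    # Without the pre-built flash kernels, "auto" falls through to flex.
    print(resolve_backend("auto", {"flex", "sdpa"}))
```

Under these assumptions, the resolved key could then be assigned to `config.attn_backend` before loading, as in Option 1 of the README.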