Upload README.md with huggingface_hub
README.md
CHANGED
@@ -46,7 +46,7 @@ DPLM2 infers `type_ids` automatically from `input_ids` and `attention_mask` when
 | Backend | Key | Notes |
 | :--- | :--- | :--- |
 | PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
-| Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install
+| Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install kernels` (pre-built — no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential — use `"sdpa"` if exact numerics matter. |
 | Flex Attention | `"flex"` | Skips padding tokens via block mask — faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
 | Auto | `"auto"` | Picks the best available: `kernels_flash` → `flex` → `sdpa`. |
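The fallback order in the `"auto"` row can be illustrated with a small sketch. This is not the model's actual selection code: `PREFERENCE`, `probe_available_backends`, and `resolve_backend` are hypothetical names, and the availability checks (whether the `kernels` package and PyTorch's `flex_attention` module are importable) are assumptions about what "available" might mean here.

```python
from __future__ import annotations

import importlib.util

# Preference order copied from the "Auto" row: kernels_flash -> flex -> sdpa.
PREFERENCE = ("kernels_flash", "flex", "sdpa")


def _has_module(name: str) -> bool:
    """Return True if `name` can be located without raising."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def probe_available_backends() -> set[str]:
    """Hypothetical probe; the real checks used by the model may differ."""
    found = {"sdpa"}  # assumption: SDPA is treated as always usable
    if _has_module("kernels"):
        found.add("kernels_flash")
    if _has_module("torch.nn.attention.flex_attention"):
        found.add("flex")
    return found


def resolve_backend(requested: str, available: set[str]) -> str:
    """Map a requested key to a concrete backend key."""
    if requested != "auto":
        if requested not in available:
            raise ValueError(f"backend {requested!r} is not available")
        return requested
    # "auto": walk the preference order and take the first usable backend.
    for key in PREFERENCE:
        if key in available:
            return key
    raise ValueError("no attention backend available")
```

Calling `resolve_backend("auto", probe_available_backends())` would then yield one concrete key from the table. Passing the available set in explicitly keeps the resolution logic easy to exercise without a GPU.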