lhallee committed
Commit 05b72df · verified · 1 Parent(s): 3327b8c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +11 -5
README.md CHANGED
@@ -10,20 +10,26 @@
 [ESM++](https://github.com/Synthyra/ESMplusplus) is a faithful implementation of [ESMC](https://www.evolutionaryscale.ai/blog/esm-cambrian) ([license](https://www.evolutionaryscale.ai/policies/cambrian-open-license-agreement)) that allows for batching and standard Huggingface compatibility without requiring the ESM Python package.
 The small version corresponds to the 300 million parameter version of ESMC.
 
-## Attention backend defaults
-`sdpa` is the default attention backend for ESM++.
+## Attention backends
 
-To enable Flex Attention, set `attn_backend="flex"` in the config before loading the model:
+`sdpa` (PyTorch Scaled Dot Product Attention) is the default. The backend is set via `config.attn_backend` before loading.
+
+| Backend | Key | Notes |
+| :--- | :--- | :--- |
+| PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
+| Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install kernels` (pre-built — no hours-long compilation). Outputs differ slightly from SDPA due to online softmax reordering, but differences are numerically harmless. |
+| Flex Attention | `"flex"` | Skips padding tokens via block mask — faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
+| Auto | `"auto"` | Picks the best available: `kernels_flash` → `flex` → `sdpa`. |
 
 ```python
 from transformers import AutoConfig, AutoModelForMaskedLM
 
 config = AutoConfig.from_pretrained('Synthyra/ESMplusplus_small', trust_remote_code=True)
-config.attn_backend = "flex"
+config.attn_backend = "flex"  # or "kernels_flash", "sdpa", "auto"
 model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_small', config=config, trust_remote_code=True)
 ```
 
-For throughput and memory efficiency, `torch.compile(...)` is heavily recommended, especially when using Flex Attention.
+`torch.compile(model)` is heavily recommended for sustained throughput, especially with Flex Attention.
 
 
 ## Use with 🤗 transformers
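
The `"auto"` row in the new table describes a fixed fallback order: `kernels_flash`, then `flex`, then `sdpa`. A minimal sketch of that selection logic (the function name and availability flags are illustrative assumptions, not the model's actual API):

```python
# Hedged sketch of the "auto" backend fallback order described in the table:
# kernels_flash -> flex -> sdpa. Names here are assumptions for illustration.
def pick_attn_backend(flash_kernels_available: bool, flex_available: bool) -> str:
    """Return the first available backend key, best first."""
    if flash_kernels_available:
        return "kernels_flash"  # fastest, pre-built kernels on Ampere/Hopper
    if flex_available:
        return "flex"  # block-masked attention, skips padding tokens
    return "sdpa"  # always-available PyTorch default

print(pick_attn_backend(False, True))  # -> flex
```

In the real config this choice happens once at load time, so the always-available `"sdpa"` branch guarantees the model still runs on hardware where neither optimized backend is usable.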