lhallee committed · Commit 7efecf5 · verified · 1 Parent(s): 1d800fc

Upload README.md with huggingface_hub

Files changed (1): README.md +6 -1
README.md CHANGED
@@ -11,7 +11,12 @@ FastESM is a Huggingface compatible plug in version of ESM2 rewritten with a new
 
  Load any ESM2 models into a FastEsm model to dramatically speed up training and inference without **ANY** cost in performance.
 
- Outputting attention maps (or the contact prediction head) is not natively possible with SDPA. You can still pass ```output_attentions``` to have attention calculated manually and returned.
+ ## Attention backend defaults
+ Flex Attention with a block mask that ignores pad tokens is the default attention backend. If Flex Attention is unavailable, FastESM falls back to native PyTorch attention.
+
+ For throughput and memory efficiency, `torch.compile(...)` is heavily recommended, especially when using Flex Attention.
+
+ Outputting attention maps (or the contact prediction head) is not natively possible with the optimized attention backends (including Flex Attention). You can still pass ```output_attentions``` to have attention calculated manually and returned.
  Various other optimizations also make the base implementation slightly different than the one in transformers.
 
  ## Use with 🤗 transformers
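The diff notes that fused attention backends cannot return attention maps, and that passing `output_attentions` falls back to computing attention manually. The sketch below illustrates why: PyTorch's fused `scaled_dot_product_attention` never materializes the attention matrix, so a manual softmax path is needed whenever the weights themselves are wanted. This is an illustration of the general technique, not FastESM's exact code.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, seq_len, head_dim) — toy shapes for illustration
q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# Fused kernel: fast and memory-efficient, but the attention matrix
# is never materialized, so there is nothing to return as a "map".
fused = F.scaled_dot_product_attention(q, k, v)

# Manual path (conceptually what output_attentions=True triggers):
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
weights = scores.softmax(dim=-1)  # the attention map the fused kernel hides
manual = weights @ v

# Both paths produce the same output; only the manual one exposes weights.
assert torch.allclose(fused, manual, atol=1e-5)
```

The trade-off is exactly the one the README describes: the manual path costs extra time and memory, so it is only taken when attention maps (or the contact prediction head) are actually requested.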
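The diff also says the default Flex Attention backend uses a block mask that ignores pad tokens. A minimal sketch of the same pad-masking idea, expressed with SDPA's boolean `attn_mask` for self-containedness (FastESM's actual Flex Attention block mask is not reproduced here; the `lengths` tensor and shapes are made-up examples):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, S, D = 2, 2, 5, 8
lengths = torch.tensor([3, 5])  # hypothetical real-token counts per sequence
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

# Boolean keep-mask over key positions: True = attend, False = pad (ignored).
key_keep = torch.arange(S)[None, :] < lengths[:, None]      # (B, S)
attn_mask = key_keep[:, None, None, :].expand(B, 1, S, S)   # broadcast over heads/queries

# Pad key positions receive zero attention weight.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```

Because pad keys are masked out, altering the values at pad positions leaves the output unchanged, which is the property the block mask is there to guarantee.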