lhallee committed · Commit 7efecf5 · verified · 1 Parent(s): 1d800fc

Upload README.md with huggingface_hub

Files changed (1): README.md +6 -1
README.md CHANGED
@@ -11,7 +11,12 @@ FastESM is a Huggingface compatible plug in version of ESM2 rewritten with a new
 
  Load any ESM2 models into a FastEsm model to dramatically speed up training and inference without **ANY** cost in performance.
 
- Outputting attention maps (or the contact prediction head) is not natively possible with SDPA. You can still pass ```output_attentions``` to have attention calculated manually and returned.
+ ## Attention backend defaults
+ Flex Attention with a block mask that ignores pad tokens is the default attention backend. If Flex Attention is unavailable, FastESM falls back to native PyTorch attention.
+
+ For throughput and memory efficiency, `torch.compile(...)` is heavily recommended, especially when using Flex Attention.
+
+ Outputting attention maps (or the contact prediction head) is not natively possible with the optimized attention backends (including Flex Attention). You can still pass ```output_attentions``` to have attention calculated manually and returned.
  Various other optimizations also make the base implementation slightly different than the one in transformers.
 
  ## Use with 🤗 transformers
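The diff notes that fused attention backends cannot return attention maps, and that passing `output_attentions` falls back to computing attention manually. The sketch below illustrates why: PyTorch's fused `scaled_dot_product_attention` never materializes the attention matrix, so a manual softmax path is needed whenever the weights themselves are wanted. This is an illustration of the general technique, not FastESM's exact code.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, seq_len, head_dim) — toy shapes for illustration
q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# Fused kernel: fast and memory-efficient, but the attention matrix
# is never materialized, so there is nothing to return as a "map".
fused = F.scaled_dot_product_attention(q, k, v)

# Manual path (conceptually what output_attentions=True triggers):
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
weights = scores.softmax(dim=-1)  # the attention map the fused kernel hides
manual = weights @ v

# Both paths produce the same output; only the manual one exposes weights.
assert torch.allclose(fused, manual, atol=1e-5)
```

The trade-off is exactly the one the README describes: the manual path costs extra time and memory, so it is only taken when attention maps (or the contact prediction head) are actually requested.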
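The diff also says the default Flex Attention backend uses a block mask that ignores pad tokens. A minimal sketch of the same pad-masking idea, expressed with SDPA's boolean `attn_mask` for self-containedness (FastESM's actual Flex Attention block mask is not reproduced here; the `lengths` tensor and shapes are made-up examples):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, S, D = 2, 2, 5, 8
lengths = torch.tensor([3, 5])  # hypothetical real-token counts per sequence
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

# Boolean keep-mask over key positions: True = attend, False = pad (ignored).
key_keep = torch.arange(S)[None, :] < lengths[:, None]      # (B, S)
attn_mask = key_keep[:, None, None, :].expand(B, 1, S, S)   # broadcast over heads/queries

# Pad key positions receive zero attention weight.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```

Because pad keys are masked out, altering the values at pad positions leaves the output unchanged, which is the property the block mask is there to guarantee.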