Spaces:
Running on T4
perf: use float16 instead of bfloat16 in SDPA for T4 tensor core acceleration
Browse files- model_manager.py +3 -0
model_manager.py
CHANGED
|
@@ -122,6 +122,9 @@ class ModelManager:
         # SDPA fallback when flash-attn is not available (e.g., T4 GPU)
         if not FLASH_ATTN_2_AVAILABLE and not FLASH_ATTN_3_AVAILABLE:
             out_dtype = q.dtype
+            # T4 lacks native bfloat16; use float16 for tensor core acceleration
+            if dtype == torch.bfloat16 and not torch.cuda.is_bf16_supported():
+                dtype = torch.float16
             if q_lens is not None or k_lens is not None:
                 warnings.warn("Padding mask disabled with scaled_dot_product_attention")
             q = q.transpose(1, 2).to(dtype)