lwanming committed on
Commit 548857c · verified · 1 Parent(s): bf05980

Use full fp16 transformer model

Use Split-Scaling to avoid fp16 overflow:

Overflow Pattern:
`SiLU(w1) × w3 → [Inf!] → w2 → SLN`

Split Scaling Fix:
SiLU(w1) × (1/8) ──┐
                   ├→ Mul_1 → w2 → SLN (no overflow)
w3 × (1/16) ───────┘

Math: Mul_1 = SiLU(w1)/8 × w3/16 = SiLU(w1) × w3 / 128
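
The arithmetic above can be checked with a small NumPy sketch. This is only an illustration, not the code from this commit: the function names, the input values 300 and 400, and the use of NumPy are assumptions chosen so that the unscaled product exceeds the fp16 maximum of 65504.

```python
import numpy as np

def silu(x):
    # SiLU(x) = x * sigmoid(x), evaluated in the dtype of x (fp16 here).
    return x / (np.float16(1.0) + np.exp(-x))

def gated_mul_naive(w1_out, w3_out):
    # Overflow pattern: SiLU(w1) * w3 computed directly in fp16.
    with np.errstate(over="ignore"):
        return silu(w1_out) * w3_out

def gated_mul_split(w1_out, w3_out):
    # Split scaling: shrink each branch before the multiply, so the
    # product is SiLU(w1) * w3 / 128 and stays inside the fp16 range.
    return (silu(w1_out) * np.float16(1 / 8)) * (w3_out * np.float16(1 / 16))

# Hypothetical projection outputs, chosen so the raw product overflows fp16.
w1_out = np.array([300.0], dtype=np.float16)
w3_out = np.array([400.0], dtype=np.float16)

naive = gated_mul_naive(w1_out, w3_out)   # 300 * 400 = 120000 > 65504
split = gated_mul_split(w1_out, w3_out)   # 37.5 * 25 = 937.5

print(naive, split)
```

With these inputs the naive product becomes `inf`, while the split-scaled product is 937.5, which is 120000 / 128. The result still carries the 1/128 factor when it reaches w2; how the rest of the graph accounts for that factor is not described in this commit message.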
