Merge pull request #22 from MotifTechnologies/jangwoong/mla-rope-fa4-port 5adea7d unverified Jangwoong Kim committed on 26 days ago
test: numerical parity for MLA RoPE fused kernels vs PyTorch reference 0c42208 3v324v23 Claude Opus 4.6 (1M context) committed on 27 days ago
fix: unify all backward kernels to input-based math + fix test import 09ecd67 wyldecat Claude Opus 4.6 (1M context) committed on 28 days ago
style: fix yapf/isort formatting for CI --all-files check 3f2678c wyldecat Claude Opus 4.6 (1M context) committed on 29 days ago
style: apply yapf, isort, and clang-format 6436ad6 wyldecat Claude Opus 4.6 (1M context) committed on Apr 6
fix: rename stale references and clean up Triton remnants 5a9d09d wyldecat Claude Opus 4.6 (1M context) committed on Apr 6
refactor: remove Triton kernels, add hidden_clamp to unscored ops 906e125 wyldecat Claude Opus 4.6 (1M context) committed on Apr 6
test: add scores and hidden_clamp tests for fused_mul_grouped_poly_norm f06406d wyldecat Claude Opus 4.6 (1M context) committed on Apr 6
feat: add GroupedFusedMulPolyNorm Triton kernel for MoE models (#16) e195bbb unverified TaehyunKim Claude Opus 4.6 github-actions[bot] committed on Mar 6
refactor(activation): change fused_add_rms_norm and fused_add_rms_norm_backward to out-place operations 7e4334d wyldecat committed on Oct 13, 2025