Commit History
Clean ROCm grouped_gemm fallback and add tests aeb3812
Fix ROCm grouped_gemm accumulation corruption 104fd3c
Add ROCm build artifacts and HIP backend 1e407f0
fix: include torch compile flag 8176cbe
drbh committed on
fix: support torch compile via fake tensors cd5b9c4
drbh committed on
fix: add quickstart and avoid autotune when no cuda 09e15a7
drbh committed on
feat: support shared experts layer and tests 89e2950
drbh committed on
fix: extract expert device mesh for group from unused prehook aa23f77
drbh committed on
fix: prefer using passed parallel group 76c7de7
drbh committed on
fix: adjust layer params in source 9a1816c
drbh committed on
fix: improve expert parallel implementation and refactors e47036a
drbh committed on
fix: add parallel forward functional logic 13afbbe
drbh committed on
fix: fully vendor stk and fix imports 63599de
drbh committed on
fix: vendor stk decorators 0586ba6
drbh committed on
feat: add functional version of moe class and layer dabb815
drbh committed on
fix: prefer relative imports 484fde0
drbh committed on
feat: vendor grouped gemm 3224250
drbh committed on
feat: validate build with original test suite 9c4ca75
drbh committed on
feat: initial port of megablocks to builder format 2595c46
drbh committed on