Commit History

perf(htm): T9 input-tile memcpy_async to cluster smem (bandwidth reduction)
09aa6be
verified

icarus112 committed on

fix(htm): add cluster.sync between Stage A writes and next-timestep reads (T8 bimodal fix)
b4f024c
verified

icarus112 committed on

perf(htm): cluster distributed shared memory for inhib_thr/boost/active_duty (T8)
321ed28
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
9b51027
verified

icarus112 committed on

perf(htm): DLB software grid barrier + non-cooperative launch (lifts 132-SM cap)
d09ca5e
verified

icarus112 committed on

perf(htm): batched cooperative kernel — B=8 regions in ONE launch via blockIdx.y indexing
20577da
verified

icarus112 committed on

fix(htm_rust): remove software grid-barrier slow path
66ff84c
verified

icarus112 committed on

Update Feather H200 training runtime image
36a4397
verified

icarus112 committed on