Commit History

perf(htm): T9 input-tile memcpy_async to cluster smem (bandwidth reduction)
09aa6be
verified

icarus112 committed on

fix(htm): add cluster.sync between Stage A writes and next-timestep reads (T8 bimodal fix)
b4f024c
verified

icarus112 committed on

perf(htm): cluster distributed shared memory for inhib_thr/boost/active_duty (T8)
321ed28
verified

icarus112 committed on

fix(htm): set NON_PORTABLE_CLUSTER_SIZE_ALLOWED=1 for cluster_size=16
19dee83
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
278f184
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
73e6160
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
7721a60
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
9b51027
verified

icarus112 committed on

perf(htm): DLB software grid barrier + non-cooperative launch (lifts 132-SM cap)
5c751ce
verified

icarus112 committed on

perf(htm): DLB software grid barrier + non-cooperative launch (lifts 132-SM cap)
d09ca5e
verified

icarus112 committed on

fix(htm): drop py.allow_threads (raw ptrs not Send)
783ed8e
verified

icarus112 committed on

perf(htm): batched cooperative kernel - B=8 regions in ONE launch via blockIdx.y indexing
20577da
verified

icarus112 committed on

perf(htm): batched cooperative kernel - B=8 regions in ONE launch via blockIdx.y indexing
bd5981d
verified

icarus112 committed on

perf(htm): batched cooperative kernel - B=8 regions in ONE launch via blockIdx.y indexing
b4bfe98
verified

icarus112 committed on

perf(htm): remove per-call dev.sync, cache SDR, device_sync once per step
4ef1558
verified

icarus112 committed on

fix(htm_rust): remove software grid-barrier slow path
5491587
verified

icarus112 committed on

fix(htm_rust): remove software grid-barrier slow path
66ff84c
verified

icarus112 committed on

fix(htm_rust): remove software grid-barrier slow path
c122498
verified

icarus112 committed on

Update Feather H200 training runtime image
28df7d8
verified

icarus112 committed on

Update Feather H200 training runtime image
36a4397
verified

icarus112 committed on