fix(htm): set NON_PORTABLE_CLUSTER_SIZE_ALLOWED=1 for cluster_size=16 19dee83 verified icarus112 committed on Apr 19
perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr 7721a60 verified icarus112 committed on Apr 19
perf(htm): DLB software grid barrier + non-cooperative launch (lifts 132-SM cap) 5c751ce verified icarus112 committed on Apr 19
perf(htm): batched cooperative kernel — B=8 regions in ONE launch via blockIdx.y indexing bd5981d verified icarus112 committed on Apr 19