Commit History

Update Feather H200 training runtime image
7be430d
verified

icarus112 committed on

Update Feather H200 training runtime image
05cbdf8
verified

icarus112 committed on

Update Feather H200 training runtime image
7d24778
verified

icarus112 committed on

Update Feather H200 training runtime image
2138b90
verified

icarus112 committed on

Update Feather H200 training runtime image
0326fb2
verified

icarus112 committed on

Update Feather H200 training runtime image
6334754
verified

icarus112 committed on

Update Feather H200 training runtime image
760789c
verified

icarus112 committed on

Update Feather H200 training runtime image
63d650c
verified

icarus112 committed on

Update Feather H200 training runtime image
31f8e0d
verified

icarus112 committed on

Update Feather H200 training runtime image
a8a5806
verified

icarus112 committed on

Update Feather H200 training runtime image
f94e4a7
verified

icarus112 committed on

Update Feather H200 training runtime image
9799504
verified

icarus112 committed on

Update Feather H200 training runtime image
5b5c422
verified

icarus112 committed on

fix: use wte.weight.shape instead of undefined config var
7d0a5d8
verified

icarus112 committed on

fix: lm_head init std=0.02 (was 0.001 - pathological at V=65k) + retina HF cache
1627611
verified

icarus112 committed on

fix: lm_head init std=0.02 (was 0.001 - pathological at V=65k) + retina HF cache
b8e2ac7
verified

icarus112 committed on

fix(mid_val): use prepare.get_token_bytes(device=cuda) + model(x,y,reduction=none) API
ea7be5a
verified

icarus112 committed on

feat(sdr_retina): streaming Nemotron path when HYDRA_USE_NEMOTRON=1
c13016b
verified

icarus112 committed on

fix(tokenizer): use rustbpe.train_from_iterator API; bump vocab to 65536
89bd6c2
verified

icarus112 committed on

fix(tokenizer): use rustbpe.train_from_iterator API; bump vocab to 65536
ea3cc17
verified

icarus112 committed on

perf(nemotron): 2-stage prefetch pipeline (HF→tokenizer→packer) zero-tps-loss
f4757c2
verified

icarus112 committed on

feat(nemotron): streaming pretraining loader for Specialized-v1.1 (Super3 recipe)
94eabe0
verified

icarus112 committed on

feat(nemotron): streaming pretraining loader for Specialized-v1.1 (Super3 recipe)
fc6373a
verified

icarus112 committed on

feat: add ppl to per-step log + MID_VAL every 500 steps for learnability visibility
565fb9e
verified

icarus112 committed on

perf(htm): T9 input-tile memcpy_async to cluster smem (bandwidth reduction)
09aa6be
verified

icarus112 committed on

fix(htm): add cluster.sync between Stage A writes and next-timestep reads (T8 bimodal fix)
b4f024c
verified

icarus112 committed on

feat(triton-cache): wire setup/teardown into entrypoint
6ef2d06
verified

icarus112 committed on

feat(triton-cache): HF Hub-backed compilation cache persistence
313e3b0
verified

icarus112 committed on

perf(htm): cluster distributed shared memory for inhib_thr/boost/active_duty (T8)
321ed28
verified

icarus112 committed on

fix(htm): slice regions[:B] to handle eval batch size differing from training
230b3b5
verified

icarus112 committed on

fix(htm): set NON_PORTABLE_CLUSTER_SIZE_ALLOWED=1 for cluster_size=16
19dee83
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
278f184
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
73e6160
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
7721a60
verified

icarus112 committed on

perf(htm): Hopper cluster::sync hardware barrier + sm_90a + cluster launch attr
9b51027
verified

icarus112 committed on

perf(htm): DLB software grid barrier + non-cooperative launch (lifts 132-SM cap)
5c751ce
verified

icarus112 committed on

perf(htm): DLB software grid barrier + non-cooperative launch (lifts 132-SM cap)
d09ca5e
verified

icarus112 committed on

fix(htm): drop py.allow_threads (raw ptrs not Send)
783ed8e
verified

icarus112 committed on

perf(htm): batched cooperative kernel - B=8 regions in ONE launch via blockIdx.y indexing
20577da
verified

icarus112 committed on

perf(htm): batched cooperative kernel - B=8 regions in ONE launch via blockIdx.y indexing
bd5981d
verified

icarus112 committed on

perf(htm): batched cooperative kernel - B=8 regions in ONE launch via blockIdx.y indexing
b4bfe98
verified

icarus112 committed on

perf(htm): batched cooperative kernel - B=8 regions in ONE launch via blockIdx.y indexing
772ee76
verified

icarus112 committed on

perf(htm): thread-pool dispatch of B regions concurrently + HTM_FUSED_GRID_CAP=16 for kernel concurrency
8ff9654
verified

icarus112 committed on

perf(htm): remove per-call dev.sync, cache SDR, device_sync once per step
4ef1558
verified

icarus112 committed on

perf(htm): remove per-call dev.sync, cache SDR, device_sync once per step
c9d5ed8
verified

icarus112 committed on

perf(htm): remove per-call dev.sync, cache SDR, device_sync once per step
020c87a
verified

icarus112 committed on

fix: dD_acc sum via axis=0 not scalar.reshape(1)
6b28ba2
verified

icarus112 committed on

patch: 7 triton 3.4 compat fixes (3 tl.sum + 3 bwd desc.dtype + 3 fwd desc.dtype)
8bec152
verified

icarus112 committed on

patch: add bf16 casts on TMA descriptor stores (triton 3.4 strict dtype match)
ec7cc74
verified

icarus112 committed on

compat: triton 3.4 M,N,K>=16 patches for mamba3 SISO kernels (tl.dot -> tl.sum for sum-reductions)
4b6a863
verified

icarus112 committed on