v7.1 training: update header, base config BACKPROP_K default b002266 verified krystv committed on Apr 27
v7.1: chunked CE loss to fix OOM on base config, T4-friendly base defaults 02bcdde verified krystv committed on Apr 27
v7 training: fix LR schedule mismatch, k=4 backprop, eps=1e-8 17f4238 verified krystv committed on Apr 27
v6: Replace CausalConv with standard causal attention (Huginn-style), sandwich norm, proper state injection 955db6d verified krystv committed on Apr 27
v5 training: LR=5e-5, bf16 on T4, eps=1e-15, NaN detection+recovery, wd=0.1 8e5f40b verified krystv committed on Apr 27
v5: Fix all 5 training instabilities: inter-iteration norm, lower LR, z-loss, bf16 on T4, eps=1e-15 ccaa938 verified krystv committed on Apr 27
Complete README rewrite for v4.1: accurate architecture, all env vars, training guide 08298f1 verified krystv committed on Apr 26
v4.1 model: fix gated residual (was blocking gradient flow), add proper pre-norm residual pattern 7599d9a verified krystv committed on Apr 26
v4.1 Colab notebook: fix total_memory, add SEQ_LEN/SAVE_EVERY/EVAL_EVERY, correct architecture description 733bb04 verified krystv committed on Apr 26
v4.1: Add periodic sample generation, fix notebook (total_memory, SEQ_LEN, architecture desc), add EVAL/SAVE env vars 5533fcb verified krystv committed on Apr 26
v4 training script: match new architecture API, proper depth recurrence training 1d39647 verified krystv committed on Apr 26
v4: Complete redesign, proper depth recurrence with transposed MLP (TRM-style), not broken sequence recurrence b1a7c04 verified krystv committed on Apr 26
Fix scheduler warning: initial step before loop, keep optimizer step order correct 0d24fd2 verified krystv committed on Apr 26
Fix NaN: entire recurrence in float32 (disable autocast), numerically stable formulation, fix scheduler ordering 9f76025 verified krystv committed on Apr 26
Major fix: replace sequential WKV loop with chunked parallel processing, fix T4 OOM and hang 9668e25 verified krystv committed on Apr 26
Fix train defaults for T4: BS=4, seq=256, N_sup=2, add PYTORCH_CUDA_ALLOC_CONF 99fcf17 verified krystv committed on Apr 26
Fix T4 OOM: gradient checkpointing, disable autocast in WKV loop, reduce seq_len default to 256, explicit cache clearing e84ca44 verified krystv committed on Apr 26
Fix OOM on T4: reduce defaults (BS=4,N_sup=2), fix trackio API (dir→project), remove trust_remote_code 5d6d175 verified krystv committed on Apr 26
Fix: proper AMP for T4(fp16)/A100(bf16)/CPU(fp32), GradScaler only with fp16 353b787 verified krystv committed on Apr 26
Fix: RMSNorm compat (PyTorch <2.4), dtype handling, GradScaler logic, WKV mixed precision 02f82bd verified krystv committed on Apr 26