Commit History

v7.1 training: update header, base config BACKPROP_K default
b002266
verified

krystv committed on

v7.1: chunked CE loss to fix OOM on base config, T4-friendly base defaults
02bcdde
verified

krystv committed on

v7 config: add backprop_depth, use_rope
3ab1689
verified

krystv committed on

v7: fix LR schedule, backprop depth k=4, RoPE, eps=1e-8
d3f6caa
verified

krystv committed on

v7 training: fix LR schedule mismatch, k=4 backprop, eps=1e-8
17f4238
verified

krystv committed on

v6: Replace CausalConv with standard causal attention (Huginn-style), sandwich norm, proper state injection
955db6d
verified

krystv committed on

v5 training: LR=5e-5, bf16 on T4, eps=1e-15, NaN detection+recovery, wd=0.1
8e5f40b
verified

krystv committed on

v5: Fix all 5 training instabilities (inter-iteration norm, lower LR, z-loss, bf16 on T4, eps=1e-15)
ccaa938
verified

krystv committed on

Update ARCHITECTURE.md for v4.1 depth-recurrent design
f9939a9
verified

krystv committed on

Update config.json for v4.1 architecture
6f2fffc
verified

krystv committed on

Complete README rewrite for v4.1 — accurate architecture, all env vars, training guide
08298f1
verified

krystv committed on

v4.1 model: fix gated residual (was blocking gradient flow), add proper pre-norm residual pattern
7599d9a
verified

krystv committed on

v4.1 Colab notebook: fix total_memory, add SEQ_LEN/SAVE_EVERY/EVAL_EVERY, correct architecture description
733bb04
verified

krystv committed on

v4.1: Add periodic sample generation, fix notebook (total_memory, SEQ_LEN, architecture desc), add EVAL/SAVE env vars
5533fcb
verified

krystv committed on

v4 training script: match new architecture API, proper depth recurrence training
1d39647
verified

krystv committed on

v4: Complete redesign. Proper depth recurrence with transposed MLP (TRM-style), not broken sequence recurrence
b1a7c04
verified

krystv committed on

Fix scheduler warning: initial step before loop, keep optimizer step order correct
0d24fd2
verified

krystv committed on

Fix NaN: entire recurrence in float32 (disable autocast), numerically stable formulation, fix scheduler ordering
9f76025
verified

krystv committed on

Major fix: replace sequential WKV loop with chunked parallel processing, fix T4 OOM and hang
9668e25
verified

krystv committed on

Fix train defaults for T4: BS=4, seq=256, N_sup=2, add PYTORCH_CUDA_ALLOC_CONF
99fcf17
verified

krystv committed on

Fix T4 OOM: gradient checkpointing, disable autocast in WKV loop, reduce seq_len default to 256, explicit cache clearing
e84ca44
verified

krystv committed on

Fix OOM on T4: reduce defaults (BS=4, N_sup=2), fix trackio API (dir→project), remove trust_remote_code
5d6d175
verified

krystv committed on

Fix: proper AMP for T4(fp16)/A100(bf16)/CPU(fp32), GradScaler only with fp16
353b787
verified

krystv committed on

Fix: RMSNorm compat (PyTorch <2.4), dtype handling, GradScaler logic, WKV mixed precision
02f82bd
verified

krystv committed on

Fix: total_mem → total_memory (PyTorch API fix)
97567a2
verified

krystv committed on

Add detailed architecture documentation
1b2cbfc
verified

krystv committed on

Add comprehensive README with architecture documentation
95de54a
verified

krystv committed on

Add model config
1508cd8
verified

krystv committed on

Add Colab/Kaggle training notebook
6b672ca
verified

krystv committed on

Add training script (Colab/Kaggle compatible)
82a62b6
verified

krystv committed on

Add NEXUS architecture core module
0fde7b2
verified

krystv committed on

initial commit
ff42b79
verified

krystv committed on