Upload ascii-chart5-L4-D768-mkii-c1932d6c-1962-493d-b0b7-78e84e30e4e5.txt with huggingface_hub 4adce2e verified SQCU committed on Nov 20, 2025
Upload ascii-eos-L4-D768-rollout-test-01661b5f-2ff7-49b1-9ad8-fee77e14bd1c.txt with huggingface_hub 6a81277 verified SQCU committed on Nov 20, 2025
compiled models train faster so you can train more of them in a short experiment, to better convergence. 921107d verified SQCU committed on Feb 3, 2025
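The speedup this commit refers to comes from graph compilation. A minimal sketch assuming PyTorch's `torch.compile`; the repo's actual training stack is not shown, so the model below is a hypothetical stand-in, and `backend="eager"` is used only so the example runs without a C++/Triton toolchain (the default inductor backend is what performs the fusion that makes training steps faster):

```python
import torch

# Hypothetical stand-in model; the repo's actual architectures are not shown here.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 768),
    torch.nn.GELU(),
    torch.nn.Linear(768, 768),
)

# torch.compile traces the model once and reuses the compiled graph on
# every subsequent step, so the one-time compile cost is amortized over
# many training steps -- exactly the trade-off that favors compiling when
# sweeping many short experiments.
compiled = torch.compile(model, backend="eager")

x = torch.randn(32, 768)
y = compiled(x)
print(y.shape)
```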
89,301,000 parameter attention_ii, z_lossed model trained for 6250 steps at batchsize:4*32, device_batchsize:32 8a69386 verified SQCU committed on Feb 1, 2025
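The "z_lossed" tag presumably refers to the auxiliary z-loss regularizer, which penalizes the squared log-partition function of the logits so they stay near a normalized scale. A minimal plain-Python sketch; the coefficient value is an assumption, not taken from the repo:

```python
import math

def z_loss(logits, coeff=1e-4):
    # z-loss: penalize (log Z)^2 where log Z = logsumexp(logits).
    # This nudges the softmax normalizer toward 1 (log Z toward 0),
    # keeping logit magnitudes from drifting during training.
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return coeff * log_z ** 2

# A single already-normalized logit incurs zero penalty;
# larger logits incur a larger penalty.
print(z_loss([0.0]))
print(z_loss([2.0, -1.0, 0.5]))
```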
sling the illustrious and mysterious "attention_II" models. also some layerwise rmsnorm, qk-projection rmsnorm models, one twice as large as the other. 1f45909 verified SQCU committed on Feb 1, 2025
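The "qk-projection rmsnorm" in this message is presumably QK-norm: applying RMSNorm to the query and key vectors after their projections, so attention logits stay at a controlled scale regardless of how large the projection weights grow. A minimal plain-Python sketch with illustrative values:

```python
import math

def rms_norm(vec, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the vector.
    # Unlike LayerNorm there is no mean subtraction and no bias.
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]

# Normalizing q and k before the dot product bounds the attention
# logit by the head dimension, independent of projection-weight scale.
q = rms_norm([0.3, -1.2, 0.7, 2.0])
k = rms_norm([1.0, 0.1, -0.5, 0.4])
logit = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
print(logit)
```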