Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab 9ef2709 verified rcgalbo committed on Mar 18
Update model card with full architecture and training details 051c2da verified rcgalbo committed on Mar 16
Upload pruned Aetheris (536M params, 80K vocab, 25.7% smaller) 2f57a9e verified rcgalbo committed on Mar 13
Upload Aetheris model with source code (Stage 2, 722M params, loss=2.73) 3bfe5e4 verified rcgalbo committed on Mar 13
Upload Aetheris model (Stage 2 best, 722M params, loss=2.73) 8b21693 verified rcgalbo committed on Mar 13
Upload final Stage 2 best checkpoint (loss=2.7305, 20K steps) d420efe verified rcgalbo committed on Mar 13
Stage 1 complete: 10K steps, CKA layer alignment final checkpoint ba9c5e2 verified rcgalbo committed on Mar 12
Stage 1 checkpoint: [Step 9100/10000] loss=0.0141 cka_mean=0.1489 8f808fd verified rcgalbo committed on Mar 12
Stage 1 checkpoint: [Step 8400/10000] loss=0.0131 cka_mean=0.1052 f5ddb2a verified rcgalbo committed on Mar 12
Stage 1 checkpoint: [Step 7100/10000] loss=0.0161 cka_mean=0.1370 eb32e03 verified rcgalbo committed on Mar 12
Stage 1 checkpoint: [Step 6400/10000] loss=0.0199 cka_mean=0.3752 7a10927 verified rcgalbo committed on Mar 12
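The Stage 1 checkpoints report a `cka_mean` metric for CKA layer alignment, but the history does not define it. Assuming it is the mean linear centered kernel alignment (CKA) over pairs of layer activations (which layers are paired is not stated and is an assumption here), a minimal pure-Python sketch of that computation might look like the following. The function names are hypothetical, and how `cka_mean` relates to the logged `loss` is not specified in the history.

```python
import math
from typing import List, Sequence, Tuple

# A matrix is a list of rows: n examples x d features (activations of one layer).
Matrix = List[List[float]]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each feature's mean over the n examples (column centering)."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]


def cross_frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Return ||A^T B||_F^2 for A (n x p) and B (n x q)."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered X, Y."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(xc, yc)
    denominator = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    return numerator / denominator if denominator > 0 else 0.0


def mean_cka(pairs: Sequence[Tuple[Matrix, Matrix]]) -> float:
    """Hypothetical cka_mean: average linear CKA over (layer_a, layer_b) activation pairs."""
    scores = [linear_cka(x, y) for x, y in pairs]
    return sum(scores) / len(scores)
```

Linear CKA equals 1.0 for representations that differ only by an orthogonal transform or isotropic scaling, and 0.0 when the centered features of the two layers have zero cross-covariance across examples, so a higher `cka_mean` would indicate closer alignment under this assumed definition.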