pomilon-lab/Aetheris-MoE-300M-A125M-base

@@ -44,8 +44,8 @@ Because of the hybrid design, ~43% of the model is "dormant" during inference.
 I am currently training this on a single NVIDIA RTX 5000. It's still cooking!
-* **Current Checkpoint:** Step 10,000 (Early Convergence)
-* **Loss:** ~3.66
+* **Latest Checkpoint:** Step 11,000
+* **Loss:** ~1.4167
 * **Dataset:** Subset of SlimPajama-627B
 > **⚠️ Disclaimer:** This model is currently babbling coherent English but isn't very smart yet. Don't expect GPT-4 (or even GPT-2) level reasoning. It's a proof-of-concept for the code, not the weights! :D
@@ -79,4 +79,4 @@ This project stands on the shoulders of giants. It is an implementation study ba
 ## License
-MIT
+MIT
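To put the two loss figures in the README diff on a more intuitive scale, they can be converted to perplexity. The sketch below assumes that the reported loss is the mean per-token cross-entropy in nats (natural log), which is what PyTorch's `cross_entropy` computes by default. The commit does not state this, and it also does not say whether the numbers come from training or held-out batches. The function name `perplexity_from_loss` is hypothetical.

```python
import math


def perplexity_from_loss(mean_ce_nats: float) -> float:
    """Convert mean per-token cross-entropy (in nats) to perplexity.

    Assumption: the loss is averaged over tokens and uses the natural log.
    If it were in bits, the conversion would be 2 ** loss instead.
    """
    return math.exp(mean_ce_nats)


# Loss values copied from the README diff (old checkpoint vs. new checkpoint).
REPORTED = {"step 10,000": 3.66, "step 11,000": 1.4167}

if __name__ == "__main__":
    for step, loss in REPORTED.items():
        print(f"{step}: loss {loss:.4f} -> perplexity ~{perplexity_from_loss(loss):.1f}")
```

Under that assumption, the move from ~3.66 to ~1.4167 corresponds to a perplexity drop from roughly 38.9 to roughly 4.1.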

checkpoints/checkpoint_11000_step.pth ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:c0f13d26bdb5729bef04585efe20a8b96a27c3aa0dd4ad9f5a6b8a6f0fdc497f
+size 3533562641
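The added `.pth` file is stored as a Git LFS pointer. The repository holds only the `version`, `oid sha256:…`, and `size` lines, and the ~3.5 GB payload is fetched separately. The following is a minimal sketch, using only the Python standard library, that parses such a pointer and checks a downloaded file against it. The helper names `parse_lfs_pointer` and `verify_download` are hypothetical. The local path is assumed to mirror the repository layout.

```python
import hashlib
import os

# Pointer text copied from this commit (without the diff's leading "+").
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c0f13d26bdb5729bef04585efe20a8b96a27c3aa0dd4ad9f5a6b8a6f0fdc497f
size 3533562641
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer: one "<key> <value>" pair per line."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_download(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first: skips hashing a truncated download
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so a multi-GB checkpoint never has to fit in RAM.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


if __name__ == "__main__":
    ptr = parse_lfs_pointer(POINTER)
    print(f"expected size: {ptr['size'] / 1e9:.2f} GB")  # 3.53 GB
    path = "checkpoints/checkpoint_11000_step.pth"  # assumed local download path
    if os.path.exists(path):
        print("checksum OK" if verify_download(path, ptr) else "checksum MISMATCH")
```

Checking the size before hashing is a deliberate choice. An interrupted download of a file this large is the most likely failure, and the size check catches it instantly without reading 3.5 GB from disk.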