Commit 70f4e82 (verified), committed by Pomilon
1 Parent(s): 95b55b0

Upload folder using huggingface_hub

README.md CHANGED
@@ -44,8 +44,8 @@ Because of the hybrid design, ~43% of the model is "dormant" during inference.
 
  I am currently training this on a single NVIDIA RTX 5000. It's still cooking!
 
- * **Current Checkpoint:** Step 10,000 (Early Convergence)
- * **Loss:** ~3.66
+ * **Latest Checkpoint:** Step 11,000
+ * **Loss:** ~1.4167
  * **Dataset:** Subset of SlimPajama-627B
 
  > **⚠️ Disclaimer:** This model is currently babbling coherent English but isn't very smart yet. Don't expect GPT-4 (or even GPT-2) level reasoning. It's a proof-of-concept for the code, not the weights! :D
@@ -79,4 +79,4 @@ This project stands on the shoulders of giants. It is an implementation study ba
 
  ## License
 
- MIT
+ MIT
checkpoints/checkpoint_11000_step.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c0f13d26bdb5729bef04585efe20a8b96a27c3aa0dd4ad9f5a6b8a6f0fdc497f
+ size 3533562641
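
The three added lines are a Git LFS pointer, not the checkpoint itself: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes (about 3.53 GB here). As a minimal sketch of how a downloaded copy could be checked against that pointer, the helpers below parse the pointer text and compare size and hash. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository or of any library; the pointer text is copied from the diff above.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 recorded in `pointer`."""
    path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match either.
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so a multi-gigabyte checkpoint is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c0f13d26bdb5729bef04585efe20a8b96a27c3aa0dd4ad9f5a6b8a6f0fdc497f
size 3533562641
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With the checkpoint downloaded locally, `matches_pointer("checkpoints/checkpoint_11000_step.pth", pointer)` would report whether the local file is the one this commit refers to. The local path is an assumption about where the file was saved.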