Upload folder using huggingface_hub
- README.md +3 -3
- checkpoints/checkpoint_11000_step.pth +3 -0
README.md
CHANGED
@@ -44,8 +44,8 @@ Because of the hybrid design, ~43% of the model is "dormant" during inference.
 
 I am currently training this on a single NVIDIA RTX 5000. It's still cooking!
 
-* **
-* **Loss:** ~
+* **Latest Checkpoint:** Step 11,000
+* **Loss:** ~1.4167
 * **Dataset:** Subset of SlimPajama-627B
 
 > **⚠️ Disclaimer:** This model is currently babbling coherent English but isn't very smart yet. Don't expect GPT-4 (or even GPT-2) level reasoning. It's a proof-of-concept for the code, not the weights! :D
@@ -79,4 +79,4 @@ This project stands on the shoulders of giants. It is an implementation study ba
 
 ## License
 
-MIT
+MIT
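The updated README reports a loss of ~1.4167 at step 11,000 but does not say which loss or log base it is. Assuming it is the usual mean per-token cross-entropy in nats (the PyTorch default), a quick sketch of converting it into perplexity and bits per token:

```python
import math

# Assumption: the README's loss is mean per-token cross-entropy in nats,
# i.e. natural log, as torch.nn.functional.cross_entropy returns by default.
loss = 1.4167

perplexity = math.exp(loss)          # e^loss, roughly 4.12
bits_per_token = loss / math.log(2)  # nats -> bits, roughly 2.04

print(f"perplexity ≈ {perplexity:.2f}, bits/token ≈ {bits_per_token:.2f}")
```

If the logged value uses a different base or averaging scheme, these conversions do not apply as written.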
checkpoints/checkpoint_11000_step.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c0f13d26bdb5729bef04585efe20a8b96a27c3aa0dd4ad9f5a6b8a6f0fdc497f
+size 3533562641
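The checkpoint is committed through Git LFS, so what appears in the diff is a three-line pointer (spec version, SHA-256 object ID, byte size), not the weights themselves. A minimal sketch of reading such a pointer and checking a downloaded copy against it; `LfsPointer`, `parse_lfs_pointer`, and `verify_object` are hypothetical helpers for illustration, not part of git-lfs or `huggingface_hub`:

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LfsPointer:
    version: str  # spec URL from the "version" line
    oid: str      # hex digest, without the "sha256:" prefix
    size: int     # size of the real object in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse 'key value' lines of a Git LFS pointer file (KeyError if a field is missing)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


def verify_object(path: Path, pointer: LfsPointer, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    if path.stat().st_size != pointer.size:
        return False  # cheap check before hashing several GB
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer.oid


# Usage with the pointer committed for checkpoints/checkpoint_11000_step.pth:
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:c0f13d26bdb5729bef04585efe20a8b96a27c3aa0dd4ad9f5a6b8a6f0fdc497f
size 3533562641
"""
pointer = parse_lfs_pointer(pointer_text)
print(f"{pointer.size / 1024**3:.2f} GiB")  # 3.29 GiB
```

Checking the size first avoids hashing a multi-gigabyte file when a download was simply truncated.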