Update README.md
# 🌊 Overflow-1T
Built on the **Overflow** architecture, this model targets massive-scale reasoning while remaining computationally efficient. It is designed to run on consumer-grade hardware through aggressive weight packing and specialized C++ inference kernels.
## 🚀 Key Specifications

* **Parameters:** 1,000,000,000,000 (1T)
* **Precision:** 1.5-bit ternary (packed 5 weights per byte)
* **Architecture:** OverflowForCausalLM
* **Layers:** 128
* **Hidden size:** 16,384
* **Attention:** Grouped Query Attention (GQA) with 16 KV heads
* **Format:** `.safetensors`
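The "5 weights per byte" figure follows from arithmetic: a ternary weight has 3 states, and 3⁵ = 243 ≤ 256, so five base-3 digits fit in one byte. Below is a minimal NumPy sketch of such packing; the function names are illustrative, not the model's actual API:

```python
import numpy as np

def pack_ternary(weights: np.ndarray) -> np.ndarray:
    """Pack ternary weights {-1, 0, +1} into bytes, 5 weights per byte.

    Each weight is shifted to {0, 1, 2} and treated as a base-3 digit;
    five digits give values in [0, 242], which fit in one uint8.
    """
    assert weights.size % 5 == 0, "pad to a multiple of 5 first"
    digits = (weights.reshape(-1, 5) + 1).astype(np.uint8)  # {-1,0,1} -> {0,1,2}
    powers = 3 ** np.arange(5, dtype=np.uint16)             # [1, 3, 9, 27, 81]
    return (digits * powers).sum(axis=1).astype(np.uint8)

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_ternary: recover the ternary weights."""
    vals = packed.astype(np.int16)[:, None]
    digits = (vals // 3 ** np.arange(5)) % 3
    return (digits - 1).reshape(-1).astype(np.int8)

w = np.array([-1, 0, 1, 1, -1, 0, 0, 0, 1, -1], dtype=np.int8)
packed = pack_ternary(w)  # 10 weights -> 2 bytes
assert np.array_equal(unpack_ternary(packed), w)
```

This works out to 8/5 = 1.6 bits of storage per weight, close to the log₂ 3 ≈ 1.585 bits of information a ternary value actually carries.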
## 🛠 Project Status: Initial Sharding

We are currently sharding the 1.5-bit weights and uploading them to the Hugging Face Hub.

- **Progress:** Shard 2 of 10 is currently uploading.
- **Estimated completion:** March 2026.
## 🧠 Why 1.5-bit?

Unlike standard 1-bit (binary) models, Overflow-1T adds a **0-state** (a neutral weight) to the usual ±1 values. This allows the model to effectively "silence" noise across its 1T-parameter space, yielding significantly higher stability on Chain-of-Thought (CoT) reasoning and logic tasks than binary 1-bit models.
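The 0-state also has a computational payoff: a ternary matrix-vector product needs no multiplications at all, since each weight either adds an activation, subtracts it, or skips it entirely. A toy illustration of this idea (a hypothetical helper, not the model's actual C++ kernel):

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product with ternary weights {-1, 0, +1}.

    +1 adds the activation, -1 subtracts it, and 0 skips it,
    "silencing" that connection; no multiplications are required.
    """
    return (x * (W == 1)).sum(axis=1) - (x * (W == -1)).sum(axis=1)

W = np.array([[1, 0, -1],
              [0, 0,  1]], dtype=np.int8)
x = np.array([2.0, 5.0, 3.0])
y = ternary_matvec(W, x)
assert np.allclose(y, W @ x)  # matches an ordinary matmul
```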
---
**Created by CooLLaMACEO**
### Discontinued

Maybe I will continue it later, or upload it on a different account; I am still working on the model.