Update README.md
README.md
CHANGED
@@ -24,7 +24,7 @@ By utilizing the **Overflow** architecture, this model achieves massive scale re
 * **Layers:** 128
 * **Hidden Size:** 16,384
 * **Attention:** Grouped Query Attention (GQA) with 16 KV heads
-* **Format:** `.safetensors`
+* **Format:** `.safetensors`
 
 
 
@@ -36,9 +36,5 @@ We are currently in the process of sharding the 1.5-bit weights to the Hugging F
 ## 🧠 Why 1.5-bit?
 Unlike standard 1-bit models, Overflow-1T utilizes a **0-state** (Neutral weight). This allows the model to effectively "silence" noise across its 1T parameter space, leading to significantly higher stability in Chain-of-Thought (CoT) reasoning and logic tasks compared to binary 1-bit models.
 
-## 💻 Inference
-This model is designed to be served using the **1TSumerGPU** engine, a custom C++ and CUDA-based inference framework optimized for NVIDIA RTX 40-series GPUs.
-
 ---
-**Created by CooLLaMACEO**
-*Part of the Kwen Foundation initiatives.*
+**Created by CooLLaMACEO**
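Note on the "Why 1.5-bit?" context lines: the README describes a weight format with a neutral 0-state in addition to the two binary states, but it does not say how weights are mapped to those states. The sketch below is only an illustration under stated assumptions: the three states are taken to be -1, 0 and +1, and the mapping uses absmean scaling in the style of BitNet b1.58, which the README does not confirm. The names `ternarize` and `bits_per_weight` are hypothetical and are not part of Overflow-1T or any released engine. Three states carry log2(3) ≈ 1.58 bits per weight, which is presumably what "1.5-bit" rounds from.

```python
import math


def ternarize(weights, eps=1e-8):
    """Map float weights to the assumed states {-1, 0, +1}.

    Assumption: absmean scaling (divide by the mean absolute value,
    round to the nearest integer, clip to [-1, 1]). The README does
    not state which scheme the model actually uses.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    # Scale factor: mean absolute weight, kept so values can be rescaled later.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Small weights round to 0 (the neutral state); large ones saturate at +/-1.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def bits_per_weight(num_states=3):
    """Information content of one weight that can take `num_states` values."""
    return math.log2(num_states)


if __name__ == "__main__":
    codes, scale = ternarize([0.9, -0.05, -1.2, 0.02])
    print(codes, scale, bits_per_weight())
```

In this sketch the two near-zero inputs map to the 0-state, which is one way to read the README's claim that the neutral weight can "silence" noise; whether that improves CoT stability is not something the code demonstrates.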