CooLLaMACEO committed on
Commit c34f3a3 · verified · 1 Parent(s): e894826

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -24,7 +24,7 @@ By utilizing the **Overflow** architecture, this model achieves massive scale re
  * **Layers:** 128
  * **Hidden Size:** 16,384
  * **Attention:** Grouped Query Attention (GQA) with 16 KV heads
- * **Format:** `.safetensors` / `.bbuf` (Optimized for 1TSumerGPU)
+ * **Format:** `.safetensors`
 
 
 
@@ -36,9 +36,5 @@ We are currently in the process of sharding the 1.5-bit weights to the Hugging F
  ## 🧠 Why 1.5-bit?
  Unlike standard 1-bit models, Overflow-1T utilizes a **0-state** (Neutral weight). This allows the model to effectively "silence" noise across its 1T parameter space, leading to significantly higher stability in Chain-of-Thought (CoT) reasoning and logic tasks compared to binary 1-bit models.
 
- ## 💻 Inference
- This model is designed to be served using the **1TSumerGPU** engine, a custom C++ and CUDA-based inference framework optimized for NVIDIA RTX 40-series GPUs.
-
  ---
- **Created by CooLLaMACEO**
- *Part of the Kwen Foundation initiatives.*
+ **Created by CooLLaMACEO**
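
The ternary scheme described in the README's "Why 1.5-bit?" section (three weight states, with 0 as a neutral value that silences a connection) can be sketched as below. This is a minimal illustration using absmean scaling in the style of ternary-quantization schemes such as BitNet b1.58; the function names and scaling rule are assumptions for illustration, not the repository's actual implementation.

```python
# Illustrative sketch of ternary (~1.58-bit) weight quantization.
# Weights map to {-1, 0, +1}; the 0 state zeroes out a connection,
# which is the "noise-silencing" behavior the README describes.
# NOTE: absmean scaling and these names are assumptions, not the
# model's actual quantizer.
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a float weight tensor to {-1, 0, +1} plus a per-tensor scale."""
    scale = float(np.mean(np.abs(w))) + eps    # absmean scaling factor
    q = np.clip(np.round(w / scale), -1, 1)    # ternary values
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the ternary codes."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.02, -1.1, 0.04], dtype=np.float32)
q, s = ternary_quantize(w)
print(q.tolist())  # small-magnitude weights collapse to the 0 state: [1, 0, -1, 0]
```

Note how the two near-zero weights land on the 0 state rather than being forced to ±1, which is the practical difference from a binary 1-bit quantizer.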