Update README.md

README.md CHANGED

@@ -4,6 +4,41 @@ language:
 - en
 tags:
 - agent
+- ternary
+- 1.5-bit
+- overflow
+- large-scale
+- efficiency
 ---
 
-#
+# 🌊 Overflow-1T
+
+**Overflow-1T** is a next-generation Large Language Model with **1.03 trillion parameters**, built on a custom **1.5-bit ternary ({-1, 0, 1}) architecture**.
+
+By utilizing the **Overflow** architecture, the model achieves massive-scale reasoning while remaining computationally efficient; it is designed specifically to run on consumer-grade hardware through advanced weight packing and specialized C++ inference kernels.
+
+## 🚀 Key Specifications
+* **Parameters:** ~1.03 trillion (1T-class)
+* **Precision:** 1.5-bit ternary (packed 5 weights per byte)
+* **Architecture:** OverflowForCausalLM
+* **Layers:** 128
+* **Hidden Size:** 16,384
+* **Attention:** Grouped Query Attention (GQA) with 16 KV heads
+* **Format:** `.safetensors` / `.bbuf` (optimized for 1TSumerGPU)
+
+
+
+## 🛠 Project Status: Initial Sharding
+We are currently sharding the 1.5-bit weights and uploading them to the Hugging Face Hub.
+- **Progress:** Shard 1 of 10 currently uploading.
+- **Estimated Completion:** March 2026.
+
+## 🧠 Why 1.5-bit?
+Unlike standard binary 1-bit models, Overflow-1T utilizes a **0-state** (neutral weight). This allows the model to effectively "silence" noise across its 1T-parameter space, leading to significantly higher stability in Chain-of-Thought (CoT) reasoning and logic tasks.
+
+## 💻 Inference
+This model is designed to be served using the **1TSumerGPU** engine, a custom C++ and CUDA-based inference framework optimized for NVIDIA RTX 40-series GPUs.
+
+---
+**Created by CooLLaMACEO**
+*Part of the Kwen Foundation initiatives.*
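
For readers wondering how the "5 weights per byte" figure in the spec list works out: a ternary weight takes one of three values, and 3^5 = 243 ≤ 256, so five such weights fit in a single byte (about 1.6 bits per weight). The commit does not include the actual packing code; the following is a minimal, hypothetical C++ sketch of one base-3 scheme (the names `pack5`/`unpack5` are illustrative, not the Overflow kernels).

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sketch (not the actual Overflow kernel): each ternary weight
// in {-1, 0, 1} is stored as a base-3 digit in {0, 1, 2}; five digits fit in
// one byte because 3^5 = 243 <= 256, i.e. ~1.6 bits per weight.
uint8_t pack5(const int8_t w[5]) {
    uint8_t packed = 0;
    for (int i = 4; i >= 0; --i)                 // accumulate base-3 digits
        packed = packed * 3 + static_cast<uint8_t>(w[i] + 1);
    return packed;
}

void unpack5(uint8_t packed, int8_t w[5]) {
    for (int i = 0; i < 5; ++i) {                // peel the digits back off
        w[i] = static_cast<int8_t>(packed % 3) - 1;
        packed /= 3;
    }
}

int main() {
    const int8_t weights[5] = {-1, 0, 1, 1, -1};
    int8_t roundtrip[5];
    unpack5(pack5(weights), roundtrip);
    for (int i = 0; i < 5; ++i)
        std::printf("%d ", roundtrip[i]);        // prints: -1 0 1 1 -1
    std::printf("\n");
    return 0;
}
```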
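The "Why 1.5-bit?" claim about the 0-state can be made concrete: a ternary dot product needs no multiplications at all, since +1 adds an activation, -1 subtracts it, and 0 skips the input entirely. A hypothetical illustration of that skip behavior (again, not the model's published kernel):

```cpp
#include <cstdio>

// Illustrative only: a ternary dot product without multiplications.
// +1 adds the activation, -1 subtracts it, and the 0-state ignores the
// input entirely -- this skip is what "silencing" a connection means.
float ternary_dot(const signed char* w, const float* x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        if (w[i] == 1)       acc += x[i];
        else if (w[i] == -1) acc -= x[i];
        // w[i] == 0: contributes nothing, however large x[i] is
    }
    return acc;
}

int main() {
    const signed char w[4] = {1, 0, -1, 1};
    const float x[4] = {0.5f, 100.0f, 0.25f, 0.125f}; // x[1] is silenced
    std::printf("%f\n", ternary_dot(w, x, 4));        // 0.5 - 0.25 + 0.125 = 0.375
    return 0;
}
```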
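Finally, a back-of-the-envelope look at the GQA figures: the README gives 128 layers, a 16,384 hidden size, and 16 KV heads, but not the head dimension. Assuming a common head dimension of 128 and an fp16 KV cache (both assumptions, not stated in the card), the per-token cache cost works out as below.

```cpp
#include <cstdio>

int main() {
    // Figures from the README: 128 layers, 16,384 hidden size, 16 KV heads.
    // head_dim = 128 and an fp16 cache are assumptions for illustration.
    const long layers = 128, kv_heads = 16, head_dim = 128, fp16_bytes = 2;
    const long per_token = 2 * layers * kv_heads * head_dim * fp16_bytes; // K and V
    std::printf("KV cache per token: %ld bytes (%ld KiB)\n",
                per_token, per_token / 1024);
    // Under these assumptions there would be 16384 / 128 = 128 query heads,
    // so GQA shares each KV head across 8 query heads: an ~8x smaller cache
    // than full multi-head attention would require.
    return 0;
}
```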
|