CooLLaMACEO committed on
Commit c34f3a3 · verified · 1 Parent(s): e894826

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -24,7 +24,7 @@ By utilizing the **Overflow** architecture, this model achieves massive scale re
  * **Layers:** 128
  * **Hidden Size:** 16,384
  * **Attention:** Grouped Query Attention (GQA) with 16 KV heads
- * **Format:** `.safetensors` / `.bbuf` (Optimized for 1TSumerGPU)
+ * **Format:** `.safetensors`
 
 
 
@@ -36,9 +36,5 @@ We are currently in the process of sharding the 1.5-bit weights to the Hugging F
  ## 🧠 Why 1.5-bit?
  Unlike standard 1-bit models, Overflow-1T utilizes a **0-state** (Neutral weight). This allows the model to effectively "silence" noise across its 1T parameter space, leading to significantly higher stability in Chain-of-Thought (CoT) reasoning and logic tasks compared to binary 1-bit models.
 
- ## 💻 Inference
- This model is designed to be served using the **1TSumerGPU** engine, a custom C++ and CUDA-based inference framework optimized for NVIDIA RTX 40-series GPUs.
-
  ---
- **Created by CooLLaMACEO**
- *Part of the Kwen Foundation initiatives.*
+ **Created by CooLLaMACEO**
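
The ternary scheme described in the README's "Why 1.5-bit?" section (three weight states, with 0 as a neutral value that silences a connection) can be sketched as below. This is a minimal illustration using absmean scaling in the style of ternary-quantization schemes such as BitNet b1.58; the function names and scaling rule are assumptions for illustration, not the repository's actual implementation.

```python
# Illustrative sketch of ternary (~1.58-bit) weight quantization.
# Weights map to {-1, 0, +1}; the 0 state zeroes out a connection,
# which is the "noise-silencing" behavior the README describes.
# NOTE: absmean scaling and these names are assumptions, not the
# model's actual quantizer.
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a float weight tensor to {-1, 0, +1} plus a per-tensor scale."""
    scale = float(np.mean(np.abs(w))) + eps    # absmean scaling factor
    q = np.clip(np.round(w / scale), -1, 1)    # ternary values
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the ternary codes."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.02, -1.1, 0.04], dtype=np.float32)
q, s = ternary_quantize(w)
print(q.tolist())  # small-magnitude weights collapse to the 0 state: [1, 0, -1, 0]
```

Note how the two near-zero weights land on the 0 state rather than being forced to ±1, which is the practical difference from a binary 1-bit quantizer.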