Update README.md
README.md
FrozenSignatureLayer
```

My LLMs

```python
# ========================================
# Model Configuration (1B-class model)
# ========================================
VOCAB_SIZE = 50257
MODEL_DIM = 2048
NUM_HEADS = 32
NUM_LAYERS = 16
MAX_SEQ_LEN = 2048
# POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
FFN_HIDDEN_DIM = int(MODEL_DIM * 4)
HEAD_DIM = MODEL_DIM // NUM_HEADS  # 64
```
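The comment above notes that positional tables are sized from `MAX_SEQ_LEN` because the model uses RoPE. A minimal sketch of how such a cache could be precomputed for this config (the names `build_rope_cache` and `apply_rope` are illustrative, not from this repo):

```python
import torch

def build_rope_cache(max_seq_len: int = 2048, head_dim: int = 64, base: float = 10000.0):
    # One frequency per pair of channels in a head
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    angles = torch.outer(positions, inv_freq)   # (max_seq_len, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    # x: (batch, seq, heads, head_dim); rotate channel pairs by position-dependent angles
    x1, x2 = x[..., 0::2], x[..., 1::2]
    seq = x.shape[1]
    cos, sin = cos[:seq].unsqueeze(1), sin[:seq].unsqueeze(1)  # broadcast over heads
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```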
---

```python
# ========================================
# Model Configuration (31B-class model)
# ========================================
VOCAB_SIZE = 50257
MODEL_DIM = 2560
NUM_HEADS = 32
NUM_LAYERS = 32
MAX_SEQ_LEN = 2048
# POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
FFN_HIDDEN_DIM = int(MODEL_DIM * 4)
HEAD_DIM = MODEL_DIM // NUM_HEADS  # 80
```

---
```python
# ========================================
# Model Configuration (8B-class model)
# ========================================
VOCAB_SIZE = 50257
MODEL_DIM = 2048
NUM_HEADS = 32
NUM_LAYERS = 24
MAX_SEQ_LEN = 2048
# POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)
HEAD_DIM = MODEL_DIM // NUM_HEADS  # 64
```
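The `8 / 3` factor here is the standard trick for keeping a three-matrix SwiGLU FFN at roughly the parameter count of a classic two-matrix 4× FFN. A quick check of the arithmetic (ours, not code from this repo):

```python
MODEL_DIM = 2048

# Classic FFN: two matrices, d -> 4d -> d
classic = 2 * MODEL_DIM * (4 * MODEL_DIM)   # 33,554,432 parameters

# SwiGLU FFN: three matrices (gate, up, down), d -> h -> d
h = int(MODEL_DIM * 8 / 3)                  # 5461
swiglu = 3 * MODEL_DIM * h                  # 33,552,384 parameters, nearly identical

print(classic, swiglu)
```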
---

```python
# =====================================================================
# Model Configuration (33B-class model), available by request (135 GB)
# =====================================================================
VOCAB_SIZE = 50257
MODEL_DIM = 8192
NUM_HEADS = 64
NUM_LAYERS = 32
MAX_SEQ_LEN = 8192
POS_EMB_MAX_LEN = 32768
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
```

---
```python
# =======================================================================
# 70B-Class Model Configuration (LLaMA-70B style), available by request
# =======================================================================
VOCAB_SIZE = 50257
MODEL_DIM = 8192          # Hidden size (d_model)
NUM_HEADS = 64            # Attention heads → head_dim = 128
NUM_KV_HEADS = 8          # GQA: 8 KV heads (like LLaMA-70B), 64 Q heads
NUM_LAYERS = 80            # 80 layers → ~71B params
MAX_SEQ_LEN = 8192        # Training context
POS_EMB_MAX_LEN = 32768   # Safe for long generation
FFN_HIDDEN_DIM = 32768    # 4 × MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS
```
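With `NUM_KV_HEADS = 8` against 64 query heads, every key/value head serves a group of 8 query heads. A minimal shape sketch of grouped-query attention under these numbers (illustrative wiring, not this repo's actual modules):

```python
import torch
import torch.nn as nn

MODEL_DIM, NUM_HEADS, NUM_KV_HEADS = 8192, 64, 8
HEAD_DIM = MODEL_DIM // NUM_HEADS            # 128
GROUP = NUM_HEADS // NUM_KV_HEADS            # 8 query heads per KV head

q_proj = nn.Linear(MODEL_DIM, NUM_HEADS * HEAD_DIM, bias=False)     # 8192 -> 8192
k_proj = nn.Linear(MODEL_DIM, NUM_KV_HEADS * HEAD_DIM, bias=False)  # 8192 -> 1024
v_proj = nn.Linear(MODEL_DIM, NUM_KV_HEADS * HEAD_DIM, bias=False)  # 8192 -> 1024

x = torch.randn(1, 16, MODEL_DIM)            # (batch, seq, dim)
q = q_proj(x).view(1, 16, NUM_HEADS, HEAD_DIM)
k = k_proj(x).view(1, 16, NUM_KV_HEADS, HEAD_DIM)
v = v_proj(x).view(1, 16, NUM_KV_HEADS, HEAD_DIM)

# Expand each KV head across its group so shapes match the 64 query heads
k = k.repeat_interleave(GROUP, dim=2)
v = v.repeat_interleave(GROUP, dim=2)
```

The payoff is the KV cache: storing 8 KV heads instead of 64 shrinks it by 8× at generation time.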
---

```python
#
# JiRack Super Brain
# Designed for military-grade applications and for discovering worlds, exploring space, and advancing science
#
# =======================================================================
# 120B Configuration (real numbers), available by request (135 GB), JiRack Super Brain
# =======================================================================
VOCAB_SIZE = 32000        # Modern tokenizer size (can be changed later)
MODEL_DIM = 12288         # d_model = 12288 → 120B+ scale
NUM_HEADS = 96            # Query heads
NUM_KV_HEADS = 12         # GQA: group size 8 (96 query heads / 12 KV heads)
NUM_LAYERS = 80           # 80 layers
HEAD_DIM = MODEL_DIM // NUM_HEADS           # 128
FFN_HIDDEN_DIM = int(MODEL_DIM * 13 / 3)    # ≈4.33× expansion (DeepSeek/Qwen style) → 53248
MAX_SEQ_LEN = 131072      # Training on 128k context
POS_EMB_MAX_LEN = 262144  # Generation up to 256k+ tokens safely
```
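As a rough sanity check of scale, here is a back-of-envelope parameter count under the stated GQA sizes (our arithmetic only; actual totals depend on the FFN variant, weight tying, and norm terms):

```python
def rough_decoder_params(vocab, d, layers, heads, kv_heads, ffn_hidden, swiglu=True):
    head_dim = d // heads
    kv_dim = kv_heads * head_dim
    attn = 2 * d * d + 2 * d * kv_dim          # Q and O are d x d; K and V shrink under GQA
    ffn = (3 if swiglu else 2) * d * ffn_hidden
    return layers * (attn + ffn) + vocab * d   # plus tied embeddings

# 120B config: ~184.6B with a 3-matrix SwiGLU, ~132.3B with a classic 2-matrix FFN
for swiglu in (True, False):
    print(round(rough_decoder_params(32000, 12288, 80, 96, 12, 53248, swiglu) / 1e9, 1))
```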
**Note:** The large model architectures replace specific layers (see the sketch below):

- `LayerNorm` → `RMSNorm`
- `FFN` → `SwiGLU`
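For reference, minimal PyTorch versions of the two substitutions (standard formulations given as a sketch, not this repo's exact modules):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """LayerNorm replacement: rescale by root-mean-square; no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """FFN replacement: SiLU-gated unit with gate/up/down projections."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```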
---

You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.

CMS Manhattan