kgrabko committed
Commit 74fc904 · verified · 1 Parent(s): 5677ea6

Update README.md

Files changed (1): README.md +92 -0
README.md CHANGED
@@ -67,14 +67,106 @@ Linear
  FrozenSignatureLayer
  ```
 
+ My LLMs
+
+ # ========================================
+ # Model Configuration (1B-class model)
+ # ========================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 2048
+ NUM_HEADS = 32
+ NUM_LAYERS = 16
+ MAX_SEQ_LEN = 2048
+ # POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
+ FFN_HIDDEN_DIM = int(MODEL_DIM * 4)
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 64
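The comment above (translated from Russian) says the learned positional-embedding table is gone and rotary position embeddings (RoPE) handle positions up to MAX_SEQ_LEN. As a rough illustration of what that means, here is a minimal RoPE sketch in PyTorch; the function name, shapes, and base value are illustrative assumptions, not code from these checkpoints:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (batch, seq, heads, head_dim) by position-dependent angles."""
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)  # (seq, half)
    cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# With the 1B-class values above, positions are encoded on the fly,
# so no positional-embedding table beyond MAX_SEQ_LEN is required.
q = torch.randn(1, 2048, 32, 64)   # (batch, MAX_SEQ_LEN, NUM_HEADS, HEAD_DIM)
q = apply_rope(q)
```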
  ---

+ # ========================================
+ # Model Configuration (31B-class model)
+ # ========================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 2560
+ NUM_HEADS = 32
+ NUM_LAYERS = 32
+ MAX_SEQ_LEN = 2048
+ # POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
+ FFN_HIDDEN_DIM = int(MODEL_DIM * 4)
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 80
+
+ ---
+
+ # ========================================
+ # Model Configuration (8B-class model)
+ # ========================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 2048
+ NUM_HEADS = 32
+ NUM_LAYERS = 24
+ MAX_SEQ_LEN = 2048
+ # POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
+ FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 64
+
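The 8/3 multiplier here is the usual trick for keeping a gated (SwiGLU) FFN at roughly the same parameter count as a plain 4× FFN, because the gated variant carries three weight matrices instead of two. A quick, illustrative check with this block's numbers (the rounding note is a common convention, not something this config states):

```python
MODEL_DIM = 2048
FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)               # 5461

plain_ffn_weights = 2 * MODEL_DIM * (4 * MODEL_DIM)   # 33_554_432
swiglu_ffn_weights = 3 * MODEL_DIM * FFN_HIDDEN_DIM   # 33_552_384 -> nearly identical
print(FFN_HIDDEN_DIM, plain_ffn_weights, swiglu_ffn_weights)
# In practice the hidden size is often rounded up to a multiple of 128 or 256.
```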
+ ---
+
+ # =====================================================================
+ # Model Configuration (33B-class model), available by request, 135 GB
+ # =====================================================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 8192
+ NUM_HEADS = 64
+ NUM_LAYERS = 32
+ MAX_SEQ_LEN = 8192
+ POS_EMB_MAX_LEN = 32768
+ FFN_HIDDEN_DIM = 4 * MODEL_DIM
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
+
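As a rough sanity check on the size labels, a back-of-the-envelope parameter estimate can be derived from these constants. The helper below is an illustrative assumption on my part: it assumes a SwiGLU FFN with three projections, untied input/output embeddings, optional GQA, and ignores norms and biases, so it only gives a ballpark figure:

```python
def approx_params(d, n_layers, n_heads, ffn_hidden, vocab, n_kv_heads=None):
    """Rough decoder-only parameter count (norms and biases ignored)."""
    n_kv_heads = n_kv_heads or n_heads
    head_dim = d // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = d * d + 2 * d * kv_dim + d * d   # Wq, Wk, Wv, Wo
    ffn = 3 * d * ffn_hidden                # SwiGLU: gate, up and down projections
    emb = 2 * vocab * d                     # input embedding + output head
    return n_layers * (attn + ffn) + emb

# 33B-class block above: MODEL_DIM=8192, 32 layers, FFN_HIDDEN_DIM=32768, no GQA.
print(f"{approx_params(8192, 32, 64, 4 * 8192, 50257) / 1e9:.1f}B")   # roughly 35B
```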
+ ---
+
+ # =======================================================================
+ # 70B-Class Model Configuration (LLaMA-70B style), available by request
+ # =======================================================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 8192 # Hidden size (d_model)
+ NUM_HEADS = 64 # Attention heads → head_dim = 128
+ NUM_KV_HEADS = 8 # GQA: 8 KV heads (like LLaMA-70B), 64 Q heads
+ NUM_LAYERS = 80 # 80 layers → ~71B params
+ MAX_SEQ_LEN = 8192 # Training context
+ POS_EMB_MAX_LEN = 32768 # Safe for long generation
+ FFN_HIDDEN_DIM = 32768 # 4 × MODEL_DIM (LLaMA-70B itself uses 28672)
+ HEAD_DIM = MODEL_DIM // NUM_HEADS
+
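NUM_KV_HEADS = 8 against NUM_HEADS = 64 means grouped-query attention: each group of 8 query heads shares one key/value head, which cuts the per-token KV cache by roughly 8×. A hypothetical sketch of the grouping (small sequence length for readability; this is not the repo's attention implementation):

```python
import torch

NUM_HEADS, NUM_KV_HEADS, HEAD_DIM = 64, 8, 128
GROUP = NUM_HEADS // NUM_KV_HEADS                 # 8 query heads per shared KV head

batch, seq = 1, 16
q = torch.randn(batch, NUM_HEADS, seq, HEAD_DIM)
k = torch.randn(batch, NUM_KV_HEADS, seq, HEAD_DIM)
v = torch.randn(batch, NUM_KV_HEADS, seq, HEAD_DIM)

# Broadcast each KV head over its query group, then run ordinary attention.
k = k.repeat_interleave(GROUP, dim=1)             # (batch, 64, seq, 128)
v = v.repeat_interleave(GROUP, dim=1)
out = torch.softmax(q @ k.transpose(-2, -1) / HEAD_DIM ** 0.5, dim=-1) @ v

# Cached K and V per token per layer: 2 * 8 * 128 = 2048 values with GQA,
# versus 2 * 64 * 128 = 16384 values with full multi-head attention.
```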
+ ---
+ #
+ # JiRack Super Brain
+ # Designed to military-grade requirements, with the goals of discovering worlds and advancing space and science
+ #
+ # =======================================================================
+ # 120B Configuration (real numbers), available by request, 135 GB, JiRack Super Brain
+ # =======================================================================
+ VOCAB_SIZE = 32000 # Modern tokenizer size (you can change it later)
+ MODEL_DIM = 12288 # d_model = 12288 → matches 120B+ scale
+ NUM_HEADS = 96 # Query heads
+ NUM_KV_HEADS = 12 # GQA: 8× groups (96 Q heads / 12 KV heads = 8)
+ NUM_LAYERS = 80 # 80 layers
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
+ FFN_HIDDEN_DIM = int(MODEL_DIM * 13 / 3) # ~4.33× expansion (DeepSeek/Qwen style) → 53248
+ MAX_SEQ_LEN = 131072 # Training on 128k context
+ POS_EMB_MAX_LEN = 262144 # Generation up to 256k+ tokens safely
+
  **Note:** The large model architectures replace specific layers:
  - `LayerNorm` → `RMSNorm`
  - `FFN` → `SwiGLU`

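For reference, a minimal PyTorch sketch of the two replacement layers is shown below; this is a generic illustration of RMSNorm and SwiGLU, not the exact modules shipped with these checkpoints:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Rescales by the root-mean-square of the features; no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """Gated FFN: silu(x @ W_gate) * (x @ W_up), projected back down with W_down."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```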
  ---

+
+
  You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.

  CMS Manhattan