kgrabko committed
Commit 74fc904 · verified · 1 Parent(s): 5677ea6

Update README.md

Files changed (1): README.md +92 -0
README.md CHANGED
@@ -67,14 +67,106 @@ Linear
  FrozenSignatureLayer
  ```
 
+ My LLMs
+
+ # ========================================
+ # Model Configuration (1B-class model)
+ # ========================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 2048
+ NUM_HEADS = 32
+ NUM_LAYERS = 16
+ MAX_SEQ_LEN = 2048
+ # POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
+ FFN_HIDDEN_DIM = int(MODEL_DIM * 4)
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 64
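The comment above (translated from Russian) says the learned positional-embedding table is gone and rotary position embeddings (RoPE) handle positions up to MAX_SEQ_LEN. As a rough illustration of what that means, here is a minimal RoPE sketch in PyTorch; the function name, shapes, and base value are illustrative assumptions, not code from these checkpoints:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (batch, seq, heads, head_dim) by position-dependent angles."""
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)  # (seq, half)
    cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# With the 1B-class values above, positions are encoded on the fly,
# so no positional-embedding table beyond MAX_SEQ_LEN is required.
q = torch.randn(1, 2048, 32, 64)   # (batch, MAX_SEQ_LEN, NUM_HEADS, HEAD_DIM)
q = apply_rope(q)
```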
  ---

+ # ========================================
+ # Model Configuration (31B-class model)
+ # ========================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 2560
+ NUM_HEADS = 32
+ NUM_LAYERS = 32
+ MAX_SEQ_LEN = 2048
+ # POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
+ FFN_HIDDEN_DIM = int(MODEL_DIM * 4)
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 80
+
+ ---
+
+ # ========================================
+ # Model Configuration (8B-class model)
+ # ========================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 2048
+ NUM_HEADS = 32
+ NUM_LAYERS = 24
+ MAX_SEQ_LEN = 2048
+ # POS_EMB_MAX_LEN is no longer used; RoPE uses MAX_SEQ_LEN
+ FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 64
+
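The 8/3 multiplier here is the usual trick for keeping a gated (SwiGLU) FFN at roughly the same parameter count as a plain 4× FFN, because the gated variant carries three weight matrices instead of two. A quick, illustrative check with this block's numbers (the rounding note is a common convention, not something this config states):

```python
MODEL_DIM = 2048
FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)               # 5461

plain_ffn_weights = 2 * MODEL_DIM * (4 * MODEL_DIM)   # 33_554_432
swiglu_ffn_weights = 3 * MODEL_DIM * FFN_HIDDEN_DIM   # 33_552_384 -> nearly identical
print(FFN_HIDDEN_DIM, plain_ffn_weights, swiglu_ffn_weights)
# In practice the hidden size is often rounded up to a multiple of 128 or 256.
```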
+ ---
+
+ # =====================================================================
+ # Model Configuration (33B-class model), available by request, 135 GB
+ # =====================================================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 8192
+ NUM_HEADS = 64
+ NUM_LAYERS = 32
+ MAX_SEQ_LEN = 8192
+ POS_EMB_MAX_LEN = 32768
+ FFN_HIDDEN_DIM = 4 * MODEL_DIM
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
+
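As a rough sanity check on the size labels, a back-of-the-envelope parameter estimate can be derived from these constants. The helper below is an illustrative assumption on my part: it assumes a SwiGLU FFN with three projections, untied input/output embeddings, optional GQA, and ignores norms and biases, so it only gives a ballpark figure:

```python
def approx_params(d, n_layers, n_heads, ffn_hidden, vocab, n_kv_heads=None):
    """Rough decoder-only parameter count (norms and biases ignored)."""
    n_kv_heads = n_kv_heads or n_heads
    head_dim = d // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = d * d + 2 * d * kv_dim + d * d   # Wq, Wk, Wv, Wo
    ffn = 3 * d * ffn_hidden                # SwiGLU: gate, up and down projections
    emb = 2 * vocab * d                     # input embedding + output head
    return n_layers * (attn + ffn) + emb

# 33B-class block above: MODEL_DIM=8192, 32 layers, FFN_HIDDEN_DIM=32768, no GQA.
print(f"{approx_params(8192, 32, 64, 4 * 8192, 50257) / 1e9:.1f}B")   # roughly 35B
```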
+ ---
+
+ # =======================================================================
+ # 70B-Class Model Configuration (LLaMA-70B style), available by request
+ # =======================================================================
+ VOCAB_SIZE = 50257
+ MODEL_DIM = 8192 # Hidden size (d_model)
+ NUM_HEADS = 64 # Attention heads → head_dim = 128
+ NUM_KV_HEADS = 8 # GQA: 8 KV heads (like LLaMA-70B), 64 Q heads
+ NUM_LAYERS = 80 # 80 layers → ~71B params
+ MAX_SEQ_LEN = 8192 # Training context
+ POS_EMB_MAX_LEN = 32768 # Safe for long generation
+ FFN_HIDDEN_DIM = 32768 # 4 × MODEL_DIM (LLaMA-70B itself uses 28672)
+ HEAD_DIM = MODEL_DIM // NUM_HEADS
+
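NUM_KV_HEADS = 8 against NUM_HEADS = 64 means grouped-query attention: each group of 8 query heads shares one key/value head, which cuts the per-token KV cache by roughly 8×. A hypothetical sketch of the grouping (small sequence length for readability; this is not the repo's attention implementation):

```python
import torch

NUM_HEADS, NUM_KV_HEADS, HEAD_DIM = 64, 8, 128
GROUP = NUM_HEADS // NUM_KV_HEADS                 # 8 query heads per shared KV head

batch, seq = 1, 16
q = torch.randn(batch, NUM_HEADS, seq, HEAD_DIM)
k = torch.randn(batch, NUM_KV_HEADS, seq, HEAD_DIM)
v = torch.randn(batch, NUM_KV_HEADS, seq, HEAD_DIM)

# Broadcast each KV head over its query group, then run ordinary attention.
k = k.repeat_interleave(GROUP, dim=1)             # (batch, 64, seq, 128)
v = v.repeat_interleave(GROUP, dim=1)
out = torch.softmax(q @ k.transpose(-2, -1) / HEAD_DIM ** 0.5, dim=-1) @ v

# Cached K and V per token per layer: 2 * 8 * 128 = 2048 values with GQA,
# versus 2 * 64 * 128 = 16384 values with full multi-head attention.
```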
+ ---
+ #
+ # JiRack Super Brain
+ # Designed to military-grade requirements, with the goals of discovering worlds and advancing space and science
+ #
+ # =======================================================================
+ # 120B Configuration (real numbers), available by request, 135 GB, JiRack Super Brain
+ # =======================================================================
+ VOCAB_SIZE = 32000 # Modern tokenizer size (you can change it later)
+ MODEL_DIM = 12288 # d_model = 12288 → matches 120B+ scale
+ NUM_HEADS = 96 # Query heads
+ NUM_KV_HEADS = 12 # GQA: 8× groups (96 Q heads / 12 KV heads = 8)
+ NUM_LAYERS = 80 # 80 layers
+ HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
+ FFN_HIDDEN_DIM = int(MODEL_DIM * 13 / 3) # ~4.33× expansion (DeepSeek/Qwen style) → 53248
+ MAX_SEQ_LEN = 131072 # Training on 128k context
+ POS_EMB_MAX_LEN = 262144 # Generation up to 256k+ tokens safely
+
  **Note:** The large model architectures replace specific layers:
  - `LayerNorm` → `RMSNorm`
  - `FFN` → `SwiGLU`

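For reference, a minimal PyTorch sketch of the two replacement layers is shown below; this is a generic illustration of RMSNorm and SwiGLU, not the exact modules shipped with these checkpoints:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Rescales by the root-mean-square of the features; no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """Gated FFN: silu(x @ W_gate) * (x @ W_up), projected back down with W_down."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```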
  ---

+
+
  You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.

  CMS Manhattan