---
license: gpl-3.0
---
JiRack_GPT3 is not an OpenAI model. It is a GPT-3-class model.
# Model Architecture Overview
## Architectures Included
I have added my untrained (weights-empty) models based on the following architectures:
- **GPT-3 Standard**
- **Llama 3**
- **Mistral**
For smaller models modeled after **GPT-2**, I use `LayerNorm` and `FFN` layers. For larger models, these are replaced with `RMSNorm` and `SwiGLU`, enabling a smoother transition to architectures with larger parameter counts (8B, 33B, 70B, and 120B).
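As a minimal sketch of the two replacement layers in PyTorch (my own illustrative code, not the repository's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalizes by the root-mean-square of the features; no mean-centering, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFeedForward(nn.Module):
    """Llama-style FFN: down( SiLU(gate(x)) * up(x) ), three bias-free linear layers."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

RMSNorm drops the mean subtraction and bias of `LayerNorm`, which makes it slightly cheaper at large hidden sizes while behaving similarly in practice.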
---
## Tokenizer Choices
- For English models: **GPT-2 Hugging Face tokenizer**
- For multilingual models: **BERT tokenizer** from the Hugging Face library
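A minimal loading sketch, assuming the Hugging Face `transformers` library and network access to the Hub. `gpt2` and `bert-base-multilingual-cased` are standard Hub identifiers; the document only says "BERT tokenizer", so the multilingual variant is my assumption:

```python
from transformers import AutoTokenizer

# English models: the GPT-2 byte-level BPE tokenizer
en_tok = AutoTokenizer.from_pretrained("gpt2")

ids = en_tok.encode("Hello world")
text = en_tok.decode(ids)  # round-trips back to "Hello world"

# Multilingual models: a BERT WordPiece tokenizer, e.g.
# ml_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
```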
---
## Training and Tuning
The **Transformer block is not frozen**, which provides greater flexibility and power when training models from scratch.
---
## Model Architecture Details
### GPT-2 Architecture (Classic, Transformer-like)
```
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
├── MultiHeadAttention
├── LayerNorm
├── LayerNorm
└── FFN
    ├── Linear
    ├── Activation: GELU
    └── Linear
LayerNorm
Linear
```
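The diagram above can be sketched as a pre-norm transformer block in PyTorch. This is an illustrative reconstruction: the pre-norm placement of the two `LayerNorm`s is my assumption, since the diagram does not show residual wiring:

```python
import torch
import torch.nn as nn

class GPT2Block(nn.Module):
    """Pre-norm GPT-2-style block: LayerNorm -> attention, LayerNorm -> GELU FFN."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim),   # expand to 4 * d_model
            nn.GELU(),
            nn.Linear(4 * dim, dim),   # project back down
        )

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        x = x + self.ffn(self.ln2(x))                      # residual around FFN
        return x
```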
---
### GPT-3 Architecture (Similar to Llama 3 & Mistral)
```
CustomEmbedding
# Positional embedding removed; RoPE is applied inside the attention layers
[TransformerBlock]
├── MultiHeadAttention
├── SwiGLUFeedForward
│   ├── Linear (Gate Layer)
│   ├── Linear (Up Layer)
│   └── Linear (Projection/Down Layer)
└── RMSNorm
RMSNorm
Linear
FrozenSignatureLayer
```
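Since the learned positional embedding is replaced by RoPE inside attention, here is a minimal sketch of applying rotary embeddings to a query or key tensor (illustrative code, not the repository's implementation):

```python
import torch

def rope_frequencies(head_dim, max_seq_len, base=10000.0):
    # One rotation frequency per pair of channels, scaled by position.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(max_seq_len).float()
    return torch.outer(t, inv_freq)  # (seq_len, head_dim // 2) angles

def apply_rope(x, freqs):
    # x: (batch, seq, heads, head_dim); rotate each channel pair by its position angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos = freqs.cos()[None, :, None, :]
    sin = freqs.sin()[None, :, None, :]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because RoPE is a pure rotation, it preserves vector norms while making attention scores depend on relative position.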
## My LLMs
```python
# ========================================================
# Model Configuration (1B-class model)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 2048
NUM_HEADS = 32
NUM_LAYERS = 16
MAX_SEQ_LEN = 2048
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # Non-standard FFN (4D) - 8192
HEAD_DIM = MODEL_DIM // NUM_HEADS    # 64
EPSILON = 1e-6
```
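As a rough sanity check that these numbers land in the 1B class, here is a hypothetical parameter-count estimator of my own (it ignores norm and bias parameters and assumes an untied output head):

```python
def approx_params(vocab_size, model_dim, num_layers, ffn_hidden_dim, ffn_matrices=2):
    """Rough transformer parameter count, ignoring norms, biases, and RoPE (which has none)."""
    embed = vocab_size * model_dim                    # token embedding table
    attn = 4 * model_dim * model_dim                  # Q, K, V and output projections
    ffn = ffn_matrices * model_dim * ffn_hidden_dim   # 2 matrices for GELU FFN, 3 for SwiGLU
    head = vocab_size * model_dim                     # untied output projection
    return embed + num_layers * (attn + ffn) + head
```

For the configuration above, `approx_params(50257, 2048, 16, 8192)` comes out around 1.0B parameters.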
---
```python
# ========================================================
# Model Configuration (31B-class model)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 8192    # Large dimension (like Llama 2 70B)
NUM_HEADS = 64
NUM_LAYERS = 32
MAX_SEQ_LEN = 8192  # Large context length
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # Custom FFN (4D) - 32768
HEAD_DIM = MODEL_DIM // NUM_HEADS    # 128
EPSILON = 1e-6
```
---
```python
# ========================================================
# Model Configuration (8B-class model)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 4096    # Increased for the 8.5B class (standard, high-efficiency)
NUM_HEADS = 32
NUM_LAYERS = 40     # Increased to 40 (same as Llama 13B)
MAX_SEQ_LEN = 2048
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)  # 10922 (Llama standard)
HEAD_DIM = MODEL_DIM // NUM_HEADS        # 128
EPSILON = 1e-6
```
---
```python
# ========================================================
# Model Configuration (10B-class model)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 4096
NUM_HEADS = 32
NUM_LAYERS = 48     # Increased depth
MAX_SEQ_LEN = 2048
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)  # 10922 (Llama standard)
HEAD_DIM = MODEL_DIM // NUM_HEADS        # 128
EPSILON = 1e-6
```
---
```python
# ========================================================
# Model Configuration (33B-class model, available by request)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 8192    # Large dimension (like Llama 2 70B)
NUM_HEADS = 64
NUM_LAYERS = 32
MAX_SEQ_LEN = 8192  # Large context length
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # Custom FFN (4D) - 32768
HEAD_DIM = MODEL_DIM // NUM_HEADS    # 128
EPSILON = 1e-6
```
---
```python
# ========================================================
# 70B-Class Model Configuration (LLaMA-70B style, available by request)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 8192    # Hidden size (d_model)
NUM_HEADS = 64      # Query heads
NUM_KV_HEADS = 8    # KV heads (GQA ratio = 8)
NUM_LAYERS = 80
MAX_SEQ_LEN = 8192  # Max context (RoPE)
# LLaMA 2 70B uses an FFN hidden dim of 28672 (3.5 * MODEL_DIM) with SwiGLU
FFN_HIDDEN_DIM = 28672
HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
EPSILON = 1e-6
```
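With `NUM_KV_HEADS = 8` and `NUM_HEADS = 64`, each key/value head is shared by 8 query heads. A common way to implement grouped-query attention (a sketch of my own, not the repository's code) is to repeat the KV heads to match the query heads before the attention matmul:

```python
import torch

def repeat_kv(x, n_rep):
    """Expand KV heads for GQA. x: (batch, seq, kv_heads, head_dim) -> (batch, seq, kv_heads * n_rep, head_dim)."""
    b, s, h, d = x.shape
    # Insert a repeat axis, broadcast it, then fold it into the head axis.
    return x[:, :, :, None, :].expand(b, s, h, n_rep, d).reshape(b, s, h * n_rep, d)
```

GQA keeps the KV cache 8x smaller than full multi-head attention at this configuration, which is the main reason large models like LLaMA 70B use it.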
---
```python
# ========================================================
# 140B Configuration (real numbers, available by request): JiRack Super Brain
# Designed to a military-grade specification, with the goals of discovering
# worlds and learning about space and science.
# ========================================================
VOCAB_SIZE = 32000
MODEL_DIM = 12288    # d_model
NUM_HEADS = 96       # Query heads
NUM_KV_HEADS = 12    # GQA: 8x groups (96 query heads / 12 KV heads)
NUM_LAYERS = 80
HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
FFN_HIDDEN_DIM = 53248             # ~4.33 * MODEL_DIM
MAX_SEQ_LEN = 131072               # Max context
EPSILON = 1e-6
```
**TorchScript note:** You can use TorchScript (JIT) export for AI classification tasks. Do not use JIT for chatbot tasks; for GPT (chatbot) models, save and load a plain PyTorch `state_dict` instead.
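A minimal sketch of the state-dict workflow recommended above (the tiny model and the filename are placeholders, not the actual JiRack model):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a JiRack GPT model.
model = nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 100))

# Save only the weights (no TorchScript/JIT compilation involved).
torch.save(model.state_dict(), "jirack_gpt.pt")

# Load: rebuild the architecture in code, then restore the weights into it.
restored = nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 100))
restored.load_state_dict(torch.load("jirack_gpt.pt"))
```

Saving the `state_dict` keeps the checkpoint decoupled from the Python class definition, so the architecture code can evolve as long as the parameter names and shapes still match.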
**Note:** The large model architectures replace specific layers:
- `LayerNorm` → `RMSNorm`
- `FFN` → `SwiGLU`
---
### JiRack RAG System
- A microservice architecture with an API Gateway and Service Discovery
- Built on Spring Boot with a Google embeddings model; includes a chatbot and JiRack model deployment via a Docker script
- Video: https://www.youtube.com/watch?v=vHClQu76kMc
- RAG System source: https://bitbucket.org/cmsmanhattan/rag/src/main/
---
# Install the tokenizer before running
---
```bash
mkdir -p tokenizer
wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json
wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json
wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt
wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json
```
You are welcome to ask us to design a corporate model of 33B, 70B, or more parameters.

CMS Manhattan

Copyright © 2002–2026