|
|
--- |
|
|
license: gpl-3.0 |
|
|
--- |
|
|
|
|
|
JiRack_GPT3 is not an OpenAI model. It is a GPT-3-class model.
|
|
|
|
|
# Model Architecture Overview |
|
|
|
|
|
## Architectures Included |
|
|
|
|
|
I have added my untrained (empty-weight) models based on the following architectures:
|
|
|
|
|
- **GPT-3 Standard** |
|
|
- **Llama 3** |
|
|
- **Mistral** |
|
|
|
|
|
For smaller models modeled after **GPT-2**, I utilize `LayerNorm` and `FFN` layers. For larger models, these layers are replaced with `RMSNorm` and `SwiGLU`, enabling a smoother transition to architectures with larger parameter sizes (8B, 33B, 70B, and 120B). |
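The swap can be sketched numerically. Below is a minimal pure-Python illustration of the three pieces, assuming scalar gain/bias parameters and the usual definitions; it is not the actual model code:

```python
import math

EPS = 1e-6

def layer_norm(x, gamma=1.0, beta=0.0, eps=EPS):
    # LayerNorm: center by the mean, scale by the standard deviation
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

def rms_norm(x, gamma=1.0, eps=EPS):
    # RMSNorm: no centering, normalize by the root-mean-square only
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [gamma * v / rms for v in x]

def swiglu(gate, up):
    # SwiGLU: SiLU(gate) * up, elementwise; gate and up come from two Linear layers
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]

x = [1.0, 2.0, 3.0, 4.0]
print(layer_norm(x))
print(rms_norm(x))
```

Dropping the mean subtraction is what makes RMSNorm slightly cheaper per token, which matters at the larger parameter sizes listed above.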
|
|
|
|
|
--- |
|
|
|
|
|
## Tokenizer Choices |
|
|
|
|
|
- For English models: **GPT-2 Hugging Face tokenizer** |
|
|
- For multilingual models: **BERT tokenizer** from the Hugging Face library |
|
|
|
|
|
--- |
|
|
|
|
|
## Training and Tuning |
|
|
|
|
|
The **Transformer block is not frozen**, providing greater flexibility and power when training models from scratch.
|
|
|
|
|
--- |
|
|
|
|
|
## Model Architecture Details |
|
|
|
|
|
### GPT-2 Architecture (Classic, Transformer-like) |
|
|
|
|
|
```
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
├── MultiHeadAttention
├── LayerNorm
├── LayerNorm
└── FFN
    ├── Linear
    ├── Activation: GELU
    └── Linear
LayerNorm
Linear
```
|
|
|
|
|
--- |
|
|
|
|
|
### GPT-3 Architecture (Similar to Llama 3 & Mistral) |
|
|
|
|
|
```
CustomEmbedding
# Positional Embedding removed; RoPE is applied inside Attention
[TransformerBlock]
├── MultiHeadAttention
├── SwiGLUFeedForward
│   ├── Linear (Gate Layer)
│   ├── Linear (Up Layer)
│   └── Linear (Projection/Down Layer)
└── RMSNorm
RMSNorm
Linear
FrozenSignatureLayer
```
|
|
|
|
|
## My LLMs
|
|
|
|
|
# ======================================================== |
|
|
# Model Configuration (1B-class model) |
|
|
# ======================================================== |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 2048 |
|
|
- NUM_HEADS = 32 |
|
|
- NUM_LAYERS = 16 |
|
|
- MAX_SEQ_LEN = 2048 |
|
|
- # RoPE
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Non-standard FFN (4D) |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS #64 |
|
|
- EPSILON = 1e-6 |
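As a sanity check on the class label, a rough parameter count can be derived from the config. This is a back-of-the-envelope sketch (token embedding plus attention and FFN weight matrices only; norms, biases, and any untied output head are ignored), not the actual model code:

```python
def approx_params(vocab_size, model_dim, num_layers, ffn_hidden, swiglu=False):
    """Rough decoder-only parameter count; RoPE itself adds no parameters."""
    embed = vocab_size * model_dim                         # token embeddings (output head assumed tied)
    attn = 4 * model_dim * model_dim                       # Q, K, V, O projections per layer
    ffn = (3 if swiglu else 2) * model_dim * ffn_hidden    # SwiGLU adds a third (gate) matrix
    return embed + num_layers * (attn + ffn)

# 1B-class config from above, assuming a standard two-matrix FFN
print(approx_params(50257, 2048, 16, 2048 * 4))  # roughly 0.9e9 parameters
```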
|
|
--- |
|
|
|
|
|
# ============================================ |
|
|
# Model Configuration (31B-class model) |
|
|
# ============================================ |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 8192 # Large dimension (like Llama 2 70B) |
|
|
- NUM_HEADS = 64 |
|
|
- NUM_LAYERS = 32 |
|
|
- MAX_SEQ_LEN = 8192 # Large context length |
|
|
- # RoPE |
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Custom FFN (4D) - 32768 |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- EPSILON = 1e-6 |
|
|
|
|
|
--- |
|
|
|
|
|
# ============================================= |
|
|
# Model Configuration (8B-class model) |
|
|
# ============================================= |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 4096 # Increased for 8.5B-class (Standard, High-Efficiency) |
|
|
- NUM_HEADS = 32 |
|
|
- NUM_LAYERS = 40 # Increased to 40 (same as Llama 13B) |
|
|
- MAX_SEQ_LEN = 2048 |
|
|
- # RoPE |
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3) # 10922 (Llama standard) |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- EPSILON = 1e-6 |
|
|
|
|
|
--- |
|
|
|
|
|
# ============================================== |
|
|
# Model Configuration (10B-class model) |
|
|
# ================================================= |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 4096 |
|
|
- NUM_HEADS = 32 |
|
|
- NUM_LAYERS = 48 # Increased depth |
|
|
- MAX_SEQ_LEN = 2048 |
|
|
- # RoPE
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3) # 10922 (Llama standard)
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- EPSILON = 1e-6 |
|
|
|
|
|
--- |
|
|
|
|
|
# ===================================================================================== |
|
|
# Model Configuration (33B-class model), available by request
|
|
# =========================================================================== |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 8192 # Large dimension (like Llama 2 70B) |
|
|
- NUM_HEADS = 64 |
|
|
- NUM_LAYERS = 32 |
|
|
- MAX_SEQ_LEN = 8192 # Large context length |
|
|
- # RoPE |
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Custom FFN (4D) - 32768 |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- EPSILON = 1e-6 |
|
|
|
|
|
--- |
|
|
|
|
|
# ==================================================================================== |
|
|
# 70B-Class Model Configuration (LLaMA-70B style), available by request
|
|
# ==================================================================================== |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 8192 # Hidden size (d_model) |
|
|
- NUM_HEADS = 64 # Q Heads |
|
|
- NUM_KV_HEADS = 8 # KV Heads (GQA ratio = 8) |
|
|
- NUM_LAYERS = 80 # 80 layers |
|
|
- MAX_SEQ_LEN = 8192 # Max context (RoPE) |
|
|
- # LLaMA-2-70B FFN hidden dim: start from 4 * MODEL_DIM = 32768, take 2/3 of it (21845),
- # scale by the ffn_dim_multiplier 1.3 (28398), then round up to a multiple of 4096 -> 28672
|
|
- FFN_HIDDEN_DIM = 28672 |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS |
|
|
- EPSILON = 1e-6 |
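The GQA ratio above means each KV head is shared by a fixed group of query heads. A minimal sketch of that head-to-group mapping (illustrative only, not the model code):

```python
NUM_HEADS = 64       # query heads
NUM_KV_HEADS = 8     # key/value heads
GROUP_SIZE = NUM_HEADS // NUM_KV_HEADS  # 8 query heads share each KV head

# Query head q attends using the keys/values of KV head q // GROUP_SIZE
kv_for_q = [q // GROUP_SIZE for q in range(NUM_HEADS)]
print(kv_for_q[0], kv_for_q[7], kv_for_q[8], kv_for_q[63])  # 0 0 1 7
```

Compared with full multi-head attention (64 KV heads), this shrinks the K/V projection weights and the KV cache by a factor of 8.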
|
|
|
|
|
--- |
|
|
#
# JiRack Super Brain
# Designed for military applications and for goals of discovering worlds, exploring space, and advancing science
#
|
|
# ==================================================================================== |
|
|
# 140B Configuration (real numbers), available by request: JiRack Super Brain
|
|
# ==================================================================================== |
|
|
- VOCAB_SIZE = 32000 |
|
|
- MODEL_DIM = 12288 # d_model |
|
|
- NUM_HEADS = 96 # Query heads |
|
|
- NUM_KV_HEADS = 12 # GQA: 8× grouping (96 query heads / 12 KV heads)
|
|
- NUM_LAYERS = 80 |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- FFN_HIDDEN_DIM = int(4 * MODEL_DIM * 1.3) # 63897
|
|
- MAX_SEQ_LEN = 131072 # Max context |
|
|
- EPSILON = 1e-6 |
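One reason the GQA setting matters at a 131072-token context is KV-cache memory. A rough fp16 estimate (a sketch under the config above, not a measured number):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # One K and one V tensor per layer; fp16 (2 bytes per value) by default
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# 140B config above: 80 layers, 12 KV heads, head_dim 128, full 131072-token context
print(kv_cache_bytes(80, 12, 128, 131072) / 1e9)  # ~64.4 GB per sequence
```

With 96 KV heads (no GQA) the same cache would be 8× larger, which is why the 8× grouping is used.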
|
|
|
|
|
|
|
|
- About TorchScript: you can use a TorchScript (JIT) export for AI classification tasks.
- Do not use JIT for chatbot tasks. Use a plain PyTorch `state_dict` for GPT (chatbot) models.
|
|
|
|
|
|
|
|
**Note:** The large model architectures replace specific layers: |
|
|
- `LayerNorm` β `RMSNorm` |
|
|
- `FFN` β `SwiGLU` |
|
|
|
|
|
--- |
|
|
### JiRack RAG System |
|
|
- A microservice architecture with an API Gateway and Service Discovery
- Built on the Spring Boot framework with a Google embeddings model; includes a chatbot and JiRack model deployment with a Docker script
- Video: https://www.youtube.com/watch?v=vHClQu76kMc
- RAG System: https://bitbucket.org/cmsmanhattan/rag/src/main/
|
|
|
|
|
--- |
|
|
|
|
|
# Install the tokenizer before running
|
|
--- |
|
|
- `mkdir -p tokenizer`
- `wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json`
- `wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json`
- `wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt`
- `wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json`
|
|
|
|
|
|
|
|
You are welcome to request a custom corporate model with 33B, 70B, or more parameters.
|
|
|
|
|
CMS Manhattan |
|
|
Copyright © 2002–2026