JiRack_GPT3 is not an OpenAI model. It is a GPT-3-class model.
Model Architecture Overview
Architectures Included
I have added my empty (untrained) models based on the following architectures:
- GPT-3 Standard
- Llama 3
- Mistral
Smaller models, modeled after GPT-2, use LayerNorm and a standard FFN. In larger models these layers are replaced with RMSNorm and SwiGLU, enabling a smoother transition to architectures with larger parameter counts (8B, 33B, 70B, and 120B).
Tokenizer Choices
- For English models: GPT-2 Hugging Face tokenizer
- For multilingual models: BERT tokenizer from the Hugging Face library
Training and Tuning
The Transformer block is not frozen, which gives greater flexibility and power when training and tuning models from scratch.
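A minimal PyTorch sketch of what "not frozen" means in practice (the module here is a stand-in, not the project's actual block): every parameter of an unfrozen block keeps `requires_grad=True`, whereas a frozen layer is excluded from gradient updates.

```python
import torch.nn as nn

# Stand-in for a custom Transformer block; all parameters are trainable by default.
block = nn.TransformerEncoderLayer(d_model=64, nhead=4)
trainable = [p for p in block.parameters() if p.requires_grad]
print(len(trainable) == len(list(block.parameters())))  # True

# By contrast, freezing a layer looks like this:
frozen = nn.Linear(64, 64)
for p in frozen.parameters():
    p.requires_grad = False
```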
Model Architecture Details
GPT-2 Architecture (Classic, Transformer-like)
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
├── MultiHeadAttention
├── LayerNorm
├── LayerNorm
└── FFN
    ├── Linear
    ├── Activation: GELU
    └── Linear
LayerNorm
Linear
GPT-3 Architecture (Similar to Llama 3 & Mistral)
CustomEmbedding
# Positional Embedding removed, RoPE integrated in Attention
[TransformerBlock]
├── MultiHeadAttention
├── SwiGLUFeedForward
│   ├── Linear (Gate Layer)
│   ├── Linear (Up Layer)
│   └── Linear (Projection/Down Layer)
└── RMSNorm
RMSNorm
Linear
FrozenSignatureLayer
My LLMs
========================================================
Model Configuration (1B-class model)
========================================================
- VOCAB_SIZE = 50257
- MODEL_DIM = 2048
- NUM_HEADS = 32
- NUM_LAYERS = 16
- MAX_SEQ_LEN = 2048
- RoPE
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Non-standard FFN (4D)
- HEAD_DIM = MODEL_DIM // NUM_HEADS #64
- EPSILON = 1e-6
============================================
Model Configuration (31B-class model)
============================================
- VOCAB_SIZE = 50257
- MODEL_DIM = 8192 # Large dimension (like Llama 2 70B)
- NUM_HEADS = 64
- NUM_LAYERS = 32
- MAX_SEQ_LEN = 8192 # Large context length
- RoPE
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Custom FFN (4D) - 32768
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
- EPSILON = 1e-6
=============================================
Model Configuration (8B-class model)
=============================================
- VOCAB_SIZE = 50257
- MODEL_DIM = 4096 # Increased for 8.5B-class (Standard, High-Efficiency)
- NUM_HEADS = 32
- NUM_LAYERS = 40 # Increased to 40 (same as Llama 13B)
- MAX_SEQ_LEN = 2048
- RoPE
- FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3) # 10922 (Llama standard)
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
- EPSILON = 1e-6
=================================================
Model Configuration (10B-class model)
=================================================
- VOCAB_SIZE = 50257
- MODEL_DIM = 4096
- NUM_HEADS = 32
- NUM_LAYERS = 48 # Increased depth
- MAX_SEQ_LEN = 2048
- RoPE
- FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3) #10922 (Llama standard)
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
- EPSILON = 1e-6
===========================================================================
Model Configuration (33B-class model, available on request)
===========================================================================
- VOCAB_SIZE = 50257
- MODEL_DIM = 8192 # Large dimension (like Llama 2 70B)
- NUM_HEADS = 64
- NUM_LAYERS = 32
- MAX_SEQ_LEN = 8192 # Large context length
- RoPE
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Custom FFN (4D) - 32768
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
- EPSILON = 1e-6
====================================================================================
70B-Class Model Configuration (LLaMA-70B style, available on request)
====================================================================================
- VOCAB_SIZE = 50257
- MODEL_DIM = 8192 # Hidden size (d_model)
- NUM_HEADS = 64 # Q Heads
- NUM_KV_HEADS = 8 # KV Heads (GQA ratio = 8)
- NUM_LAYERS = 80 # 80 layers
- MAX_SEQ_LEN = 8192 # Max context (RoPE)
FFN hidden dim follows LLaMA-2 70B: 28672 (approximately 4 * D * 2/3 * 1.3, rounded up to a multiple of 4096)
Using the standard LLaMA-70B value for accuracy
- FFN_HIDDEN_DIM = 28672
- HEAD_DIM = MODEL_DIM // NUM_HEADS
- EPSILON = 1e-6
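The grouped-query attention (GQA) setting above shares each KV head across a group of query heads, which shrinks the KV cache 8x versus full multi-head attention. A back-of-the-envelope check, assuming fp16 (2 bytes per value) and one full-length sequence:

```python
NUM_HEADS, NUM_KV_HEADS = 64, 8
NUM_LAYERS, HEAD_DIM, MAX_SEQ_LEN = 80, 128, 8192

group_size = NUM_HEADS // NUM_KV_HEADS  # query heads sharing one KV head
print(group_size)  # 8

# KV cache per sequence: 2 tensors (K and V) * layers * kv_heads * head_dim
# * seq_len * 2 bytes (fp16)
kv_cache_bytes = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * MAX_SEQ_LEN * 2
print(f"{kv_cache_bytes / 2**30:.1f} GiB")  # 2.5 GiB (8x larger with full MHA)
```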
JiRack Super Brain
It was designed for military applications and for exploration goals: discovering worlds and learning about space and science.
====================================================================================
140B Configuration (JiRack Super Brain, real numbers, available on request)
====================================================================================
VOCAB_SIZE = 32000
MODEL_DIM = 12288 # d_model
NUM_HEADS = 96 # Query heads
NUM_KV_HEADS = 12 # GQA: 8× groups
NUM_LAYERS = 80
HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
FFN_HIDDEN_DIM = 53248 # = MODEL_DIM * 13 // 3; note that int(4 * MODEL_DIM * 1.3) would give 63897
MAX_SEQ_LEN = 131072 # Max context
EPSILON = 1e-6
A note on TorchScript: you can use TorchScript (JIT) export for AI classification tasks.
Do not use JIT for chatbot tasks; for GPT (chatbot) models, save and load the plain PyTorch state dict instead.
Note: The large model architectures replace specific layers:
- LayerNorm → RMSNorm
- FFN → SwiGLU
JiRack RAG System
- It is a microservice architecture with an API Gateway and Service Discovery
- Built on the Spring Boot framework with a Google embedding model; includes a chatbot, JiRack model deployment, and a Docker script
- video https://www.youtube.com/watch?v=vHClQu76kMc
- RAG System https://bitbucket.org/cmsmanhattan/rag/src/main/
Install the tokenizer before running:
- mkdir -p tokenizer
- wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json
- wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json
- wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt
- wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json
You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.
CMS Manhattan
Copyright © 2002–2026