---
license: gpl-3.0
---
JiRack_GPT3 is not an OpenAI model; it is a GPT-3-class model.
# Model Architecture Overview
## Architectures Included
I have added my empty models based on the following architectures:
- **GPT-3 Standard**
- **Llama 3**
- **Mistral**
For smaller models modeled after **GPT-2**, I use `LayerNorm` and `FFN` layers. For larger models, these layers are replaced with `RMSNorm` and `SwiGLU`, enabling a smoother transition to architectures with larger parameter counts (8B, 33B, 70B, and 120B).
---
## Tokenizer Choices
- For English models: **GPT-2 Hugging Face tokenizer**
- For multilingual models: **BERT tokenizer** from the Hugging Face library
---
## Training and Tuning
The **Transformer blocks are not frozen**, providing greater flexibility and power when training models from scratch.
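As a minimal PyTorch sketch of this setup: keep every transformer-block parameter trainable while freezing only the signature layer. The name-matching rule (`"signature"` in the parameter name) is a hypothetical convention, not the actual JiRack module naming.

```python
import torch.nn as nn

def set_trainable(model: nn.Module) -> None:
    """Keep transformer blocks trainable; freeze only the signature layer.

    Assumes the frozen layer's module name contains 'signature'
    (hypothetical naming; adjust to the real model definition).
    """
    for name, param in model.named_parameters():
        param.requires_grad = "signature" not in name.lower()
```

Applied to a model, this leaves all block weights receiving gradients while the signature layer stays fixed.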
---
## Model Architecture Details
### GPT-2 Architecture (Classic, Transformer-like)
```
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
β”œβ”€β”€ MultiHeadAttention
β”œβ”€β”€ LayerNorm
β”œβ”€β”€ LayerNorm
└── FFN
    β”œβ”€β”€ Linear
    β”œβ”€β”€ Activation: GELU
    └── Linear
LayerNorm
Linear
```
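The GPT-2-style block above can be sketched in PyTorch as follows. This is a generic pre-LayerNorm illustration, not the exact JiRack implementation; the `4 * dim` FFN expansion matches the configs below.

```python
import torch
import torch.nn as nn

class GPT2Block(nn.Module):
    """GPT-2-style transformer block: MHA + LayerNorm + GELU feed-forward."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim),   # expand to FFN_HIDDEN_DIM
            nn.GELU(),                 # activation
            nn.Linear(4 * dim, dim),   # project back to MODEL_DIM
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual 1
        x = x + self.ffn(self.ln2(x))                      # residual 2
        return x
```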
---
### GPT-3 Architecture (Similar to Llama 3 & Mistral)
```
CustomEmbedding
# Positional embedding removed; RoPE is integrated in Attention
[TransformerBlock]
β”œβ”€β”€ MultiHeadAttention
β”œβ”€β”€ SwiGLUFeedForward
β”‚   β”œβ”€β”€ Linear (Gate Layer)
β”‚   β”œβ”€β”€ Linear (Up Layer)
β”‚   └── Linear (Projection/Down Layer)
└── RMSNorm
RMSNorm
Linear
FrozenSignatureLayer
```
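The two layer swaps named in this architecture can be sketched as follows. These are standard textbook forms of RMSNorm and SwiGLU, offered as an illustration rather than the exact JiRack modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization (replaces LayerNorm in large models)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal RMS of the last dimension
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLUFeedForward(nn.Module):
    """Gate/Up/Down projections with SiLU gating (replaces the GELU FFN)."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)  # Gate Layer
        self.up = nn.Linear(dim, hidden, bias=False)    # Up Layer
        self.down = nn.Linear(hidden, dim, bias=False)  # Projection/Down Layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```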
# My LLMs
```
# ========================================================
# Model Configuration (1B-class model)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 2048
NUM_HEADS = 32
NUM_LAYERS = 16
MAX_SEQ_LEN = 2048
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # non-standard FFN (4 * D) -> 8192
HEAD_DIM = MODEL_DIM // NUM_HEADS    # 64
EPSILON = 1e-6
```
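As a quick sanity check on the size classes below, a rough parameter estimate can be computed from each config. The formula assumes an untied embedding/LM head and ignores biases and norm weights, so it is only approximate.

```python
def approx_params(vocab: int, dim: int, layers: int, ffn_hidden: int,
                  gated: bool = False) -> int:
    """Rough parameter count: embeddings + per-layer attention and FFN
    weights + an untied LM head. Ignores biases and norm weights."""
    embed = vocab * dim
    attn = 4 * dim * dim                          # Q, K, V, O projections
    ffn = (3 if gated else 2) * dim * ffn_hidden  # SwiGLU has 3 matrices
    return embed + layers * (attn + ffn) + embed  # + untied LM head

# 1B-class config above:
print(approx_params(50257, 2048, 16, 8192))  # -> 1011159040 (about 1.01B)
```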
---
```
# ========================================================
# Model Configuration (31B-class model)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 8192       # large dimension (like Llama 2 70B)
NUM_HEADS = 64
NUM_LAYERS = 32
MAX_SEQ_LEN = 8192     # large context length
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # custom FFN (4 * D) -> 32768
HEAD_DIM = MODEL_DIM // NUM_HEADS    # 128
EPSILON = 1e-6
```
---
```
# ========================================================
# Model Configuration (8B-class model)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 4096       # increased for 8.5B-class (standard, high-efficiency)
NUM_HEADS = 32
NUM_LAYERS = 40        # increased to 40 (same as Llama 13B)
MAX_SEQ_LEN = 2048
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)  # 10922 (Llama standard)
HEAD_DIM = MODEL_DIM // NUM_HEADS        # 128
EPSILON = 1e-6
```
---
```
# ========================================================
# Model Configuration (10B-class model)
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 4096
NUM_HEADS = 32
NUM_LAYERS = 48        # increased depth
MAX_SEQ_LEN = 2048
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)  # 10922 (Llama standard)
HEAD_DIM = MODEL_DIM // NUM_HEADS        # 128
EPSILON = 1e-6
```
---
```
# ========================================================
# Model Configuration (33B-class model), available by request
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 8192       # large dimension (like Llama 2 70B)
NUM_HEADS = 64
NUM_LAYERS = 32
MAX_SEQ_LEN = 8192     # large context length
# RoPE positional encoding
FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # custom FFN (4 * D) -> 32768
HEAD_DIM = MODEL_DIM // NUM_HEADS    # 128
EPSILON = 1e-6
```
---
```
# ========================================================
# 70B-Class Model Configuration (LLaMA-70B style), available by request
# ========================================================
VOCAB_SIZE = 50257
MODEL_DIM = 8192       # hidden size (d_model)
NUM_HEADS = 64         # query heads
NUM_KV_HEADS = 8       # KV heads (GQA ratio = 8)
NUM_LAYERS = 80
MAX_SEQ_LEN = 8192     # max context (RoPE)
# Standard LLaMA-70B SwiGLU hidden dim: 28672 (3.5 * MODEL_DIM)
FFN_HIDDEN_DIM = 28672
HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
EPSILON = 1e-6
```
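The grouped-query attention (GQA) ratio in this config means each KV head is shared by 64 // 8 = 8 query heads. A common way to implement the sharing is to expand the KV tensors before the attention product; this is a generic sketch, not the JiRack attention code.

```python
import torch

def gqa_expand(kv: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Expand grouped KV heads to match the query-head count.

    kv: (batch, num_kv_heads, seq, head_dim). Each KV head is duplicated
    num_heads // num_kv_heads times so shapes line up with the Q heads.
    """
    batch, num_kv, seq, head_dim = kv.shape
    group = num_heads // num_kv  # queries per KV head (8 in the 70B config)
    return kv.repeat_interleave(group, dim=1)  # (batch, num_heads, seq, head_dim)
```

The memory saving comes from storing and caching only `NUM_KV_HEADS` K/V projections; the expansion is a cheap view-like copy at attention time.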
---
**JiRack Super Brain** was designed for defense applications and for the goals of discovering worlds and learning about space and science.
```
# ========================================================
# 140B Configuration (real numbers), available by request: JiRack Super Brain
# ========================================================
VOCAB_SIZE = 32000
MODEL_DIM = 12288      # d_model
NUM_HEADS = 96         # query heads
NUM_KV_HEADS = 12      # GQA: 96 // 12 = 8 query heads per KV head
NUM_LAYERS = 80
HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
FFN_HIDDEN_DIM = 53248             # about 4.33 * MODEL_DIM
MAX_SEQ_LEN = 131072   # max context
EPSILON = 1e-6
```
**About TorchScript:** you can use a TorchScript (JIT) export for AI classification tasks. Do not use JIT for chatbot tasks; load a plain PyTorch state dict for GPT (chatbot) models.
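The two export paths can be sketched as follows, using a stand-in `nn.Linear` in place of a real model:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for a real model
tmp = tempfile.mkdtemp()

# Classification task: a TorchScript (JIT) export works well.
scripted = torch.jit.script(model)
scripted.save(os.path.join(tmp, "classifier.pt"))

# Chatbot (GPT) task: skip JIT; save and reload a plain state dict instead.
torch.save(model.state_dict(), os.path.join(tmp, "gpt_weights.pt"))
fresh = nn.Linear(8, 2)
fresh.load_state_dict(torch.load(os.path.join(tmp, "gpt_weights.pt")))
```

A state dict keeps the Python model class in charge of generation logic (KV caching, sampling loops), which is why it is the better fit for chatbot deployment.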
**Note:** The large model architectures replace specific layers:
- `LayerNorm` β†’ `RMSNorm`
- `FFN` β†’ `SwiGLU`
---
### JiRack RAG System
- A microservice architecture with an API Gateway and Service Discovery.
- Built on the Spring Boot framework with a Google embeddings model; includes a chatbot and JiRack model deployment via a Docker script.
- Video: https://www.youtube.com/watch?v=vHClQu76kMc
- RAG System: https://bitbucket.org/cmsmanhattan/rag/src/main/
---
# Install the tokenizer before running
---
```
mkdir -p tokenizer
wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json
wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json
wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt
wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json
```
You are welcome to request a custom corporate model design with 33B, 70B, or more parameters.
CMS Manhattan
Copyright Β© 2002–2026