---
license: gpl-3.0
---

JiRack_GPT3 is not an OpenAI model. It is a GPT-3-class model.

# Model Architecture Overview

## Architectures Included

I have added my empty models based on the following architectures:

- **GPT-3 Standard**
- **Llama 3**
- **Mistral**

For smaller models modeled after **GPT-2**, I use `LayerNorm` and `FFN` layers. For larger models, these layers are replaced with `RMSNorm` and `SwiGLU`, enabling a smoother transition to architectures with larger parameter sizes (8B, 33B, 70B, and 120B).

---

## Tokenizer Choices

- For English models: the **GPT-2 Hugging Face tokenizer**
- For multilingual models: the **BERT tokenizer** from the Hugging Face library

---

## Training and Tuning

The **Transformer block is not frozen**, providing greater flexibility and power when tuning models from scratch.

---

## Model Architecture Details

### GPT-2 Architecture (Classic, Transformer-like)

```
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
 ├── MultiHeadAttention
 ├── LayerNorm
 ├── LayerNorm
 └── FFN
      ├── Linear
      ├── Activation: GELU
      └── Linear
LayerNorm
Linear
```

---

### GPT-3 Architecture (Similar to Llama 3 & Mistral)

```
CustomEmbedding        # Positional Embedding removed; RoPE integrated in Attention
[TransformerBlock]
 ├── MultiHeadAttention
 ├── SwiGLUFeedForward
 │    ├── Linear (Gate Layer)
 │    ├── Linear (Up Layer)
 │    └── Linear (Projection/Down Layer)
 └── RMSNorm
RMSNorm
Linear
FrozenSignatureLayer
```

---

## My LLMs

### Model Configuration (1B-class model)

- VOCAB_SIZE = 50257
- MODEL_DIM = 2048
- NUM_HEADS = 32
- NUM_LAYERS = 16
- MAX_SEQ_LEN = 2048  # RoPE
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # Non-standard FFN (4D)
- HEAD_DIM = MODEL_DIM // NUM_HEADS  # 64
- EPSILON = 1e-6

---

### Model Configuration (31B-class model)

- VOCAB_SIZE = 50257
- MODEL_DIM = 8192  # Large dimension (like Llama 2 70B)
- NUM_HEADS = 64
- NUM_LAYERS = 32
- MAX_SEQ_LEN = 8192  # Large context length (RoPE)
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # Custom FFN (4D): 32768
- HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
- EPSILON = 1e-6

---

### Model Configuration (8B-class model)

- VOCAB_SIZE = 50257
- MODEL_DIM = 4096  # Increased for the 8.5B class (standard, high-efficiency)
- NUM_HEADS = 32
- NUM_LAYERS = 40  # Increased to 40 (same as Llama 13B)
- MAX_SEQ_LEN = 2048  # RoPE
- FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)  # 10922 (Llama standard)
- HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
- EPSILON = 1e-6

---

### Model Configuration (10B-class model)

- VOCAB_SIZE = 50257
- MODEL_DIM = 4096
- NUM_HEADS = 32
- NUM_LAYERS = 48  # Increased depth
- MAX_SEQ_LEN = 2048  # RoPE
- FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3)  # 10922 (Llama standard)
- HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
- EPSILON = 1e-6

---

### Model Configuration (33B-class model, available by request)

- VOCAB_SIZE = 50257
- MODEL_DIM = 8192  # Large dimension (like Llama 2 70B)
- NUM_HEADS = 64
- NUM_LAYERS = 32
- MAX_SEQ_LEN = 8192  # Large context length (RoPE)
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4)  # Custom FFN (4D): 32768
- HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
- EPSILON = 1e-6

---

### 70B-Class Model Configuration (LLaMA-70B style, available by request)

- VOCAB_SIZE = 50257
- MODEL_DIM = 8192  # Hidden size (d_model)
- NUM_HEADS = 64  # Q heads
- NUM_KV_HEADS = 8  # KV heads (GQA ratio = 8)
- NUM_LAYERS = 80  # 80 layers
- MAX_SEQ_LEN = 8192  # Max context (RoPE)
- FFN_HIDDEN_DIM = 28672  # Standard LLaMA-70B FFN hidden dim (≈ 8/3 × MODEL_DIM × 1.3, rounded up to a multiple of 4096)
- HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
- EPSILON = 1e-6

---

## JiRack Super Brain

It was designed for military use, with the goals of discovering worlds and learning about space and science.

### 140B Configuration (real numbers, available by request): JiRack Super Brain

- VOCAB_SIZE = 32000
- MODEL_DIM = 12288  # d_model
- NUM_HEADS = 96  # Query heads
- NUM_KV_HEADS = 12  # GQA: groups of 8
- NUM_LAYERS = 80
- HEAD_DIM = MODEL_DIM // NUM_HEADS  # 128
- FFN_HIDDEN_DIM = int(4 * MODEL_DIM * 1.3)  # 63897
- MAX_SEQ_LEN = 131072  # Max context
- EPSILON = 1e-6

---

## PyTorch Deployment Notes

- About TorchScript: you can use a scripted PyTorch model for AI classification tasks.
- Do not use JIT for chatbot tasks.
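As a minimal sketch of this workflow, the model below saves and reloads plain weights with `state_dict`, with no `torch.jit` involved. The `TinyGPT` class and file name are illustrative placeholders, not actual JiRack classes:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a GPT-style model (NOT the actual JiRack class).
class TinyGPT(nn.Module):
    def __init__(self, vocab_size=50257, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        return self.lm_head(self.embed(ids))

model = TinyGPT()

# For chatbot / GPT tasks: persist plain weights, no torch.jit involved.
torch.save(model.state_dict(), "tiny_gpt.pt")

# Reload by constructing the module fresh and loading the state dict.
restored = TinyGPT()
restored.load_state_dict(torch.load("tiny_gpt.pt"))
restored.eval()
```

For a pure classification service, the same module could instead be exported once with `torch.jit.script(model)` and served without the Python class definition.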
Use just the plain PyTorch state dict for GPT (chatbot) tasks.

**Note:** The large model architectures replace specific layers:

- `LayerNorm` → `RMSNorm`
- `FFN` → `SwiGLU`

---

### JiRack RAG System

- A microservice architecture with an API Gateway and Service Discovery
- Built on the Spring Boot framework with a Google embeddings model for the JiRack RAG System with a chatbot; JiRack model deployment uses a Docker script
- Video: https://www.youtube.com/watch?v=vHClQu76kMc
- RAG System: https://bitbucket.org/cmsmanhattan/rag/src/main/

---

# Install the tokenizer before running

- `mkdir -p tokenizer`
- `wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json`
- `wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json`
- `wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt`
- `wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json`

You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.

CMS Manhattan Copyright © 2002–2026
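As a rough sanity check on the parameter classes quoted in the configurations above, the dense parameter count implied by a configuration can be estimated from its constants alone. This is a back-of-envelope sketch under assumptions of my own (untied input/output embeddings, SwiGLU FFN with three matrices, no biases), not the exact layer-by-layer count of the JiRack models:

```python
def estimate_params(vocab, dim, layers, ffn_hidden, heads, kv_heads=None):
    """Back-of-envelope dense-transformer parameter estimate."""
    head_dim = dim // heads
    kv_dim = (kv_heads or heads) * head_dim           # GQA shrinks K/V projections
    attn = dim * dim + 2 * dim * kv_dim + dim * dim   # Q, K, V, O matrices
    ffn = 3 * dim * ffn_hidden                        # SwiGLU: gate, up, down
    block = attn + ffn + 2 * dim                      # plus two RMSNorm weight vectors
    embeddings = 2 * vocab * dim                      # input embedding + LM head
    return embeddings + layers * block

# 1B-class config from above
print(estimate_params(50257, 2048, 16, int(2048 * 4), heads=32))  # 1279660032 ≈ 1.28B
```

Plugging in the 70B-class constants (MODEL_DIM = 8192, 80 layers, FFN_HIDDEN_DIM = 28672, 64 Q heads, 8 KV heads) lands near 69B, consistent with the advertised class.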