|
|
--- |
|
|
license: gpl-3.0 |
|
|
--- |
|
|
|
|
|
JiRack_GPT3 is not an OpenAI model. It is a GPT-3-class model.
|
|
|
|
|
# Model Architecture Overview |
|
|
|
|
|
## Architectures Included |
|
|
|
|
|
I have added my untrained (empty-weight) models based on the following architectures:
|
|
|
|
|
- **GPT-3 Standard** |
|
|
- **Llama 3** |
|
|
- **Mistral** |
|
|
|
|
|
For smaller models modeled after **GPT-2**, I utilize `LayerNorm` and `FFN` layers. For larger models, these layers are replaced with `RMSNorm` and `SwiGLU`, enabling a smoother transition to architectures with larger parameter sizes (8B, 33B, 70B, and 120B). |
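The swap can be sketched numerically. Below is a minimal pure-Python illustration of the three pieces, assuming scalar gain/bias parameters and the usual definitions; it is not the actual model code:

```python
import math

EPS = 1e-6

def layer_norm(x, gamma=1.0, beta=0.0, eps=EPS):
    # LayerNorm: center by the mean, scale by the standard deviation
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

def rms_norm(x, gamma=1.0, eps=EPS):
    # RMSNorm: no centering, normalize by the root-mean-square only
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [gamma * v / rms for v in x]

def swiglu(gate, up):
    # SwiGLU: SiLU(gate) * up, elementwise; gate and up come from two Linear layers
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]

x = [1.0, 2.0, 3.0, 4.0]
print(layer_norm(x))
print(rms_norm(x))
```

Dropping the mean subtraction is what makes RMSNorm slightly cheaper per token, which matters at the larger parameter sizes listed above.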
|
|
|
|
|
--- |
|
|
|
|
|
## Tokenizer Choices |
|
|
|
|
|
- For English models: **GPT-2 Hugging Face tokenizer** |
|
|
- For multilingual models: **BERT tokenizer** from the Hugging Face library |
|
|
|
|
|
--- |
|
|
|
|
|
## Training and Tuning |
|
|
|
|
|
The **Transformer block is not frozen**, providing greater flexibility and power when training models from scratch.
|
|
|
|
|
--- |
|
|
|
|
|
## Model Architecture Details |
|
|
|
|
|
### GPT-2 Architecture (Classic, Transformer-like) |
|
|
|
|
|
```
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
├── MultiHeadAttention
├── LayerNorm
├── LayerNorm
└── FFN
    ├── Linear
    ├── Activation: GELU
    └── Linear
LayerNorm
Linear
```
|
|
|
|
|
--- |
|
|
|
|
|
### GPT-3 Architecture (Similar to Llama 3 & Mistral) |
|
|
|
|
|
```
CustomEmbedding
# Positional Embedding removed; RoPE is applied inside Attention
[TransformerBlock]
├── MultiHeadAttention
├── SwiGLUFeedForward
│   ├── Linear (Gate Layer)
│   ├── Linear (Up Layer)
│   └── Linear (Projection/Down Layer)
└── RMSNorm
RMSNorm
Linear
FrozenSignatureLayer
```
|
|
|
|
|
## My LLMs
|
|
|
|
|
# ======================================================== |
|
|
# Model Configuration (1B-class model) |
|
|
# ======================================================== |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 2048 |
|
|
- NUM_HEADS = 32 |
|
|
- NUM_LAYERS = 16 |
|
|
- MAX_SEQ_LEN = 2048 |
|
|
- # RoPE
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Non-standard FFN (4D) |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS #64 |
|
|
- EPSILON = 1e-6 |
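As a sanity check on the class label, a rough parameter count can be derived from the config. This is a back-of-the-envelope sketch (token embedding plus attention and FFN weight matrices only; norms, biases, and any untied output head are ignored), not the actual model code:

```python
def approx_params(vocab_size, model_dim, num_layers, ffn_hidden, swiglu=False):
    """Rough decoder-only parameter count; RoPE itself adds no parameters."""
    embed = vocab_size * model_dim                         # token embeddings (output head assumed tied)
    attn = 4 * model_dim * model_dim                       # Q, K, V, O projections per layer
    ffn = (3 if swiglu else 2) * model_dim * ffn_hidden    # SwiGLU adds a third (gate) matrix
    return embed + num_layers * (attn + ffn)

# 1B-class config from above, assuming a standard two-matrix FFN
print(approx_params(50257, 2048, 16, 2048 * 4))  # roughly 0.9e9 parameters
```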
|
|
--- |
|
|
|
|
|
# ============================================ |
|
|
# Model Configuration (31B-class model) |
|
|
# ============================================ |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 8192 # Large dimension (like Llama 2 70B) |
|
|
- NUM_HEADS = 64 |
|
|
- NUM_LAYERS = 32 |
|
|
- MAX_SEQ_LEN = 8192 # Large context length |
|
|
- # RoPE |
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Custom FFN (4D) - 32768 |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- EPSILON = 1e-6 |
|
|
|
|
|
--- |
|
|
|
|
|
# ============================================= |
|
|
# Model Configuration (8B-class model) |
|
|
# ============================================= |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 4096 # Increased for 8.5B-class (Standard, High-Efficiency) |
|
|
- NUM_HEADS = 32 |
|
|
- NUM_LAYERS = 40 # Increased to 40 (same as Llama 13B) |
|
|
- MAX_SEQ_LEN = 2048 |
|
|
- # RoPE |
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3) # 10922 (Llama standard) |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- EPSILON = 1e-6 |
|
|
|
|
|
--- |
|
|
|
|
|
# ============================================== |
|
|
# Model Configuration (10B-class model) |
|
|
# ================================================= |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 4096 |
|
|
- NUM_HEADS = 32 |
|
|
- NUM_LAYERS = 48 # Increased depth |
|
|
- MAX_SEQ_LEN = 2048 |
|
|
- # RoPE
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3) # 10922 (Llama standard)
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- EPSILON = 1e-6 |
|
|
|
|
|
--- |
|
|
|
|
|
# ===================================================================================== |
|
|
# Model Configuration (33B-class model), available by request
|
|
# =========================================================================== |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 8192 # Large dimension (like Llama 2 70B) |
|
|
- NUM_HEADS = 64 |
|
|
- NUM_LAYERS = 32 |
|
|
- MAX_SEQ_LEN = 8192 # Large context length |
|
|
- # RoPE |
|
|
- FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Custom FFN (4D) - 32768 |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- EPSILON = 1e-6 |
|
|
|
|
|
--- |
|
|
|
|
|
# ==================================================================================== |
|
|
# 70B-Class Model Configuration (LLaMA-70B style), available by request
|
|
# ==================================================================================== |
|
|
- VOCAB_SIZE = 50257 |
|
|
- MODEL_DIM = 8192 # Hidden size (d_model) |
|
|
- NUM_HEADS = 64 # Q Heads |
|
|
- NUM_KV_HEADS = 8 # KV Heads (GQA ratio = 8) |
|
|
- NUM_LAYERS = 80 # 80 layers |
|
|
- MAX_SEQ_LEN = 8192 # Max context (RoPE) |
|
|
- # LLaMA-2-70B FFN hidden dim: start from 4 * MODEL_DIM = 32768, take 2/3 of it (21845),
- # scale by the ffn_dim_multiplier 1.3 (28398), then round up to a multiple of 4096 -> 28672
|
|
- FFN_HIDDEN_DIM = 28672 |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS |
|
|
- EPSILON = 1e-6 |
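The GQA ratio above means each KV head is shared by a fixed group of query heads. A minimal sketch of that head-to-group mapping (illustrative only, not the model code):

```python
NUM_HEADS = 64       # query heads
NUM_KV_HEADS = 8     # key/value heads
GROUP_SIZE = NUM_HEADS // NUM_KV_HEADS  # 8 query heads share each KV head

# Query head q attends using the keys/values of KV head q // GROUP_SIZE
kv_for_q = [q // GROUP_SIZE for q in range(NUM_HEADS)]
print(kv_for_q[0], kv_for_q[7], kv_for_q[8], kv_for_q[63])  # 0 0 1 7
```

Compared with full multi-head attention (64 KV heads), this shrinks the K/V projection weights and the KV cache by a factor of 8.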
|
|
|
|
|
--- |
|
|
#
# JiRack Super Brain
# Designed for military applications and for goals of discovering worlds, exploring space, and advancing science
#
|
|
# ==================================================================================== |
|
|
# 140B Configuration (real numbers), available by request: JiRack Super Brain
|
|
# ==================================================================================== |
|
|
- VOCAB_SIZE = 32000 |
|
|
- MODEL_DIM = 12288 # d_model |
|
|
- NUM_HEADS = 96 # Query heads |
|
|
- NUM_KV_HEADS = 12 # GQA: 8× grouping (96 query heads / 12 KV heads)
|
|
- NUM_LAYERS = 80 |
|
|
- HEAD_DIM = MODEL_DIM // NUM_HEADS # 128 |
|
|
- FFN_HIDDEN_DIM = int(4 * MODEL_DIM * 1.3) # 63897
|
|
- MAX_SEQ_LEN = 131072 # Max context |
|
|
- EPSILON = 1e-6 |
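One reason the GQA setting matters at a 131072-token context is KV-cache memory. A rough fp16 estimate (a sketch under the config above, not a measured number):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # One K and one V tensor per layer; fp16 (2 bytes per value) by default
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# 140B config above: 80 layers, 12 KV heads, head_dim 128, full 131072-token context
print(kv_cache_bytes(80, 12, 128, 131072) / 1e9)  # ~64.4 GB per sequence
```

With 96 KV heads (no GQA) the same cache would be 8× larger, which is why the 8× grouping is used.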
|
|
|
|
|
|
|
|
- About TorchScript: you can use a TorchScript (JIT) export for AI classification tasks.
- Do not use JIT for chatbot tasks. Use a plain PyTorch `state_dict` for GPT (chatbot) models.
|
|
|
|
|
|
|
|
**Note:** The large model architectures replace specific layers: |
|
|
- `LayerNorm` β `RMSNorm` |
|
|
- `FFN` β `SwiGLU` |
|
|
|
|
|
--- |
|
|
### JiRack RAG System |
|
|
- A microservice architecture with an API Gateway and Service Discovery
- Built on the Spring Boot framework with a Google embeddings model; includes a chatbot and JiRack model deployment with a Docker script
- Video: https://www.youtube.com/watch?v=vHClQu76kMc
- RAG System: https://bitbucket.org/cmsmanhattan/rag/src/main/
|
|
|
|
|
--- |
|
|
|
|
|
# Install the tokenizer before running
|
|
--- |
|
|
- `mkdir -p tokenizer`
- `wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json`
- `wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json`
- `wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt`
- `wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json`
|
|
|
|
|
|
|
|
You are welcome to request a custom corporate model with 33B, 70B, or more parameters.
|
|
|
|
|
CMS Manhattan |
|
|
Copyright © 2002–2026