JiRack_GPT3 is not an OpenAI model. It is a GPT-3-class model.

Model Architecture Overview

Architectures Included

I have added my untrained ("empty") models based on the following architectures:

  • GPT-3 Standard
  • Llama 3
  • Mistral

For smaller models modeled after GPT-2, I use LayerNorm and FFN layers. For larger models, these layers are replaced with RMSNorm and SwiGLU, enabling a smoother transition to architectures with larger parameter counts (8B, 33B, 70B, and 120B).
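The LayerNorm → RMSNorm swap can be sketched in PyTorch (a minimal sketch; the class name RMSNorm matches the architecture listings below, but this exact implementation is an assumption):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm: normalize by root-mean-square only (no mean-centering, no bias).
    Cheaper than LayerNorm and standard in Llama/Mistral-style models."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each vector so its root-mean-square is ~1, then apply a learned gain
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```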


Tokenizer Choices

  • For English models: GPT-2 Hugging Face tokenizer
  • For multilingual models: BERT tokenizer from the Hugging Face library
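Loading the two tokenizers with the Hugging Face transformers library might look like this (a sketch; the checkpoint names "gpt2" and "bert-base-multilingual-cased" are my assumptions about which variants are meant):

```python
from transformers import AutoTokenizer

# English models: GPT-2 byte-level BPE tokenizer (assumed checkpoint: "gpt2")
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")

# Multilingual models: BERT WordPiece tokenizer
# (assumed checkpoint: "bert-base-multilingual-cased")
bert_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

ids = gpt2_tok.encode("Hello world")
print(gpt2_tok.decode(ids))
```

Note that the GPT-2 tokenizer's vocabulary size (50257) matches the VOCAB_SIZE used in the configurations below.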

Training and Tuning

The Transformer block is not frozen, which gives greater flexibility and power when training models from scratch.


Model Architecture Details

GPT-2 Architecture (Classic, Transformer-like)

CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
    ├── MultiHeadAttention
    ├── LayerNorm
    ├── LayerNorm
    ├── FFN
          ├── Linear
          ├── Activation: GELU
          └── Linear
LayerNorm
Linear
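The block above can be sketched as a pre-norm GPT-2-style layer (a minimal sketch under assumed dimensions; CustomEmbedding and FrozenSignatureLayer are custom to this repo and omitted here):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """GPT-2-style pre-norm block: LayerNorm -> MultiHeadAttention,
    LayerNorm -> FFN (Linear -> GELU -> Linear), with residual connections."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim),  # expand to 4x, the classic GPT FFN ratio
            nn.GELU(),
            nn.Linear(4 * dim, dim),  # project back down
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffn(self.ln2(x))
```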

GPT-3 Architecture (Similar to Llama 3 & Mistral)

CustomEmbedding
# Positional Embedding removed, RoPE integrated in Attention
[TransformerBlock]
    ├── MultiHeadAttention
    ├── SwiGLUFeedForward
          ├── Linear (Gate Layer)
          ├── Linear (Up Layer)
          └── Linear (Projection/Down Layer)
    └── RMSNorm
RMSNorm
Linear
FrozenSignatureLayer
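The comment "Positional Embedding removed, RoPE integrated in Attention" refers to rotary position embeddings applied to queries and keys inside the attention layer. A minimal sketch (pairing channels as two halves is one common convention and an assumption here):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding: rotate channel pairs of x (..., seq, head_dim)
    by position-dependent angles, so relative position enters the dot product."""
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    half = head_dim // 2
    # One frequency per channel pair, geometrically spaced
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation of each (x1, x2) pair
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because it is a pure rotation, the norm of each vector is preserved, and position 0 is left unchanged.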

My LLMs

========================================================

Model Configuration (1B-class model)

========================================================

  • VOCAB_SIZE = 50257
  • MODEL_DIM = 2048
  • NUM_HEADS = 32
  • NUM_LAYERS = 16
  • MAX_SEQ_LEN = 2048
  • RoPE
  • FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # 4 × d_model FFN = 8192
  • HEAD_DIM = MODEL_DIM // NUM_HEADS #64
  • EPSILON = 1e-6
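A quick sanity check that this configuration lands in the 1B class (a rough estimate counting embedding, attention, and FFN weights only; the untied output head is an assumption):

```python
def gpt_param_estimate(vocab_size: int, dim: int, num_layers: int,
                       ffn_hidden: int, tied_head: bool = False) -> int:
    """Rough parameter count: token embedding + per-layer attention/FFN + LM head.
    Norm parameters and biases are negligible and ignored."""
    embed = vocab_size * dim
    attn_per_layer = 4 * dim * dim          # Q, K, V, and output projections
    ffn_per_layer = 2 * dim * ffn_hidden    # up + down projections
    head = 0 if tied_head else vocab_size * dim
    return embed + num_layers * (attn_per_layer + ffn_per_layer) + head

# 1B-class config: VOCAB_SIZE=50257, MODEL_DIM=2048, NUM_LAYERS=16, FFN=4*2048
n = gpt_param_estimate(50257, 2048, 16, 2048 * 4)
print(f"{n / 1e9:.2f}B")  # prints 1.01B
```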

============================================

Model Configuration (31B-class model)

============================================

  • VOCAB_SIZE = 50257
  • MODEL_DIM = 8192 # Large dimension (like Llama 2 70B)
  • NUM_HEADS = 64
  • NUM_LAYERS = 32
  • MAX_SEQ_LEN = 8192 # Large context length
  • RoPE

  • FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Custom FFN (4D) - 32768
  • HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
  • EPSILON = 1e-6

=============================================

Model Configuration (8B-class model)

=============================================

  • VOCAB_SIZE = 50257
  • MODEL_DIM = 4096 # Increased for 8.5B-class (Standard, High-Efficiency)
  • NUM_HEADS = 32
  • NUM_LAYERS = 40 # Increased to 40 (same as Llama 13B)
  • MAX_SEQ_LEN = 2048
  • RoPE

  • FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3) # 10922 (Llama standard)
  • HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
  • EPSILON = 1e-6

==============================================

Model Configuration (10B-class model)

=================================================

  • VOCAB_SIZE = 50257
  • MODEL_DIM = 4096
  • NUM_HEADS = 32
  • NUM_LAYERS = 48 # Increased depth
  • MAX_SEQ_LEN = 2048
  • RoPE
  • FFN_HIDDEN_DIM = int(MODEL_DIM * 8 / 3) #10922 (Llama standard)
  • HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
  • EPSILON = 1e-6

=====================================================================================

Model Configuration (33B-class model) that is available by request

===========================================================================

  • VOCAB_SIZE = 50257
  • MODEL_DIM = 8192 # Large dimension (like Llama 2 70B)
  • NUM_HEADS = 64
  • NUM_LAYERS = 32
  • MAX_SEQ_LEN = 8192 # Large context length
  • RoPE

  • FFN_HIDDEN_DIM = int(MODEL_DIM * 4) # Custom FFN (4D) - 32768
  • HEAD_DIM = MODEL_DIM // NUM_HEADS # 128
  • EPSILON = 1e-6

====================================================================================

70B-Class Model Configuration (LLaMA-70B style) that is available by request

====================================================================================

  • VOCAB_SIZE = 50257
  • MODEL_DIM = 8192 # Hidden size (d_model)
  • NUM_HEADS = 64 # Q Heads
  • NUM_KV_HEADS = 8 # KV Heads (GQA ratio = 8)
  • NUM_LAYERS = 80 # 80 layers
  • MAX_SEQ_LEN = 8192 # Max context (RoPE)
  • FFN_HIDDEN_DIM = 28672 # Standard LLaMA-2-70B FFN size

  • Derivation (LLaMA recipe): 4 * 8192 = 32768 → int(2/3 * 32768) = 21845 → int(1.3 * 21845) = 28398 → rounded up to a multiple of 4096 → 28672
  • HEAD_DIM = MODEL_DIM // NUM_HEADS
  • EPSILON = 1e-6

JiRack Super Brain

It was designed with military-grade goals in mind: discovering worlds, exploring space, and advancing science.

====================================================================================

140B Configuration (real numbers), available by request: JiRack Super Brain

====================================================================================

  • VOCAB_SIZE = 32000

  • MODEL_DIM = 12288 # d_model

  • NUM_HEADS = 96 # Query heads

  • NUM_KV_HEADS = 12 # GQA: 96 / 12 = 8 query heads per KV head

  • NUM_LAYERS = 80

  • HEAD_DIM = MODEL_DIM // NUM_HEADS # 128

  • FFN_HIDDEN_DIM = int(4 * MODEL_DIM * 1.3) # 63897

  • MAX_SEQ_LEN = 131072 # Max context

  • EPSILON = 1e-6
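Grouped-query attention with these numbers means each KV head serves NUM_HEADS / NUM_KV_HEADS = 8 query heads. A minimal sketch of the KV-head expansion:

```python
import torch

NUM_HEADS, NUM_KV_HEADS, HEAD_DIM = 96, 12, 128
group_size = NUM_HEADS // NUM_KV_HEADS  # 8 query heads per KV head

# K/V are projected with only NUM_KV_HEADS heads, then each KV head is
# repeated so that every query head has a matching key tensor
batch, seq = 1, 16
k = torch.randn(batch, NUM_KV_HEADS, seq, HEAD_DIM)
k_for_queries = k.repeat_interleave(group_size, dim=1)
print(k_for_queries.shape)  # torch.Size([1, 96, 16, 128])
```

This shrinks the KV cache by 8x versus full multi-head attention while keeping 96 query heads.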

  • About TorchScript: you can use a TorchScript-exported model for AI classification tasks.

  • Do not use JIT (TorchScript) for chatbot tasks. Load a plain PyTorch state dict for GPT (chatbot) models instead.
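The state-dict recommendation above can be sketched like this (nn.Linear stands in for a real GPT model):

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for a real chatbot/GPT model

# Chatbot (GPT) checkpoints: save and load a plain state dict, no torch.jit
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

restored = nn.Linear(8, 2)  # must rebuild the same architecture first
restored.load_state_dict(torch.load(path))

# Classification-style deployment is where a TorchScript export fits
scripted = torch.jit.script(model)
```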

Note: The large model architectures replace specific layers:

  • LayerNorm → RMSNorm
  • FFN → SwiGLU
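The FFN → SwiGLU replacement, matching the Gate/Up/Down layout in the GPT-3 architecture listing above (a minimal sketch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """SwiGLU FFN: down( silu(gate(x)) * up(x) ), bias-free as in Llama-style models."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)  # Gate layer
        self.up = nn.Linear(dim, hidden_dim, bias=False)    # Up layer
        self.down = nn.Linear(hidden_dim, dim, bias=False)  # Projection/down layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

The gating (elementwise product of two projections) is why the LLaMA sizing recipe above scales the hidden dimension by 2/3: three matrices replace the classic FFN's two at roughly equal parameter cost.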

JiRack RAG System


Install the tokenizer before running.


You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.

CMS Manhattan
Copyright Β© 2002–2026
