JiRack Dense: Ultra-Scale Transformer Architecture (140B - 405B+)

JiRack GPT 5 class

Author: Konstantin Vladimirovich Grabko
Organization: CMS Manhattan
Status: Patent Pending / Proprietary Technology
Version: 1.2 (Dense High-Precision Edition)


πŸš€ Overview

JiRack Dense is a high-performance transformer architecture designed to bridge the gap between 100B and 500B+ parameter models. Unlike traditional architectures that suffer from memory bottlenecks, JiRack utilizes proprietary SWA Fusion and BRE Routing to maximize throughput on AMD ROCm (Instinct MI300/400) and NVIDIA Hopper/Blackwell hardware.

This repository contains the core logic for the Dense (Non-Ternary) versions of JiRack, optimized for high-fidelity reasoning and stable training on massive datasets like The Pile.


πŸ›  Key Innovations

1. SwiGLU-Attention (SWA) Fusion

A unified compute kernel that merges the Feed-Forward Network (FFN) and Multi-Head Attention (MHA) passes.

  • Impact: 30% reduction in VRAM I/O and faster training steps.
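To make the idea concrete, here is a plain PyTorch sketch of the computation that an SWA-fused pass would cover: a transformer block whose Multi-Head Attention and SwiGLU FFN sub-layers run back to back. The proprietary fused kernel itself is not reproduced; the class name, hidden sizes, and head count below are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUAttentionBlock(nn.Module):
    """Reference (unfused) version of the MHA + SwiGLU computation.

    A fused kernel would execute these two sub-layers in one pass to cut
    intermediate VRAM traffic; this sketch only shows the math being fused.
    """
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 1408):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        # SwiGLU FFN: gate and up projections, then a down projection
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                      # attention residual
        h = self.norm2(x)
        # SwiGLU: silu(gate(h)) * up(h), projected back down
        x = x + self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))
        return x

block = SwiGLUAttentionBlock()
out = block(torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```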

2. Buffered Routing Embedding (BRE)

A predictive HBM management system that pre-fetches embedding weights into high-speed buffers.

  • Impact: Eliminates GPU "starvation" during inference and supports context windows of up to 128k tokens.
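The buffering idea can be sketched as follows: keep the full embedding table in slow memory and a small buffer of pre-fetched rows in fast memory. The actual BRE prediction and HBM management logic is proprietary and not shown; the class name, buffer size, and naive insertion-order eviction policy here are stand-in assumptions.

```python
import torch

class BufferedEmbedding:
    """Illustrative buffered embedding lookup (not the BRE implementation).

    `table` stands in for the full embedding weight in slow memory;
    `buffer` holds a bounded set of pre-fetched rows in fast memory.
    """
    def __init__(self, vocab_size: int, d_model: int, buffer_rows: int = 1024):
        self.table = torch.randn(vocab_size, d_model)  # stand-in weight
        self.buffer_rows = buffer_rows
        self.buffer = {}  # token id -> cached row

    def prefetch(self, predicted_ids):
        """Pull predicted next-token rows into the fast buffer ahead of time."""
        for tid in predicted_ids:
            if tid not in self.buffer:
                if len(self.buffer) >= self.buffer_rows:
                    self.buffer.pop(next(iter(self.buffer)))  # evict oldest
                self.buffer[tid] = self.table[tid].clone()

    def lookup(self, ids):
        rows = []
        for tid in ids:
            row = self.buffer.get(tid)
            if row is None:          # buffer miss: fall back to slow memory
                row = self.table[tid]
            rows.append(row)
        return torch.stack(rows)

emb = BufferedEmbedding(vocab_size=50_000, d_model=64)
emb.prefetch([1, 2, 3])          # warm the buffer before decoding
vecs = emb.lookup([1, 2, 3, 4])  # id 4 misses and hits the slow path
print(vecs.shape)  # torch.Size([4, 64])
```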

3. Frontier Scaling

Optimized configurations for extreme scales:

  • 140B: The efficiency leader for enterprise clusters.
  • 236B: Balanced frontier performance.
  • 405B+: SOTA-level reasoning capabilities.

πŸ“‚ Repository Structure

  • /models - Architecture definitions (JiRackPyTorch_GPT5_class_Xb.py).
  • /docs - Patent claims, technical specifications, and BRE algorithms.
  • load_small_pile_GPT5_1b.py - Standard training script with DeepSpeed support.
  • LICENSE - Commercial proprietary license terms.

βš–οΈ Licensing & Legal

PROPRIETARY TECHNOLOGY - PATENT PENDING

This software is licensed under the CMS Manhattan JiRack V.1.2 License.

  • Commercial Use: Requires a royalty-bearing agreement (5% Net Revenue).
  • Restrictions: No reverse engineering of SWA kernels; no knowledge distillation into non-JiRack models.
  • Attribution: Commercial products must state: "Powered by CMS Manhattan JiRack Technology by Konstantin Vladimirovich Grabko."

Refer to license_dense.md for the full legal text.


πŸ“ˆ Performance Targets

| Model Scale | Target Hardware | Precision | Optimized Interconnect |
|-------------|-----------------|-----------|------------------------|
| 140B        | 8x A100/H100    | BF16      | NVLink / Infinity Fabric |
| 236B        | 16x H100        | BF16      | 800 Gbps InfiniBand    |
| 405B+       | 32x H100/H200   | BF16      | Ultra-Ethernet / RoCE v2 |
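As a rough sanity check on the hardware sizing above, a back-of-envelope calculation of BF16 weight memory per GPU (weights only, evenly sharded; optimizer state, activations, and KV cache are excluded, so real deployments need headroom beyond these figures):

```python
# BF16 stores each parameter in 2 bytes.
BYTES_PER_PARAM_BF16 = 2

# (parameter count, GPU count) taken from the performance table.
configs = {
    "140B": (140e9, 8),
    "236B": (236e9, 16),
    "405B+": (405e9, 32),
}

for name, (params, gpus) in configs.items():
    per_gpu_gb = params * BYTES_PER_PARAM_BF16 / gpus / 1e9
    print(f"{name}: ~{per_gpu_gb:.0f} GB of BF16 weights per GPU")
# 140B  -> ~35 GB per GPU across 8 GPUs
# 236B  -> ~30 GB per GPU across 16 GPUs
# 405B+ -> ~25 GB per GPU across 32 GPUs
```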

πŸ“§ Contact

For licensing inquiries, enterprise deployment, or technical partnership:
Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Phone: +1 (516) 777-0945
