JiRack Dense: Ultra-Scale Transformer Architecture (140B - 405B+)

JiRack GPT 5 class

Author: Konstantin Vladimirovich Grabko
Organization: CMS Manhattan
Status: Patent Pending / Proprietary Technology
Version: 1.2 (Dense High-Precision Edition)


πŸš€ Overview

JiRack Dense is a high-performance transformer architecture designed to bridge the gap between 100B and 500B+ parameter models. Unlike traditional architectures that suffer from memory bottlenecks, JiRack utilizes proprietary SWA Fusion and BRE Routing to maximize throughput on AMD ROCm (Instinct MI300/400) and NVIDIA Hopper/Blackwell hardware.

This repository contains the core logic for the Dense (Non-Ternary) versions of JiRack, optimized for high-fidelity reasoning and stable training on massive datasets like The Pile.


πŸ›  Key Innovations

1. SwiGLU-Attention (SWA) Fusion

A unified compute kernel that merges the Feed-Forward Network (FFN) and Multi-Head Attention (MHA) passes.

  • Impact: 30% reduction in VRAM I/O and faster training steps.
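To make the idea concrete, here is a plain PyTorch sketch of the computation that an SWA-fused pass would cover: a transformer block whose Multi-Head Attention and SwiGLU FFN sub-layers run back to back. The proprietary fused kernel itself is not reproduced; the class name, hidden sizes, and head count below are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUAttentionBlock(nn.Module):
    """Reference (unfused) version of the MHA + SwiGLU computation.

    A fused kernel would execute these two sub-layers in one pass to cut
    intermediate VRAM traffic; this sketch only shows the math being fused.
    """
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 1408):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        # SwiGLU FFN: gate and up projections, then a down projection
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                      # attention residual
        h = self.norm2(x)
        # SwiGLU: silu(gate(h)) * up(h), projected back down
        x = x + self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))
        return x

block = SwiGLUAttentionBlock()
out = block(torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```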

2. Buffered Routing Embedding (BRE)

A predictive HBM management system that pre-fetches embedding weights into high-speed buffers.

  • Impact: Eliminates GPU "starvation" during inference and supports context windows of up to 128k tokens.
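The buffering idea can be sketched as follows: keep the full embedding table in slow memory and a small buffer of pre-fetched rows in fast memory. The actual BRE prediction and HBM management logic is proprietary and not shown; the class name, buffer size, and naive insertion-order eviction policy here are stand-in assumptions.

```python
import torch

class BufferedEmbedding:
    """Illustrative buffered embedding lookup (not the BRE implementation).

    `table` stands in for the full embedding weight in slow memory;
    `buffer` holds a bounded set of pre-fetched rows in fast memory.
    """
    def __init__(self, vocab_size: int, d_model: int, buffer_rows: int = 1024):
        self.table = torch.randn(vocab_size, d_model)  # stand-in weight
        self.buffer_rows = buffer_rows
        self.buffer = {}  # token id -> cached row

    def prefetch(self, predicted_ids):
        """Pull predicted next-token rows into the fast buffer ahead of time."""
        for tid in predicted_ids:
            if tid not in self.buffer:
                if len(self.buffer) >= self.buffer_rows:
                    self.buffer.pop(next(iter(self.buffer)))  # evict oldest
                self.buffer[tid] = self.table[tid].clone()

    def lookup(self, ids):
        rows = []
        for tid in ids:
            row = self.buffer.get(tid)
            if row is None:          # buffer miss: fall back to slow memory
                row = self.table[tid]
            rows.append(row)
        return torch.stack(rows)

emb = BufferedEmbedding(vocab_size=50_000, d_model=64)
emb.prefetch([1, 2, 3])          # warm the buffer before decoding
vecs = emb.lookup([1, 2, 3, 4])  # id 4 misses and hits the slow path
print(vecs.shape)  # torch.Size([4, 64])
```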

3. Frontier Scaling

Optimized configurations for extreme scales:

  • 140B: The efficiency leader for enterprise clusters.
  • 236B: Balanced frontier performance.
  • 405B+: SOTA-level reasoning capabilities.

πŸ“‚ Repository Structure

  • /models - Architecture definitions (JiRackPyTorch_GPT5_class_Xb.py).
  • /docs - Patent claims, technical specifications, and BRE algorithms.
  • load_small_pile_GPT5_1b.py - Standard training script with DeepSpeed support.
  • LICENSE - Commercial proprietary license terms.

βš–οΈ Licensing & Legal

PROPRIETARY TECHNOLOGY - PATENT PENDING

This software is licensed under the CMS Manhattan JiRack V.1.2 License.

  • Commercial Use: Requires a royalty-bearing agreement (5% Net Revenue).
  • Restrictions: No reverse engineering of SWA kernels; no knowledge distillation into non-JiRack models.
  • Attribution: Commercial products must state: "Powered by CMS Manhattan JiRack Technology by Konstantin Vladimirovich Grabko."

Refer to license_dense.md for the full legal text.


πŸ“ˆ Performance Targets

| Model Scale | Target Hardware | Precision | Optimized Interconnect |
|-------------|-----------------|-----------|------------------------|
| 140B        | 8x A100/H100    | BF16      | NVLink / Infinity Fabric |
| 236B        | 16x H100        | BF16      | 800 Gbps InfiniBand    |
| 405B+       | 32x H100/H200   | BF16      | Ultra-Ethernet / RoCE v2 |
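As a rough sanity check on the hardware sizing above, a back-of-envelope calculation of BF16 weight memory per GPU (weights only, evenly sharded; optimizer state, activations, and KV cache are excluded, so real deployments need headroom beyond these figures):

```python
# BF16 stores each parameter in 2 bytes.
BYTES_PER_PARAM_BF16 = 2

# (parameter count, GPU count) taken from the performance table.
configs = {
    "140B": (140e9, 8),
    "236B": (236e9, 16),
    "405B+": (405e9, 32),
}

for name, (params, gpus) in configs.items():
    per_gpu_gb = params * BYTES_PER_PARAM_BF16 / gpus / 1e9
    print(f"{name}: ~{per_gpu_gb:.0f} GB of BF16 weights per GPU")
# 140B  -> ~35 GB per GPU across 8 GPUs
# 236B  -> ~30 GB per GPU across 16 GPUs
# 405B+ -> ~25 GB per GPU across 32 GPUs
```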

πŸ“§ Contact

For licensing inquiries, enterprise deployment, or technical partnership:
Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Phone: +1 (516) 777-0945
