Commit ef962ee (verified) · Parent: 2b35bcf · kgrabko committed

Update claims_dense.md

Files changed (1): claims_dense.md (+45 −45)
# Patent Claims: JiRack Dense Transformer Architecture (V.1.2, 7B Edition)

**Inventor:** Konstantin Vladimirovich Grabko
**Entity:** CMS Manhattan
**Priority Date:** December 22, 2025
**Technology Class:** Large Language Model (LLM) Optimization / Neural Compute Kernels

## I. PRIMARY ARCHITECTURAL CLAIMS

### Claim 1: SwiGLU-Attention (SWA) Fused Computational Block
A method for executing a transformer layer in a deep neural network, characterized by the parallelization and unified dispatch of a Multi-Head Attention (MHA) mechanism and a Gated Linear Unit (SwiGLU) Feed-Forward Network (FFN), comprising:

- An integrated input bus that loads hidden states into local GPU shared memory (SRAM) for simultaneous projection by both the attention and FFN sub-layers.
- A fused residual accumulation path where $x_{out} = x + \text{Attn}(x) + \text{FFN}(x)$ is computed within a single kernel boundary to minimize Global Memory (VRAM) round-trips.
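The fused residual path can be sketched in plain NumPy (single head, no masking, no normalization; all weight names and shapes are illustrative — the claimed block executes inside one GPU kernel, which this reference code does not model):

```python
import numpy as np

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU FFN: down( SiLU(x @ w_gate) * (x @ w_up) )."""
    gate = x @ w_gate
    silu = gate * (1.0 / (1.0 + np.exp(-gate)))  # SiLU activation
    return (silu * (x @ w_up)) @ w_down

def naive_attention(x, w_q, w_k, w_v, w_o):
    """Single-head scaled dot-product attention (unmasked, for brevity)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    return (probs @ v) @ w_o

def swa_block(x, attn_w, ffn_w):
    """Fused residual accumulation: x_out = x + Attn(x) + FFN(x).
    Both sub-layers read the SAME input x (parallel, not sequential)."""
    return x + naive_attention(x, *attn_w) + swiglu_ffn(x, *ffn_w)
```

Note the parallel structure: unlike a standard pre-norm block, the FFN here does not consume the attention output, which is what makes a single-dispatch fusion of the two sub-layers possible.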

### Claim 2: Buffered Routing Embedding (BRE) System
A hardware-level memory management system for High Bandwidth Memory (HBM) optimized for large-scale embedding tables, comprising:

- A look-ahead predictive buffer that identifies high-probability token sequences.
- A dynamic routing mechanism that pages relevant embedding vectors from standard VRAM into a specialized high-speed "High-Bandwidth Partition" (L3 Cache or Shared Memory) prior to the activation of the first transformer block.
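A toy software model of the BRE flow — predicted ids are paged into a small fast partition before the first block runs, and lookups are served from it when they hit. Class and method names are our own; the claimed mechanism is a hardware-level pager, which a Python dict only caricatures:

```python
import numpy as np

class BufferedRoutingEmbedding:
    """Toy BRE: prefetch predicted embedding rows into a fast partition."""

    def __init__(self, table, partition_size=4):
        self.table = table           # stands in for the VRAM-resident table
        self.partition = {}          # stands in for the High-Bandwidth Partition
        self.partition_size = partition_size

    def prefetch(self, predicted_ids):
        """Page rows for the predicted token ids into the partition (capped)."""
        for tok in predicted_ids[: self.partition_size]:
            self.partition[tok] = self.table[tok]

    def lookup(self, tok):
        """Serve from the fast partition on a hit; fall back to the table."""
        if tok in self.partition:
            return self.partition[tok], "fast"
        return self.table[tok], "slow"
```

Whatever predictor supplies `predicted_ids` (the claim leaves it open), the payoff is the same: hits are served from the fast tier before transformer block 0 touches the embedding.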

## II. SCALING & ATTENTION CLAIMS

### Claim 3: Dynamic GQA-Dense Scaling Logic
An attention mechanism for models exceeding 7 billion parameters, utilizing a specific Grouped-Query Attention (GQA) ratio (specifically 5:1 for 13B configurations and 8:1 for 70B+ configurations) implemented in a dense, non-quantized format to maintain high-precision gradient flow while reducing KV-cache memory footprint.
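The stated ratios translate directly into KV-head counts and cache savings. A quick sketch, assuming illustrative head counts (40 query heads for the 13B case, 64 for the 70B case — the claim itself fixes only the ratios):

```python
def gqa_kv_heads(num_query_heads: int, group_ratio: int) -> int:
    """KV heads under GQA: each group of `group_ratio` query heads
    shares one key/value head."""
    assert num_query_heads % group_ratio == 0, "heads must divide evenly"
    return num_query_heads // group_ratio

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV-cache size: 2 tensors (K and V) per layer; BF16 = 2 bytes/elem."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
```

Because only KV heads are cached, a 5:1 ratio cuts the KV-cache to one fifth of the dense-MHA size at the same query-head count.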

### Claim 4: Sliding Window Memory Efficiency (SWME)
A method as described in Claim 3, further comprising a specialized WINDOW_SIZE cache logic that restricts the KV-cache to a fixed-length rolling buffer (e.g., 1024 tokens) during the inference phase, ensuring a constant memory footprint for the Attention sub-layer regardless of generated sequence length.
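The rolling-buffer behavior can be modeled with a bounded deque: each append past WINDOW_SIZE evicts the oldest entry, so memory is O(window) rather than O(sequence). A minimal sketch (the real cache holds per-head key/value tensors, not scalars):

```python
from collections import deque

class RollingKVCache:
    """Fixed-length rolling KV buffer: appends beyond `window_size`
    evict the oldest (key, value) pair, keeping memory constant."""

    def __init__(self, window_size=1024):
        self.keys = deque(maxlen=window_size)
        self.values = deque(maxlen=window_size)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)
```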

## III. HARDWARE OPTIMIZATION CLAIMS

### Claim 5: ROCm-Specific Matrix Core Optimization
A computational kernel optimized for AMD ROCm/HIP environments that utilizes specialized primitives to map the fused SWA block (Claim 1) directly onto hardware matrix cores, specifically targeting the reduction of instruction-level overhead in the RoPE (Rotary Positional Embedding) application.
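For reference, the RoPE rotation being fused: each consecutive channel pair is rotated by a position-dependent angle. A plain-NumPy version of the math (the claim's contribution is mapping this onto HIP matrix cores, not the formula itself):

```python
import numpy as np

def apply_rope(x, position, base=10000.0):
    """Rotary positional embedding on one head vector: pair (x[2i], x[2i+1])
    is rotated by angle position * base**(-2i/d)."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(d // 2) * 2.0 / d)
    theta = position * inv_freq
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin   # 2D rotation of each channel pair
    out[1::2] = x1 * sin + x2 * cos
    return out
```

Each pair rotation is norm-preserving, which is why RoPE can be applied to Q and K without rescaling attention logits.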

### Claim 6: Gradient Stability in Ultra-Deep Dense Networks
An architectural arrangement of $N$ layers ($N \ge 40$) where the specific mathematical interaction between the RMSNorm layers and the fused SWA blocks enables training without auxiliary skip-connections or intermediate normalization, preserving 16-bit floating-point (BF16) precision throughout the stack.
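RMSNorm itself, for reference — no mean subtraction and no bias, which keeps the per-block normalization cheap enough to pair with every fused SWA block. A minimal sketch:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale activations by their root-mean-square, then apply
    a learned per-channel gain. No mean subtraction, no bias."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```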

## IV. INTELLECTUAL PROPERTY PROTECTIONS

### Claim 7: Digital Authorship Proof
A non-functional, immutable data buffer registered within the model's weight distribution (`proof_of_authorship`) containing the cryptographically or string-encoded signature of the inventor, used to verify the origin of the architecture in derivative or fine-tuned versions of the weights.
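One way such a buffer could be stored and checked — the key name `proof_of_authorship` comes from the claim; the SHA-256 digest scheme is our own illustrative choice, not specified there:

```python
import hashlib

def make_authorship_buffer(signature: str) -> dict:
    """Store the signature and its SHA-256 digest under a reserved
    checkpoint key; the buffer carries no functional weights."""
    return {
        "proof_of_authorship": {
            "signature": signature,
            "sha256": hashlib.sha256(signature.encode()).hexdigest(),
        }
    }

def verify_authorship(checkpoint: dict, signature: str) -> bool:
    """Check a (possibly fine-tuned) checkpoint still carries the expected
    signature and that the stored digest matches it."""
    buf = checkpoint.get("proof_of_authorship")
    if buf is None:
        return False
    expected = hashlib.sha256(signature.encode()).hexdigest()
    return buf["signature"] == signature and buf["sha256"] == expected
```

Because fine-tuning leaves non-trainable buffers untouched by default, the check survives ordinary derivative training runs.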

---

**Legal Notice**
Any implementation of the logic described in Claims 1 through 7 without a valid license from Konstantin Vladimirovich Grabko constitutes an infringement of proprietary technology. Filing for patent rights based on the disclosed SWA or BRE logic by third parties is strictly prohibited under the terms of the JiRack Commercial License V.1.2.