CMSManhattan/JiRack_GPT5_236b

kgrabko committed on Dec 22, 2025

Commit 67ccb5a · verified · 1 Parent(s): deb606a

Upload BRE_memory_routing.md

Files changed (1)

BRE_memory_routing.md +11 -0

BRE_memory_routing.md ADDED

	@@ -0,0 +1,11 @@

+# Buffered Routing Embedding (BRE) Algorithm
+**Inventor:** Konstantin Vladimirovich Grabko
+### Problem Statement
+Ultra-scale models (140B+ parameters) hit a "Memory Wall" bottleneck: the GPU stalls while embedding weights are fetched from HBM.
+### The BRE Solution
+BRE implements a predictive pre-fetching ring buffer.
+1. **Token Prediction Window:** A lightweight heuristic monitors the last $N$ tokens to predict which embeddings are most likely to be needed next.
+2. **HBM Routing:** Predicted weights are moved from standard HBM to a specialized "High-Speed Buffer" partition (L3 Cache/Shared Memory) *before* the attention computation begins.
+3. **Synchronous Paging:** BRE uses Peer-to-Peer (P2P) DMA transfers over Infinity Fabric (driven through ROCm) to ensure that weights for the next 4 layers are already local to the current GPU.
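
The three steps above can be illustrated with a minimal CPU-only sketch in Python. It is not the author's implementation: the file does not specify the prediction heuristic, the buffer layout, or the transfer mechanism, so every name here (`WindowPredictor`, `PrefetchRingBuffer`, `layers_to_prefetch`, `run`) is hypothetical. The heuristic is assumed to be "the most frequent tokens in the last N are likely to recur", the fast buffer is modelled as a plain dictionary with first-in-first-out eviction, and no HBM, cache, or P2P DMA traffic is actually issued.

```python
from collections import Counter, deque


class PrefetchRingBuffer:
    """Fixed-capacity store of embedding rows; the oldest entry is evicted first.

    Stands in for the "High-Speed Buffer" partition (assumption: FIFO eviction).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.order = deque()  # token ids in insertion order
        self.rows = {}        # token id -> embedding row

    def put(self, token_id, row):
        if token_id in self.rows:
            return  # already buffered, nothing to move
        if len(self.order) == self.capacity:
            evicted = self.order.popleft()
            del self.rows[evicted]
        self.order.append(token_id)
        self.rows[token_id] = row

    def get(self, token_id):
        return self.rows.get(token_id)


class WindowPredictor:
    """Assumed heuristic: frequent tokens in the last N tokens will recur."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)

    def observe(self, token_id):
        self.window.append(token_id)

    def predict(self, k):
        return [tok for tok, _ in Counter(self.window).most_common(k)]


def layers_to_prefetch(current_layer, num_layers, lookahead=4):
    """Indices of the layers whose weights should already be local (step 3)."""
    start = current_layer + 1
    return list(range(start, min(start + lookahead, num_layers)))


def run(tokens, table, window_size=4, capacity=2):
    """Replay a token stream and count lookups served from the fast buffer."""
    predictor = WindowPredictor(window_size)
    buffer = PrefetchRingBuffer(capacity)
    hits = 0
    for tok in tokens:
        if buffer.get(tok) is not None:
            hits += 1  # embedding was prefetched before it was needed
        predictor.observe(tok)
        # Prefetch the predicted rows before the next token is processed.
        for cand in predictor.predict(capacity):
            buffer.put(cand, table[cand])
    return hits


if __name__ == "__main__":
    table = {i: [float(i)] * 2 for i in range(10)}  # toy embedding table
    print(run([7, 7, 7, 7], table))
    print(layers_to_prefetch(0, 10))
```

In this toy replay the first lookup of a token always misses, and later lookups hit only if the heuristic kept that token in the buffer. A real implementation would replace `buffer.put` with an asynchronous device copy and would need to measure whether the prediction hit rate justifies the extra transfers.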