PERFORMANCE BENCHMARK PLAN: JiRack 236B
Document ID: CMS-JR-236B-BCH-2025
Target: 16x H100 GPU Cluster (2-Node HGX)
1. Stage I: Fabric & Communication (The "Stress Test")
Before loading weights, the inter-node fabric must be verified. JiRack's 14:1 GQA relies on ultra-fast All-Reduce operations during the attention heads' merge phase.
| Test Type | Tool | Target Metric | Success Criteria |
|---|---|---|---|
| NCCL All-Reduce | nccl-tests | Bus Bandwidth | > 380 GB/s (intra-node) / > 45 GB/s (inter-node) |
| P2P Latency | p2pBandwidthLatencyTest | Latency (μs) | < 2.0 μs via NVLink |
| IB Write BW | ib_write_bw | Throughput | 390+ Gbps per link (NDR 400) |
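The Stage I criteria above can be encoded as a simple acceptance gate. This is a minimal sketch: the metric names and the `measured` dictionary are hypothetical placeholders for values parsed from nccl-tests, p2pBandwidthLatencyTest, and ib_write_bw output, not an actual log parser.

```python
# Stage I fabric gate: every metric must meet its success criterion.
# Measured values are assumed to be pre-extracted from tool output.

STAGE1_TARGETS = {
    "nccl_busbw_intra_gbs": 380.0,  # > 380 GB/s intra-node bus bandwidth
    "nccl_busbw_inter_gbs": 45.0,   # > 45 GB/s inter-node bus bandwidth
    "p2p_latency_us": 2.0,          # < 2.0 us via NVLink
    "ib_write_gbps": 390.0,         # >= 390 Gbps per NDR 400 link
}

def stage1_pass(measured: dict) -> bool:
    """Return True only if all fabric metrics satisfy Stage I."""
    return (
        measured["nccl_busbw_intra_gbs"] > STAGE1_TARGETS["nccl_busbw_intra_gbs"]
        and measured["nccl_busbw_inter_gbs"] > STAGE1_TARGETS["nccl_busbw_inter_gbs"]
        and measured["p2p_latency_us"] < STAGE1_TARGETS["p2p_latency_us"]
        and measured["ib_write_gbps"] >= STAGE1_TARGETS["ib_write_gbps"]
    )
```

A single failing metric fails the whole stage, matching the table's all-or-nothing success criteria.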
2. Stage II: JiRack SWA Kernel Fusion (The "Compute Test")
Standard benchmarks (such as HPL) are ineffective for JiRack. The SwiGLU-Attention (SWA) fusion logic must be tested at the model's specific 14,336-wide dimension.
- Benchmark Tool: trtllm-bench (TensorRT-LLM) or a custom Triton kernel profiler.
- Target Configuration:
- Input: 1024 tokens (Prompt).
- Output: 128 tokens (Generation).
- Batch Size: 1, 8, 32, 64.
The "Grabko Metric":
- The system must achieve at least 55% MFU (Model FLOPs Utilization).
- If the vendor delivers <45%, BRE pre-fetching logic may be throttled due to PCIe bottlenecks.
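The MFU gate above can be sketched numerically. This assumes the common approximation of ~2 FLOPs per parameter per token and a dense BF16 peak of roughly 989 TFLOPS per H100; both are assumptions for illustration, not vendor-certified figures.

```python
# Grabko Metric sketch: MFU = achieved FLOPs / peak cluster FLOPs.

H100_PEAK_FLOPS = 989e12   # assumed dense BF16 peak per H100
NUM_GPUS = 16              # 2-node HGX, 16x H100
PARAMS = 236e9             # JiRack 236B parameter count

def mfu(tokens_per_sec: float) -> float:
    """Model FLOPs Utilization via the ~2*N FLOPs/token approximation."""
    achieved = tokens_per_sec * 2 * PARAMS
    peak = NUM_GPUS * H100_PEAK_FLOPS
    return achieved / peak

def grabko_gate(tokens_per_sec: float) -> str:
    """>= 55% MFU passes; < 45% triggers the PCIe-bottleneck review."""
    u = mfu(tokens_per_sec)
    if u >= 0.55:
        return "pass"
    if u < 0.45:
        return "fail: possible PCIe bottleneck"
    return "marginal"
```

Under these assumptions the cluster peak is about 15.8 PFLOPS, so the 55% threshold corresponds to roughly 18,400 aggregate tokens/sec.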
3. Stage III: Throughput & Latency (The "User Experience")
This stage simulates real-world commercial usage of the JiRack API.
| Scenario | Metric | Target for 236B |
|---|---|---|
| Interactive (BS=1) | Time to First Token (TTFT) | < 150ms |
| Interactive (BS=1) | Tokens per Second (TPS) | > 25 tokens/sec |
| High Load (BS=64) | Total Throughput | > 1,200 tokens/sec |
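The interactive targets above can be checked with a small measurement harness. This is a sketch: `generate` is a hypothetical stand-in for a streaming JiRack API call that yields tokens one at a time, not a real client library.

```python
# Sketch: measure TTFT and TPS from a streaming token generator.
import time

def measure_latency(generate, prompt: str):
    """Return (TTFT seconds, tokens/sec) for one streamed request."""
    start = time.perf_counter()
    ttft = None
    n_tokens = 0
    for _token in generate(prompt):
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        n_tokens += 1
    total = time.perf_counter() - start
    tps = n_tokens / total if total > 0 else 0.0
    return ttft, tps

def interactive_pass(ttft: float, tps: float) -> bool:
    """Interactive (BS=1) acceptance: TTFT < 150 ms and TPS > 25."""
    return ttft < 0.150 and tps > 25.0
```

TTFT is measured from request dispatch to the first streamed token, so any tokenizer or scheduler overhead on the serving side counts against the 150 ms budget.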
4. Stage IV: Stability & Reliability (The "Burn-in")
Large clusters often fail during long reasoning chains.
- Test: 24-hour continuous generation at 80% TDP (Thermal Design Power).
- Success Criteria:
- Zero NCCL timeouts.
- Zero XID errors (GPU hardware faults).
MTBF Target:
- Mean Time Between Failures (MTBF) for the 108-layer stack must exceed 720 hours on this specific 16-GPU setup.
5. Official Verification
The vendor must provide a log export containing the `proof_of_authorship` verification string:

`verify_authorship(model) -> "Author: Konstantin Vladimirovich Grabko (CMS Manhattan) 2025"`
Acceptance of the hardware cluster is contingent upon meeting these 236B benchmarks.