# rvLLM H100 artifacts (2026-04-04)
This bundle contains the final H100 artifacts pulled from the Vast instance after the April 4, 2026 tuning pass.
## Environment
- GPU: NVIDIA H100 80GB HBM3
- Model: Qwen/Qwen2.5-7B
- Output length: 128 tokens unless noted
- Repo head pushed to GitHub: acbb62578
- Public source repo: https://github.com/m0at/rvllm
## Final shipped settings
- Batch-1 default decode now uses the reusable batched scratch path.
- Default batched GEMM policy is hybrid:
  - QKV / O-proj / down-proj: cuBLAS or cublasLt
  - GateUp + SiLU: CUTLASS
- Persistent v3 remains an experimental path and is included here only as a correctness/perf artifact.
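The hybrid policy above can be sketched as a simple dispatch table. This is an illustrative sketch only: the names `HYBRID_POLICY`, `select_gemm_backend`, and the op labels are hypothetical and do not correspond to the actual rvLLM API.

```python
# Hypothetical sketch of the hybrid batched-GEMM policy described above:
# QKV / O-proj / down-proj go to cuBLAS/cublasLt, fused GateUp + SiLU to CUTLASS.
HYBRID_POLICY = {
    "qkv_proj": "cublas",
    "o_proj": "cublas",
    "down_proj": "cublas",
    "gate_up_silu": "cutlass",
}


def select_gemm_backend(op_name: str) -> str:
    """Return the GEMM backend for a decode-layer op under the hybrid policy."""
    try:
        return HYBRID_POLICY[op_name]
    except KeyError:
        raise ValueError(f"unknown op: {op_name}")


print(select_gemm_backend("gate_up_silu"))  # cutlass
```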
## Canonical direct-engine results from this pass
Current rvLLM on this H100, alongside the published vLLM 0.19.0 comparison point from the same April 4 campaign:

| N   | rvLLM (tok/s) | vLLM 0.19.0 (tok/s) |
|-----|---------------|---------------------|
| 1   | 133.1         | 165.5               |
| 64  | 8038.0        | 7972.1              |
| 128 | 13110.1       | 13903.5             |
Earlier clean batched logs preserved in this bundle:
- N=32: 4407.5 tok/s (`bench_32.txt`)
- N=64: 7964.0 tok/s (`bench_64.txt`)
- N=128: 13148.3 tok/s (`bench_128.txt`)
## Included files
- `bench_32.txt`, `bench_64.txt`, `bench_128.txt`: clean batched direct-engine benchmark outputs
- `baseline_final_check.txt`: short baseline batch-1 check before persistent-v3 correctness fix work
- `v3_final_check.txt`: log showing real persistent-v3 launches on decode
- `v3_harness_after_fix.txt`: standalone persistent-v3 harness success log
- `v3_memcheck_after_fix.txt`: compute-sanitizer log with `ERROR SUMMARY: 0 errors`
- `flash_attention_3_v3.ptx`: SM90 FA3 v3 PTX artifact
- `persistent_layer_v3.ptx`: SM90 persistent-v3 PTX artifact
- `persistent_layer_v3.cubin`: compiled persistent-v3 cubin from the H100 instance
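One quick way to verify a downloaded bundle is complete is to check for the filenames listed above. The helper below is a hypothetical convenience script, not something shipped in the bundle:

```python
from pathlib import Path

# Filenames taken from the "Included files" list above.
EXPECTED = [
    "bench_32.txt", "bench_64.txt", "bench_128.txt",
    "baseline_final_check.txt", "v3_final_check.txt",
    "v3_harness_after_fix.txt", "v3_memcheck_after_fix.txt",
    "flash_attention_3_v3.ptx", "persistent_layer_v3.ptx",
    "persistent_layer_v3.cubin",
]


def missing_files(bundle_dir: str) -> list:
    """Return the expected bundle files absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("bundle complete" if not missing else f"missing: {missing}")
```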