rvLLM H100 artifacts (2026-04-04)

This bundle contains the final H100 artifacts pulled from the Vast instance after the April 4, 2026 tuning pass.

Environment

  • GPU: NVIDIA H100 80GB HBM3
  • Model: Qwen/Qwen2.5-7B
  • Output length: 128 tokens unless noted
  • Repo head pushed to GitHub: acbb62578
  • Public source repo: https://github.com/m0at/rvllm

Final shipped settings

  • Batch-1 default decode now uses the reusable batched scratch path.
  • Default batched GEMM policy is hybrid (sketched below):
    • QKV / O-proj / down-proj: cuBLAS or cuBLASLt
    • GateUp + SiLU: CUTLASS
  • Persistent v3 remains an experimental path and is included here only as a correctness/perf artifact.
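As a rough illustration of the hybrid policy, the dispatch decision reduces to something like the C++ sketch below. All names here are hypothetical; rvLLM's actual internals are not part of this bundle.

```cpp
// Hypothetical sketch of the hybrid batched-GEMM policy described above.
// Enum and function names are illustrative, not rvLLM's actual API.
enum class GemmBackend { CuBlasLt, Cutlass };
enum class Proj { QKV, OProj, GateUp, DownProj };

// GateUp (fused with the SiLU activation) runs through CUTLASS;
// QKV, O-proj, and down-proj stay on cuBLAS/cuBLASLt.
inline GemmBackend pick_backend(Proj p) {
    return p == Proj::GateUp ? GemmBackend::Cutlass
                             : GemmBackend::CuBlasLt;
}
```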

Canonical direct-engine results from this pass

Current rvLLM on H100:

  • N=1: 133.1 tok/s
  • N=64: 8038.0 tok/s
  • N=128: 13110.1 tok/s

Published comparison point for vLLM 0.19.0 from the same April 4 campaign:

  • N=1: 165.5 tok/s
  • N=64: 7972.1 tok/s
  • N=128: 13903.5 tok/s

Earlier clean batched logs preserved in this bundle:

  • N=32: 4407.5 tok/s (bench_32.txt)
  • N=64: 7964.0 tok/s (bench_64.txt)
  • N=128: 13148.3 tok/s (bench_128.txt)
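For reference, the tok/s figures above are aggregate decode throughput, assuming the standard convention: total generated tokens divided by wall-clock decode time. A minimal C++ timing sketch, with the decode loop itself elided (so the printed number is meaningless as written):

```cpp
#include <chrono>
#include <cstdio>

int main() {
    const int batch = 128;    // concurrent sequences (N)
    const int out_len = 128;  // generated tokens per sequence
    auto t0 = std::chrono::steady_clock::now();
    // ... run out_len batched decode steps here (elided) ...
    auto t1 = std::chrono::steady_clock::now();
    double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("%.1f tok/s\n", double(batch) * out_len / secs);
}
```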

Included files

  • bench_32.txt, bench_64.txt, bench_128.txt: clean batched direct-engine benchmark outputs
  • baseline_final_check.txt: short batch-1 baseline check taken before the persistent-v3 correctness-fix work
  • v3_final_check.txt: log showing real persistent-v3 launches on decode
  • v3_harness_after_fix.txt: standalone persistent-v3 harness success log
  • v3_memcheck_after_fix.txt: compute-sanitizer log with ERROR SUMMARY: 0 errors
  • flash_attention_3_v3.ptx: SM90 FA3 v3 PTX artifact
  • persistent_layer_v3.ptx: SM90 persistent-v3 PTX artifact
  • persistent_layer_v3.cubin: compiled persistent-v3 cubin from the H100 instance (see the loading sketch below)
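
The PTX/cubin artifacts can be inspected and loaded with the stock CUDA toolchain; the memcheck log comes from compute-sanitizer, whose default tool is memcheck. As an illustration, the sketch below loads the cubin through the driver API. The kernel symbol name is a guess, not confirmed by this bundle; list the real entry points with `cuobjdump -symbols persistent_layer_v3.cubin`.

```cpp
// Minimal driver-API sketch for loading persistent_layer_v3.cubin.
// Note: the cubin targets SM90, so this only loads on Hopper-class GPUs.
// Build: nvcc load_v3.cpp -o load_v3 -lcuda
#include <cuda.h>
#include <cstdio>

#define CU_CHECK(call)                                            \
    do {                                                          \
        CUresult r_ = (call);                                     \
        if (r_ != CUDA_SUCCESS) {                                 \
            const char* msg_ = nullptr;                           \
            cuGetErrorString(r_, &msg_);                          \
            std::fprintf(stderr, "%s failed: %s\n", #call, msg_); \
            return 1;                                             \
        }                                                         \
    } while (0)

int main() {
    CU_CHECK(cuInit(0));
    CUdevice dev;
    CU_CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CU_CHECK(cuCtxCreate(&ctx, 0, dev));

    // Load the SM90 cubin shipped in this bundle.
    CUmodule mod;
    CU_CHECK(cuModuleLoad(&mod, "persistent_layer_v3.cubin"));

    // Hypothetical kernel name; substitute the real symbol from cuobjdump.
    CUfunction fn;
    CU_CHECK(cuModuleGetFunction(&fn, mod, "persistent_layer_v3"));

    std::puts("cubin loaded, kernel symbol resolved");
    CU_CHECK(cuModuleUnload(mod));
    CU_CHECK(cuCtxDestroy(ctx));
    return 0;
}
```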