---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- inference
- inference-optimization
- sparse
- sparse-inference
- mixture-of-experts
- moe
- matrix-multiplication
- gemm
- cuda
- rocm
- cpu-inference
- benchmark
- verification
- reproducibility
- cryptographic-verification
---

# ROLV Primitive©

**A drop-in matmul operator that delivers up to 106× faster AI inference and up to 99% less energy on the same hardware — with bit-identical output.**

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19221455.svg)](https://doi.org/10.5281/zenodo.19221455)
[![Live Demo](https://img.shields.io/badge/Live%20Demo-rolv.ai-orange)](https://rolv.ai)
[![Validated](https://img.shields.io/badge/Validated-University%20of%20Miami-green)](https://rolv.ai/validation)

---

## Run the benchmark from any device — no install

Go to **[rolv.ai](https://rolv.ai)**, sign in, pick a model from the dropdown, click run. Compute happens on our server via the public benchmark API (see [rolvai/benchmark](https://huggingface.co/spaces/rolvai/benchmark)). A SHA-256 signed receipt arrives in your inbox with per-case speedup, energy reduction, correctness check, and a run hash bound to your hardware fingerprint.

Works on a laptop. Works on a Chromebook. Works on a phone. Takes about two minutes.

The receipt is cryptographically tied to your run — it cannot be copied from another benchmark, and cannot be fabricated without actually running.

ROLV does not send newsletters. One-time receipt email. That's it.


---

## Selected results on real HuggingFace weights

**NVIDIA B200, BF16, TF32 on, 1,000 iterations:**

| Model | Natural sparsity | vs cuBLAS | vs cuSPARSE | Energy reduction |
|---|---|---|---|---|
| Llama-4-Scout | 93.8% | 4.75× | 103× | 79% |
| Mixtral-8×7B | 75.0% | 1.86× | 109× | 46% |
| Qwen3-30B-A3B | 93.8% | 3.43× | 32× | 71% |
| OLMoE-1B-7B (H200) | 87.5% | 2.49× | 43× | 60% |

**Intel i7 laptop (4 cores, 68 GB RAM, MKL baseline):**

| Model / Layer | Sparsity | vs MKL |
|---|---|---|
| Llama-3.2-1B down_proj | 99% | 106.65× |
| Qwen2.5-7B gate_proj | 95% | 59.70× |
| Mistral-7B q_proj | 95% | 21.45× |

Full per-case data with SHA-256 hashes: **[rolv.ai/rolv_benchmarks.pdf](https://rolv.ai/rolv_benchmarks.pdf)**

---

## Why it works

Modern AI weight matrices are mostly zero. In an MoE model like Mixtral or DeepSeek-V3, 75–97% of weights are architecturally inactive for any given token — guaranteed by the router, known before computation. Standard libraries compute them anyway.

ROLV identifies the non-zero structure at load time and restricts computation to live elements only. Same BLAS/tensor-core primitive, same output — on a matrix whose size is proportional to the non-zero fraction. Results are placed in the correct positions of the full output tensor. Final output is bit-identical to the full dense operation.

Inner mechanism is Patent Pending.

---

## Verification protocol

Every benchmark case is produced with five independent verification layers:

1. **Real HuggingFace weights** — downloaded from public repositories, no synthetic matrices
2. **Vendor baseline** — Intel MKL on CPU, cuBLAS on GPU, cuSPARSE at high sparsity
3. **Four SHA-256 hashes per case** — input matrix, input vector, vendor output, ROLV output
4. **Perturbation test** — one weight altered by 10⁻³, output hash must change (rules out cached answers)
5. **Signed run hash** — SHA-256 over speedup, timestamp, and hardware fingerprint

ATOL = 0.05 on column-normalised fp64.
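
The hashing, perturbation, and tolerance layers above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `live_rows_matmul` is a toy row-skipping stand-in and not the patent-pending ROLV mechanism, the demo matrix is synthetic (the protocol itself uses real weights), `within_atol` is one possible reading of "column-normalised fp64", and the `run_hash` field encoding and the `platform`-based fingerprint are hypothetical.

```python
import hashlib
import json
import platform
import time

import numpy as np


def sha256_of(array: np.ndarray) -> str:
    """Hex SHA-256 of an array's raw bytes (dtype and memory layout affect it)."""
    return hashlib.sha256(np.ascontiguousarray(array).tobytes()).hexdigest()


def dense_matmul(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Baseline: multiply every row, zero or not."""
    return w @ x


def live_rows_matmul(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Toy stand-in (NOT ROLV): multiply only rows that contain a non-zero."""
    live = np.any(w != 0, axis=1)
    y = np.zeros((w.shape[0], x.shape[1]), dtype=np.result_type(w, x))
    y[live] = w[live] @ x  # scatter results back to their original rows
    return y


def within_atol(reference: np.ndarray, candidate: np.ndarray, atol: float = 0.05) -> bool:
    """Assumed reading: cast to fp64, scale each column by its reference norm."""
    ref = reference.astype(np.float64)
    cand = candidate.astype(np.float64)
    scale = np.linalg.norm(ref, axis=0, keepdims=True)
    scale[scale == 0] = 1.0  # avoid dividing by zero for all-zero columns
    return bool(np.all(np.abs(cand / scale - ref / scale) <= atol))


def run_case(w: np.ndarray, x: np.ndarray) -> dict:
    """Time both operators once and hash the very outputs that were timed."""
    t0 = time.perf_counter()
    y_ref = dense_matmul(w, x)
    t1 = time.perf_counter()
    y_new = live_rows_matmul(w, x)
    t2 = time.perf_counter()
    return {
        "hash_matrix": sha256_of(w),
        "hash_input": sha256_of(x),
        "hash_baseline_output": sha256_of(y_ref),
        "hash_candidate_output": sha256_of(y_new),
        "correct": within_atol(y_ref, y_new),
        "speedup": (t1 - t0) / max(t2 - t1, 1e-12),
    }


def run_hash(speedup: float, timestamp: float, fingerprint: str) -> str:
    """SHA-256 over speedup, timestamp, and fingerprint (JSON encoding assumed)."""
    payload = json.dumps(
        {"speedup": speedup, "timestamp": timestamp, "fingerprint": fingerprint},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Synthetic demo data: roughly 75% of rows zeroed, row 0 kept live.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
inactive = rng.random(64) < 0.75
inactive[0] = False
w[inactive] = 0.0
x = rng.standard_normal((32, 1))

case = run_case(w, x)

# Perturbation test: nudge one live weight by 1e-3 and expect a new output hash.
w_perturbed = w.copy()
w_perturbed[0, 0] += 1e-3
perturbed = run_case(w_perturbed, x)
hash_changed = perturbed["hash_candidate_output"] != case["hash_candidate_output"]

# Hypothetical fingerprint built from what the standard library exposes.
fingerprint = platform.machine() + "|" + platform.processor()
receipt = run_hash(case["speedup"], time.time(), fingerprint)
print(case["correct"], hash_changed, receipt[:16])
```

In this sketch the timing and the hashing share one execution per operator, so a cached or skipped computation would show up as an unchanged output hash after the perturbation. Whether the two output hashes match bit for bit depends on the BLAS backend, which is why the sketch reports the tolerance check separately.
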

The correctness check and the speed measurement are the same execution — you cannot skip work to game the clock without failing correctness.

**1,684 / 1,684 GPU PASS · 332 / 332 CPU PASS**

Independently validated by the **University of Miami Frost Institute for Data Science and Computing** — bit-identical SHA-256 hashes across CPU, GPU, and TPU. No commercial relationship. [Validation letter](https://rolv.ai/validation).

---

## Hardware compatibility

**Validated today:** NVIDIA (B200, H200, H100, A100, RTX series, T4, V100) · AMD (MI300X, MI250X, RX 7900) · Intel CPU (MKL, AVX-512) · AMD EPYC · ARM Neoverse · Apple Silicon (M1–M4 Pro) · Google TPU (v4, v5)

**Framework support:** PyTorch · JAX · TensorFlow · ONNX Runtime · TensorRT · vLLM · HuggingFace Transformers

**Works with:** BF16, FP16, FP32, INT8 checkpoints. No retraining. No re-quantisation.

---

## On-premise evaluation

The public benchmark at rolv.ai proves the output. To run ROLV on your own hardware, against your own models, in your own environment, two NDA-gated tiers are available:

**Secure Container** — Hardware-locked Docker container. RolvKey™-authenticated. Processor fingerprint binding at first run; will not execute on any other machine. Optional Intel SGX hardware encryption for regulated environments. Evaluation licence + NDA required.

**Direct Hardware** — Single authenticated file for bare-metal servers and air-gapped environments where Docker is not permitted. Processor-bound binary with live heartbeat attestation. Evaluation licence + NDA required.

Both tiers return cryptographically signed per-run results bound to your processor fingerprint.

**Contact:** rolv@rolv.ai

---

## Citation

```bibtex
@software{heggenhougen_rolv_2026,
  author    = {Heggenhougen, Rolv E.},
  title     = {ROLV Primitive©: A Universal Compute Primitive for Sparse AI Inference},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19221455},
  url       = {https://rolv.ai}
}
```

---

## Links

- **Live benchmark:** [rolv.ai](https://rolv.ai)
- **Paper:** [Zenodo 10.5281/zenodo.19221455](https://doi.org/10.5281/zenodo.19221455)
- **Validation letter:** [rolv.ai/validation](https://rolv.ai/validation)
- **Benchmark API Space:** [rolvai/benchmark](https://huggingface.co/spaces/rolvai/benchmark)

Contact: **rolv@rolv.ai** — a real person reads every message.

---

**ROLV LLC** · Fort Lauderdale, FL · Patent Pending · ROLV Primitive© · RSMT™ · ROLVswitch™ · RolvKey™