---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- inference
- inference-optimization
- sparse
- sparse-inference
- mixture-of-experts
- moe
- matrix-multiplication
- gemm
- cuda
- rocm
- cpu-inference
- benchmark
- verification
- reproducibility
- cryptographic-verification
---
# ROLV Primitive©
**A drop-in matmul operator for AI inference that runs sparse layers up to 106× faster, using up to 99% less energy on the same hardware, with bit-identical output.**
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19221455.svg)](https://doi.org/10.5281/zenodo.19221455)
[![Live Demo](https://img.shields.io/badge/Live%20Demo-rolv.ai-orange)](https://rolv.ai)
[![Validated](https://img.shields.io/badge/Validated-University%20of%20Miami-green)](https://rolv.ai/validation)
---
## Run the benchmark from any device — no install
Go to **[rolv.ai](https://rolv.ai)**, sign in, pick a model from the dropdown, click run. Compute happens on our server via the public benchmark API (see [rolvai/benchmark](https://huggingface.co/spaces/rolvai/benchmark)). A SHA-256 signed receipt arrives in your inbox with per-case speedup, energy reduction, correctness check, and a run hash bound to your hardware fingerprint.
Works on a laptop. Works on a Chromebook. Works on a phone. Takes about two minutes. The receipt is cryptographically tied to your run — it cannot be copied from another benchmark, and cannot be fabricated without actually running.
ROLV does not send newsletters. One-time receipt email. That's it.
---
## Selected results on real HuggingFace weights
**NVIDIA B200, BF16, TF32 on, 1,000 iterations:**
| Model | Natural sparsity | vs cuBLAS | vs cuSPARSE | Energy reduction |
|---|---|---|---|---|
| Llama-4-Scout | 93.8% | 4.75× | 103× | 79% |
| Mixtral-8×7B | 75.0% | 1.86× | 109× | 46% |
| Qwen3-30B-A3B | 93.8% | 3.43× | 32× | 71% |
| OLMoE-1B-7B (H200) | 87.5% | 2.49× | 43× | 60% |
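The Natural sparsity column is consistent with top-k expert routing. A token activates k of E routed experts, so a fraction 1 − k/E of the expert weights sits idle for that token. The expert counts used below come from the models' public configurations: 16 experts with top-1, 8 with top-2, 128 with top-8, and 64 with top-8. The Energy reduction column also matches 1 − 1/(speedup vs cuBLAS) to the reported precision, which is what you would expect if both runs draw about the same average power. The sketch only checks that arithmetic. The helper names are illustrative, and this is not how ROLV measures energy.

```python
# Hypothetical helpers that reproduce the table's arithmetic; not ROLV code.


def routed_sparsity(num_experts: int, active_per_token: int) -> float:
    """Fraction of routed-expert weights one token never touches under top-k routing."""
    return 1.0 - active_per_token / num_experts


def energy_reduction(speedup: float) -> float:
    """Energy saved if both kernels draw equal average power (energy = power x time)."""
    return 1.0 - 1.0 / speedup


# (routed experts, experts active per token, speedup vs cuBLAS from the table)
MODELS = {
    "Llama-4-Scout": (16, 1, 4.75),
    "Mixtral-8x7B": (8, 2, 1.86),
    "Qwen3-30B-A3B": (128, 8, 3.43),
    "OLMoE-1B-7B": (64, 8, 2.49),
}

for name, (experts, k, speedup) in MODELS.items():
    print(f"{name:<14} sparsity {routed_sparsity(experts, k):.1%}  "
          f"energy reduction {energy_reduction(speedup):.0%}")
```

Running it prints 93.8%, 75.0%, 93.8% and 87.5% sparsity, and 79%, 46%, 71% and 60% energy reduction. These match the GPU table row for row.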
**Intel i7 laptop (4 cores, 68 GB RAM, MKL baseline):**
| Model / Layer | Sparsity | vs MKL |
|---|---|---|
| Llama-3.2-1B down_proj | 99% | 106.65× |
| Qwen2.5-7B gate_proj | 95% | 59.70× |
| Mistral-7B q_proj | 95% | 21.45× |
Full per-case data with SHA-256 hashes: **[rolv.ai/rolv_benchmarks.pdf](https://rolv.ai/rolv_benchmarks.pdf)**
---
## Why it works
Modern AI weight matrices are mostly zero. In an MoE model like Mixtral or DeepSeek-V3, 75–97% of expert weights are architecturally inactive for any given token. The router guarantees this, and its selection is known before the expert computation runs. Standard libraries compute over the inactive weights anyway. ROLV identifies the non-zero structure at load time and restricts computation to live elements only. The same BLAS or tensor-core primitive runs on a reduced matrix whose size is proportional to the non-zero fraction, and the results are placed in the correct positions of the full output tensor. The final output is bit-identical to the full dense operation.
The inner mechanism is patent pending.
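The general idea of skipping structurally zero work can be illustrated with row compaction. You find the rows that contain any non-zero weight once, at load time. Then you run the ordinary dense kernel on only those rows and scatter the results back into a zero-filled output. This is a generic technique chosen for illustration, not ROLV's patent-pending mechanism. All names are hypothetical, and the fixed expert mask stands in for a router decision that, in a real MoE, changes per token. Each live row is computed with the same operations in the same order, and an all-zero row yields exactly 0.0 for finite inputs. As a result, the compacted result here equals the dense one exactly.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_matvec(w: Matrix, x: List[float]) -> List[float]:
    """Reference y = W @ x that computes every row, zero or not."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def compact_rows(w: Matrix) -> Tuple[List[int], Matrix]:
    """Load-time step: record which rows hold any non-zero weight and keep only those."""
    live = [i for i, row in enumerate(w) if any(v != 0.0 for v in row)]
    return live, [w[i] for i in live]


def compact_matvec(live: List[int], w_live: Matrix, num_rows: int,
                   x: List[float]) -> List[float]:
    """Run the same dense kernel on the smaller matrix, then scatter into the full output."""
    # All-zero rows contribute exactly 0.0 as long as x is finite (0 * inf would be nan).
    y = [0.0] * num_rows
    for i, value in zip(live, dense_matvec(w_live, x)):
        y[i] = value
    return y


# Toy MoE-like layout: 8 experts x 2 rows each, with only experts 1 and 5 routed in.
ROWS_PER_EXPERT, EXPERTS, ACTIVE = 2, 8, {1, 5}
w = [[float(r + c + 1) if r // ROWS_PER_EXPERT in ACTIVE else 0.0 for c in range(4)]
     for r in range(ROWS_PER_EXPERT * EXPERTS)]
x = [0.5, -1.0, 2.0, 0.25]

live, w_live = compact_rows(w)
assert compact_matvec(live, w_live, len(w), x) == dense_matvec(w, x)
print(f"computed {len(live)} of {len(w)} rows")  # computed 4 of 16 rows
```

With 2 of 8 experts active, only a quarter of the rows are computed. This mirrors the 75% inactive fraction quoted for Mixtral.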
---
## Verification protocol
Every benchmark case is produced with five independent verification layers:
1. **Real HuggingFace weights** — downloaded from public repositories, no synthetic matrices
2. **Vendor baseline** — Intel MKL on CPU, cuBLAS on GPU, cuSPARSE at high sparsity
3. **Four SHA-256 hashes per case** — input matrix, input vector, vendor output, ROLV output
4. **Perturbation test** — one weight altered by 10⁻³, output hash must change (rules out cached answers)
5. **Signed run hash** — SHA-256 over speedup, timestamp, and hardware fingerprint
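Layers 3 to 5 can be sketched with Python's standard `hashlib` and `struct` modules. The byte encoding (little-endian fp64), the field separator in the run hash, and every function name are assumptions made for illustration. ROLV's exact formats are not described here. One caveat on layer 5: a bare SHA-256 digest binds the fields together, but anyone who knows those fields can recompute it. Authenticating a receipt therefore also needs a secret key or a digital signature, and this section does not say which scheme ROLV uses.

```python
import hashlib
import struct
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def sha256_floats(values: Sequence[float]) -> str:
    """Hash floats as little-endian fp64 bytes (an assumed, not documented, encoding)."""
    return hashlib.sha256(struct.pack(f"<{len(values)}d", *values)).hexdigest()


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def case_hashes(w: Matrix, x: Sequence[float], vendor_y: Sequence[float],
                rolv_y: Sequence[float]) -> Dict[str, str]:
    """Layer 3: one hash each for input matrix, input vector, vendor output, ROLV output."""
    return {
        "matrix": sha256_floats([v for row in w for v in row]),
        "vector": sha256_floats(x),
        "vendor_output": sha256_floats(vendor_y),
        "rolv_output": sha256_floats(rolv_y),
    }


def perturbation_changes_hash(w: Matrix, x: Sequence[float], row: int, col: int,
                              delta: float = 1e-3) -> bool:
    """Layer 4: nudge one weight; a freshly computed output must hash differently.
    Pick a column where x[col] != 0, otherwise the nudge cannot reach the output."""
    before = sha256_floats(matvec(w, x))
    nudged = [r[:] for r in w]
    nudged[row][col] += delta
    return sha256_floats(matvec(nudged, x)) != before


def run_hash(speedup: float, timestamp: str, hw_fingerprint: str) -> str:
    """Layer 5: digest over speedup, timestamp and fingerprint (field format is hypothetical)."""
    payload = f"{speedup:.6f}|{timestamp}|{hw_fingerprint}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


w = [[1.0, 2.0], [3.0, 4.0]]
x = [0.5, -1.0]
y = matvec(w, x)
hashes = case_hashes(w, x, vendor_y=y, rolv_y=y)
assert hashes["vendor_output"] == hashes["rolv_output"]  # identical outputs, identical hashes
assert perturbation_changes_hash(w, x, row=0, col=0)
print(run_hash(4.75, "2026-01-01T00:00:00Z", "example-fingerprint")[:16])
```

The perturbation check is meaningful only because the output is recomputed after the nudge. A replayed, cached output would keep the old hash and fail.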
Correctness uses an absolute tolerance of ATOL = 0.05 on column-normalised fp64 outputs. The correctness check and the speed measurement come from the same execution, so skipping work to game the clock fails the correctness check.
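The tolerance check and the same-execution rule can be sketched together. The exact meaning of "column-normalised" is not defined in this section. This sketch assumes that each output column is divided by the L2 norm of the matching reference column before taking the maximum absolute difference. It times a single call, whereas the GPU results above report 1,000 iterations, and the function names are hypothetical.

```python
import math
import time
from typing import Callable, List, Tuple

Matrix = List[List[float]]  # rows x columns; Python floats are IEEE 754 fp64


def column_normalise(y: Matrix, ref: Matrix) -> Matrix:
    """Divide each column by the L2 norm of the matching reference column (assumed scheme)."""
    norms = [math.sqrt(sum(row[c] ** 2 for row in ref)) or 1.0 for c in range(len(ref[0]))]
    return [[row[c] / norms[c] for c in range(len(row))] for row in y]


def timed_and_checked(kernel: Callable[[], Matrix], reference: Matrix,
                      atol: float = 0.05) -> Tuple[float, bool]:
    """Time one call and check that same output, so skipped work cannot pass."""
    start = time.perf_counter()
    out = kernel()  # the timed result is the checked result
    elapsed = time.perf_counter() - start
    got, want = column_normalise(out, reference), column_normalise(reference, reference)
    max_err = max(abs(g - r) for grow, rrow in zip(got, want) for g, r in zip(grow, rrow))
    return elapsed, max_err <= atol


reference = [[1.0, 2.0], [3.0, 4.0]]
_, honest = timed_and_checked(lambda: [row[:] for row in reference], reference)
_, lazy = timed_and_checked(lambda: [[0.0, 0.0], [0.0, 0.0]], reference)
print(honest, lazy)  # True False
```

The "lazy" kernel returns instantly but fails on the very output that was timed. A fast time therefore cannot be reported without a correct result.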
**1,684 / 1,684 GPU PASS · 332 / 332 CPU PASS**
Independently validated by the **University of Miami Frost Institute for Data Science and Computing** — bit-identical SHA-256 hashes across CPU, GPU, and TPU. No commercial relationship. [Validation letter](https://rolv.ai/validation).
---
## Hardware compatibility
**Validated today:** NVIDIA (B200, H200, H100, A100, RTX series, T4, V100) · AMD (MI300X, MI250X, RX 7900) · Intel CPU (MKL, AVX-512) · AMD EPYC · ARM Neoverse · Apple Silicon (M1–M4 Pro) · Google TPU (v4, v5)
**Framework support:** PyTorch · JAX · TensorFlow · ONNX Runtime · TensorRT · vLLM · HuggingFace Transformers
**Works with:** BF16, FP16, FP32, INT8 checkpoints. No retraining. No re-quantisation.
---
## On-premise evaluation
The public benchmark at rolv.ai verifies the output on ROLV's servers. To run ROLV on your own hardware, against your own models, in your own environment, two NDA-gated tiers are available:
**Secure Container** — Hardware-locked Docker container. RolvKey™-authenticated. Processor fingerprint binding at first run; will not execute on any other machine. Optional Intel SGX hardware encryption for regulated environments. Evaluation licence + NDA required.
**Direct Hardware** — Single authenticated file for bare-metal servers and air-gapped environments where Docker is not permitted. Processor-bound binary with live heartbeat attestation. Evaluation licence + NDA required.
Both tiers return cryptographically signed per-run results bound to your processor fingerprint.
**Contact:** rolv@rolv.ai
---
## Citation
```bibtex
@software{heggenhougen_rolv_2026,
  author    = {Heggenhougen, Rolv E.},
  title     = {ROLV Primitive©: A Universal Compute Primitive for Sparse AI Inference},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19221455},
  url       = {https://rolv.ai}
}
```
---
## Links
- **Live benchmark:** [rolv.ai](https://rolv.ai)
- **Paper:** [Zenodo 10.5281/zenodo.19221455](https://doi.org/10.5281/zenodo.19221455)
- **Validation letter:** [rolv.ai/validation](https://rolv.ai/validation)
- **Benchmark API Space:** [rolvai/benchmark](https://huggingface.co/spaces/rolvai/benchmark)
Contact: **rolv@rolv.ai** — a real person reads every message.
---
**ROLV LLC** · Fort Lauderdale, FL · Patent Pending · ROLV Primitive© · RSMT™ · ROLVswitch™ · RolvKey™