
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- inference
- inference-optimization
- sparse
- sparse-inference
- mixture-of-experts
- moe
- matrix-multiplication
- gemm
- cuda
- rocm
- cpu-inference
- benchmark
- verification
- reproducibility
- cryptographic-verification
---


# ROLV Primitive©

**A drop-in matmul operator that delivers up to 106× faster AI inference and up to 99% less energy on the same hardware, with bit-identical output.**

[DOI: 10.5281/zenodo.19221455](https://doi.org/10.5281/zenodo.19221455) · [rolv.ai](https://rolv.ai) · [Validation letter](https://rolv.ai/validation)

---

## Run the benchmark from any device, no install

Go to **[rolv.ai](https://rolv.ai)**, sign in, pick a model from the dropdown, and click run. Compute happens on our server via the public benchmark API (see [rolvai/benchmark](https://huggingface.co/spaces/rolvai/benchmark)). A SHA-256 signed receipt arrives in your inbox with per-case speedup, energy reduction, correctness check, and a run hash bound to your hardware fingerprint.

Works on a laptop. Works on a Chromebook. Works on a phone. Takes about two minutes. The receipt is cryptographically tied to your run: it cannot be copied from another benchmark and cannot be fabricated without actually running.

ROLV does not send newsletters. One-time receipt email. That's it.

---


## Selected results on real HuggingFace weights

**NVIDIA B200, BF16, TF32 on, 1,000 iterations:**

| Model | Natural sparsity | Speedup vs cuBLAS | Speedup vs cuSPARSE | Energy reduction |
|---|---|---|---|---|
| Llama-4-Scout | 93.8% | 4.75× | 103× | 79% |
| Mixtral-8×7B | 75.0% | 1.86× | 109× | 46% |
| Qwen3-30B-A3B | 93.8% | 3.43× | 32× | 71% |
| OLMoE-1B-7B (H200) | 87.5% | 2.49× | 43× | 60% |

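
The figures in the table above are consistent with two simple relations, sketched here under assumptions the card does not state: that "natural sparsity" is the fraction of routed experts left inactive per token, 1 − k/E, and that the energy column follows from the cuBLAS speedup at unchanged power draw, 1 − 1/speedup. The expert counts (k active out of E) are taken from the public model configurations and are not part of this card.

```python
# Assumed (not from this card): routed experts per layer (E) and experts
# activated per token (k), as published in each model's configuration.
# "speedup" is the cuBLAS speedup reported in the table.
MODELS = {
    "Llama-4-Scout": {"experts": 16, "active": 1, "speedup": 4.75},
    "Mixtral-8x7B": {"experts": 8, "active": 2, "speedup": 1.86},
    "Qwen3-30B-A3B": {"experts": 128, "active": 8, "speedup": 3.43},
    "OLMoE-1B-7B": {"experts": 64, "active": 8, "speedup": 2.49},
}


def routing_sparsity(experts: int, active: int) -> float:
    """Fraction of routed expert weights unused for one token: 1 - k/E."""
    return 1.0 - active / experts


def energy_reduction(speedup: float) -> float:
    """Energy saved if power draw is unchanged and only runtime shrinks."""
    return 1.0 - 1.0 / speedup


for name, m in MODELS.items():
    s = routing_sparsity(m["experts"], m["active"])
    e = energy_reduction(m["speedup"])
    print(f"{name:15s} sparsity={s:.2%} energy_reduction={e:.0%}")
```

Under these assumptions the four rows come out at 93.75%, 75%, 93.75% and 87.5% sparsity and 79%, 46%, 71% and 60% energy reduction, matching the table after rounding. If the energy column was measured directly rather than derived, the match would simply indicate near-constant power draw.
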

**Intel i7 laptop (4 cores, 68 GB RAM, MKL baseline):**

| Model / Layer | Sparsity | vs MKL |
|---|---|---|
| Llama-3.2-1B down_proj | 99% | 106.65× |
| Qwen2.5-7B gate_proj | 95% | 59.70× |
| Mistral-7B q_proj | 95% | 21.45× |

Full per-case data with SHA-256 hashes: **[rolv.ai/rolv_benchmarks.pdf](https://rolv.ai/rolv_benchmarks.pdf)**

---


## Why it works

Most entries in modern AI weight matrices are zero or inactive for a given input. In an MoE model like Mixtral or DeepSeek-V3, 75–97% of weights are architecturally inactive for any given token: the router guarantees this, and it is known before computation. Standard libraries compute them anyway. ROLV identifies the non-zero structure at load time and restricts computation to live elements only. It uses the same BLAS/tensor-core primitive and produces the same output, but on a matrix whose size is proportional to the non-zero fraction. Results are placed in the correct positions of the full output tensor, so the final output is bit-identical to the full dense operation.

The inner mechanism is Patent Pending.

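
Because the mechanism itself is not disclosed, the following is only a generic illustration of the idea described in this section (find the live rows once, run the same kernel on those rows only, scatter the results into a full-size output). It is not ROLV's method. All function names and the toy matrix are hypothetical, and a plain Python dot product stands in for the vendor BLAS kernel.

```python
import random


def dot(row, x):
    """Plain dot product; stands in for the vendor BLAS kernel."""
    return sum(w * v for w, v in zip(row, x))


def dense_matvec(W, x):
    """Baseline: multiply every row, including rows that are all zero."""
    return [dot(row, x) for row in W]


def find_live_rows(W):
    """Done once, e.g. at load time: rows with at least one non-zero weight."""
    return [i for i, row in enumerate(W) if any(w != 0.0 for w in row)]


def live_row_matvec(W, x, live):
    """Multiply only the live rows, then scatter into a full-size output."""
    y = [0.0] * len(W)       # skipped rows keep an exact zero
    for i in live:
        y[i] = dot(W[i], x)  # same kernel, less work
    return y


random.seed(0)
rows, cols = 16, 8
W = [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
for i in range(rows):
    if i % 4 != 0:           # zero out 75% of the rows
        W[i] = [0.0] * cols
x = [random.gauss(0, 1) for _ in range(cols)]

live = find_live_rows(W)
y_dense = dense_matvec(W, x)
y_live = live_row_matvec(W, x, live)
print(f"{len(live)} of {rows} rows computed, identical: {y_live == y_dense}")
```

Equality is exact here because both paths perform the same per-row arithmetic in the same order. A real BLAS call on a compacted matrix may block or reorder its sums differently, so bitwise equality in that setting is something to verify rather than assume.
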

---

## Verification protocol

Every benchmark case is produced with five independent verification layers:

1. **Real HuggingFace weights**: downloaded from public repositories, no synthetic matrices
2. **Vendor baseline**: Intel MKL on CPU, cuBLAS on GPU, cuSPARSE at high sparsity
3. **Four SHA-256 hashes per case**: input matrix, input vector, vendor output, ROLV output
4. **Perturbation test**: one weight altered by 10⁻³, output hash must change (rules out cached answers)
5. **Signed run hash**: SHA-256 over speedup, timestamp, and hardware fingerprint

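
Layers 3 to 5 of the list above can be sketched with Python's standard `hashlib`. This is a minimal illustration, not the benchmark's actual code: the byte layout that gets hashed (raw fp64 values), the helper names, the field order of the run hash, and the example values are all assumptions.

```python
import hashlib
from array import array


def sha256_f64(values) -> str:
    """SHA-256 over the raw fp64 bytes of a flat sequence of numbers."""
    return hashlib.sha256(array("d", values).tobytes()).hexdigest()


def matvec(W, x):
    """Reference matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def flatten(W):
    return [w for row in W for w in row]


W = [[0.5, 0.0, -1.25], [0.0, 0.0, 0.0], [2.0, 0.75, 0.0]]
x = [1.0, -2.0, 0.5]

# Layer 3: four hashes per case. Both outputs come from the same function
# here; in a real run they would come from the vendor library and from the
# kernel under test.
hashes = {
    "input_matrix": sha256_f64(flatten(W)),
    "input_vector": sha256_f64(x),
    "vendor_output": sha256_f64(matvec(W, x)),
    "test_output": sha256_f64(matvec(W, x)),
}

# Layer 4: perturb one weight by 1e-3; the output hash must change,
# otherwise the output did not depend on the weights.
W_perturbed = [row[:] for row in W]
W_perturbed[0][0] += 1e-3
perturbed_hash = sha256_f64(matvec(W_perturbed, x))
assert perturbed_hash != hashes["vendor_output"]


# Layer 5: run hash over speedup, timestamp and hardware fingerprint.
# The field layout and the example values below are hypothetical.
def run_hash(speedup: float, timestamp: str, fingerprint: str) -> str:
    payload = f"{speedup:.4f}|{timestamp}|{fingerprint}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


print(hashes["vendor_output"] == hashes["test_output"])
print(run_hash(4.75, "2026-01-01T00:00:00Z", "example-fingerprint"))
```

A bare SHA-256 digest demonstrates integrity, not authorship. Binding a receipt to a signer would need a keyed construction such as an HMAC or a digital signature, which this card does not describe.
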

Correctness is checked with an absolute tolerance (ATOL) of 0.05 on column-normalised fp64 values. The correctness check and the speed measurement come from the same execution, so you cannot skip work to game the clock without failing correctness.

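
The card does not define "column-normalised", so the sketch below makes one plausible assumption: each output column is divided by the largest absolute value in the matching column of the reference output, and the test passes when every normalised difference is at most 0.05. The function names and sample values are hypothetical.

```python
def column_normalise(Y_ref, Y):
    """Scale each column of Y by the largest |value| in that column of Y_ref."""
    n_cols = len(Y_ref[0])
    # An all-zero reference column keeps a scale of 1.0 to avoid dividing by zero.
    scales = [max(abs(row[j]) for row in Y_ref) or 1.0 for j in range(n_cols)]
    return [[float(v) / s for v, s in zip(row, scales)] for row in Y]


def passes_atol(Y_ref, Y_test, atol=0.05):
    """True if every normalised element of Y_test is within atol of Y_ref."""
    ref = column_normalise(Y_ref, Y_ref)
    test = column_normalise(Y_ref, Y_test)
    return all(
        abs(a - b) <= atol
        for ref_row, test_row in zip(ref, test)
        for a, b in zip(ref_row, test_row)
    )


Y_ref = [[10.0, -2.0], [5.0, 4.0]]
Y_close = [[10.2, -2.1], [5.0, 3.9]]   # off by at most 2.5% of the column scale
Y_wrong = [[12.0, -2.0], [5.0, 4.0]]   # off by 20% of the column scale

print(passes_atol(Y_ref, Y_close))
print(passes_atol(Y_ref, Y_wrong))
```

A tolerance check of this kind is looser than hash equality: outputs with identical SHA-256 hashes pass trivially, while outputs that pass the tolerance need not be bit-identical.
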

**1,684 / 1,684 GPU PASS · 332 / 332 CPU PASS**

Independently validated by the **University of Miami Frost Institute for Data Science and Computing**: bit-identical SHA-256 hashes across CPU, GPU, and TPU. No commercial relationship. [Validation letter](https://rolv.ai/validation).

---


## Hardware compatibility

**Validated today:** NVIDIA (B200, H200, H100, A100, RTX series, T4, V100) · AMD (MI300X, MI250X, RX 7900) · Intel CPU (MKL, AVX-512) · AMD EPYC · ARM Neoverse · Apple Silicon (M1–M4 Pro) · Google TPU (v4, v5)

**Framework support:** PyTorch · JAX · TensorFlow · ONNX Runtime · TensorRT · vLLM · HuggingFace Transformers

**Works with:** BF16, FP16, FP32, INT8 checkpoints. No retraining. No re-quantisation.

---

## On-premise evaluation

The public benchmark at rolv.ai proves the output. To run ROLV on your own hardware, against your own models, in your own environment, two NDA-gated tiers are available:

**Secure Container:** Hardware-locked Docker container. RolvKey™-authenticated. Processor fingerprint binding at first run; will not execute on any other machine. Optional Intel SGX hardware encryption for regulated environments. Evaluation licence + NDA required.

**Direct Hardware:** Single authenticated file for bare-metal servers and air-gapped environments where Docker is not permitted. Processor-bound binary with live heartbeat attestation. Evaluation licence + NDA required.

Both tiers return cryptographically signed per-run results bound to your processor fingerprint.

**Contact:** rolv@rolv.ai

---


## Citation

```bibtex
@software{heggenhougen_rolv_2026,
  author    = {Heggenhougen, Rolv E.},
  title     = {ROLV Primitive©: A Universal Compute Primitive for Sparse AI Inference},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19221455},
  url       = {https://rolv.ai}
}
```

---


## Links

- **Live benchmark:** [rolv.ai](https://rolv.ai)
- **Paper:** [Zenodo 10.5281/zenodo.19221455](https://doi.org/10.5281/zenodo.19221455)
- **Validation letter:** [rolv.ai/validation](https://rolv.ai/validation)
- **Benchmark API Space:** [rolvai/benchmark](https://huggingface.co/spaces/rolvai/benchmark)

Contact: **rolv@rolv.ai**. A real person reads every message.

---

**ROLV LLC** · Fort Lauderdale, FL · Patent Pending · ROLV Primitive© · RSMT™ · ROLVswitch™ · RolvKey™