governed-inference-meter
Energy-metered, governed inference receipts. A lightweight, dependency-light Python utility (and Hugging Face universal kernel) that wraps any inference call and emits a governed, energy-metered, tamper-evident receipt:
- measures GPU energy via NVIDIA NVML (power/energy readback) integrated over wall-time into joules,
- computes tokens-per-joule,
- runs a pluggable, advisory policy gate (allow/deny; defaults to allow),
- and emits a SHA-256 hash-chained JSON receipt so a sequence of calls is independently auditable.
It is the energy + governance counterpart to SZLHOLDINGS/szl-governed-norm, providing provenance at the inference boundary in the spirit of the a11oy governed-AI platform: receipts, not capability claims.
Why this exists. Browse the Kernel Hub and you find performance kernels: attention, activations, GEMM, norms. There is no energy-metering + governance kernel. Teams running inference in sovereign, regulated, or cost/carbon-sensitive contexts measure tokens/joule and keep audit trails by hand. This utility does both in one wrapped call, and degrades honestly when no GPU energy readback is available.
Honest scope (read this first)
This project follows a strict honesty doctrine. The governance trust quantity is Conjecture 1: advisory, not a theorem. Trust is never 100%.
- MEASURED only with NVML. Energy is real only when NVML is present and grants power/energy readback. Without it the receipt is labeled mode="unmeasured" and joules / tokens_per_joule are null. We never fabricate a joule figure.
- Board-level power. NVML reports whole-board power (compute die + memory + losses). We report what the hardware reports and say so. No modeling, no scaling factors.
- The policy gate is advisory and host-enforced. It records an allow/deny decision into the receipt. It does not, and cannot, enforce anything by itself: your host must actually skip a denied call. The bundled meter() wrapper is fail-safe (it will not execute a denied call), but downstream enforcement is still your responsibility.
- The receipt digest is an integrity fingerprint, not a signature. It is a SHA-256 over the canonical record body and makes tampering evident. It does not prove authorship. Cryptographic signing (e.g. DSSE/Sigstore) is a separate, out-of-band concern, intentionally not done here.
- This is a metering + receipt utility, not a safety guarantee.
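Since downstream enforcement stays with the host, a caller may want to refuse any output whose receipt does not record an explicit allow. The following is a minimal sketch of that idea, not part of this package: `DeniedByPolicy` and `require_allowed` are hypothetical names, and only the `policy_decision` and `policy_reason` receipt fields shown in this README are assumed.

```python
class DeniedByPolicy(RuntimeError):
    """Hypothetical exception for a receipt whose gate decision is not 'allow'."""


def require_allowed(receipt: dict, output):
    """Pass output through only when the receipt records an explicit 'allow'."""
    decision = receipt.get("policy_decision")
    if decision != "allow":
        # Anything other than an explicit allow (deny, missing, unknown) is refused.
        reason = receipt.get("policy_reason", "no reason recorded")
        raise DeniedByPolicy(f"policy decision {decision!r}: {reason}")
    return output


# Example with hand-written receipts (illustrative field values only).
allowed = require_allowed({"policy_decision": "allow"}, "response text")
```

Treating a missing or unknown decision the same as a deny keeps the host-side check fail-closed, matching the posture the gate itself takes.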
Install / load
From the Hugging Face Hub (universal kernel; runs on CPU and CUDA):
from kernels import get_kernel
gim = get_kernel("SZLHOLDINGS/governed-inference-meter")
From PyPI-style source (zero hard dependencies; add pynvml for real
energy):
pip install kernels # to load via get_kernel
# real GPU energy measurement additionally needs NVML bindings:
pip install pynvml
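To see ahead of time which energy mode a host could reach, one can probe NVML directly. This is a rough sketch under stated assumptions, not the package's own detection logic: `probe_energy_mode` is a hypothetical helper, it checks only one device, and it uses the pynvml calls `nvmlInit`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetTotalEnergyConsumption` (millijoules) and `nvmlDeviceGetPowerUsage` (milliwatts).

```python
def probe_energy_mode(device_index: int = 0) -> str:
    """Hypothetical probe: report which measurement mode this host could support."""
    try:
        import pynvml  # optional dependency

        pynvml.nvmlInit()
    except Exception:
        # pynvml not installed, no driver, or no usable GPU.
        return "unmeasured"
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        try:
            pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)  # millijoules
            return "measured-energy"
        except pynvml.NVMLError:
            pass  # no energy counter; fall back to power readback
        try:
            pynvml.nvmlDeviceGetPowerUsage(handle)  # milliwatts
            return "measured-power-integral"
        except pynvml.NVMLError:
            return "unmeasured"
    except pynvml.NVMLError:
        return "unmeasured"
    finally:
        pynvml.nvmlShutdown()


print(probe_energy_mode())
```

On a CPU-only machine this prints `unmeasured`; the authoritative answer for this package remains `gim.capability_report()`.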
Usage
from kernels import get_kernel
gim = get_kernel("SZLHOLDINGS/governed-inference-meter")
print(gim.__version__)
print(gim.capability_report()) # what energy measurement is possible here
# Wrap ANY inference callable. You tell the meter the token counts.
def run(prompt):
    # ... your real model.generate(...) call here ...
    return "the model's response text"

receipt, output = gim.meter(
    run, args=("hello",),
    model="my-llm-7b",
    tokens_in=2, tokens_out=7,
)
print(receipt["mode"]) # 'measured-energy' | 'measured-power-integral' | 'unmeasured'
print(receipt["joules"]) # float, or None when unmeasured
print(receipt["tokens_per_joule"]) # float, or None when unmeasured
print(receipt["policy_decision"]) # 'allow' | 'deny'
print(receipt["digest"]) # SHA-256 over the canonical record body
print(gim.receipt_verify()) # (ok, depth, first_break_seq) over the chain
A custom policy gate (advisory)
def my_gate(ctx):
    # ctx has model, tokens_in, tokens_out, args, kwargs, ts
    if ctx["tokens_in"] > 8192:
        return ("deny", "prompt exceeds governed token budget")
    return ("allow", "within budget")

receipt, output = gim.meter(run, args=("hi",), model="m",
                            tokens_in=2, tokens_out=7, policy=my_gate)
A gate may return a PolicyResult, a (decision, reason) tuple, a bool, or a
string. It runs fail-closed: if your gate raises, the call is denied with
the exception text as the reason, so a buggy policy can never silently allow.
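The fail-closed behavior described above can be sketched as a small normalizer. This is an illustration under assumptions, not the code in `_policy.py`: `run_gate` is a hypothetical helper, it leaves out `PolicyResult` because that type's fields are not documented here, and denying on an unrecognized return value is a choice made for this sketch.

```python
from typing import Any, Callable, Tuple


def run_gate(gate: Callable[[dict], Any], ctx: dict) -> Tuple[str, str]:
    """Hypothetical helper: evaluate a gate and normalize to (decision, reason)."""
    try:
        result = gate(ctx)
    except Exception as exc:
        # Fail-closed: a raising gate denies, with the exception as the reason.
        return ("deny", f"policy gate raised: {exc!r}")

    if isinstance(result, bool):
        return ("allow", "gate returned True") if result else ("deny", "gate returned False")
    if isinstance(result, str):
        decision, reason = result.strip().lower(), "gate returned a bare decision string"
    elif isinstance(result, tuple) and len(result) == 2:
        decision, reason = str(result[0]).strip().lower(), str(result[1])
    else:
        return ("deny", f"unrecognized gate result: {result!r}")

    if decision not in ("allow", "deny"):
        return ("deny", f"unrecognized decision: {decision!r}")
    return (decision, reason)


def budget_gate(ctx):
    if ctx["tokens_in"] > 8192:
        return ("deny", "prompt exceeds governed token budget")
    return ("allow", "within budget")


def buggy_gate(ctx):
    return ctx["missing_key"]  # raises KeyError


within = run_gate(budget_gate, {"tokens_in": 2})
over = run_gate(budget_gate, {"tokens_in": 9000})
broken = run_gate(buggy_gate, {"tokens_in": 2})
```

Here `within` is an allow, `over` is a deny with the budget reason, and `broken` is a deny whose reason carries the `KeyError`.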
Per-request chain (no global-state contention)
chain = gim.ReceiptChain()
gim.meter(run, args=("a",), model="m", tokens_in=1, tokens_out=4, chain=chain)
gim.meter(run, args=("b",), model="m", tokens_in=1, tokens_out=6, chain=chain)
print(chain.verify()) # tamper-evident over YOUR chain only
print(chain.to_jsonl()) # export the chain for offline audit
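For readers who want to see why a hash chain is tamper-evident, here is a minimal standalone sketch. It is not the implementation in `_receipt.py`: the canonical encoding (JSON with sorted keys and compact separators), the helper names, and the exact meaning of the `(ok, depth, first_break_seq)` tuple are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # all-zero "prev" for the first record, as in the sample receipt


def canonical_digest(body: dict) -> str:
    """SHA-256 over an assumed canonical JSON encoding of the record body."""
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(chain: list, fields: dict) -> dict:
    """Link a new record to the previous digest, then fingerprint it."""
    prev = chain[-1]["digest"] if chain else GENESIS
    body = {**fields, "seq": len(chain), "prev": prev}
    record = {**body, "digest": canonical_digest(body)}
    chain.append(record)
    return record


def verify_chain(chain: list):
    """Return (ok, depth, first_break_seq); first_break_seq is None when intact."""
    prev = GENESIS
    for i, record in enumerate(chain):
        body = {k: v for k, v in record.items() if k != "digest"}
        linked = record.get("seq") == i and record.get("prev") == prev
        if not linked or canonical_digest(body) != record.get("digest"):
            return (False, len(chain), i)
        prev = record["digest"]
    return (True, len(chain), None)


chain = []
append_record(chain, {"model": "m", "tokens_out": 4})
append_record(chain, {"model": "m", "tokens_out": 6})
intact = verify_chain(chain)

chain[0]["tokens_out"] = 400  # mutate a past record
tampered = verify_chain(chain)
```

Because each record's digest covers its `prev` field, editing any past record invalidates that record's digest, and re-hashing it would break the link stored in the next record.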
MEASURED vs. unmeasured: what you get
| Environment | mode | joules | tokens_per_joule |
|---|---|---|---|
| NVIDIA GPU with energy counter (nvmlDeviceGetTotalEnergyConsumption) | measured-energy | hardware accumulator delta | computed |
| NVIDIA GPU, power readback only (nvmlDeviceGetPowerUsage) | measured-power-integral | trapezoidal integral of power samples | computed |
| No GPU / no driver / no permission / no pynvml | unmeasured | null | null |
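The measured-power-integral row can be illustrated with a short sketch of trapezoidal integration. The function names are hypothetical, the power samples below are synthetic placeholders rather than measurements, and which token count the package divides by (output only, or input plus output) is not stated in this README, so the caller passes it explicitly.

```python
from typing import Optional, Sequence, Tuple


def joules_from_power(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Trapezoidal integral of (t_seconds, watts) samples; None with < 2 samples."""
    if len(samples) < 2:
        return None  # nothing to integrate: stay unmeasured instead of guessing
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def tokens_per_joule(tokens: int, joules: Optional[float]) -> Optional[float]:
    """Tokens divided by joules; None whenever energy is unmeasured or non-positive."""
    if joules is None or joules <= 0:
        return None
    return tokens / joules


# SYNTHETIC samples for illustration only (not measured on any hardware).
samples = [(0.0, 100.0), (0.5, 120.0), (1.0, 110.0)]
energy = joules_from_power(samples)       # 55.0 + 57.5 = 112.5 J
efficiency = tokens_per_joule(9, energy)  # 9 / 112.5
unmeasured = tokens_per_joule(9, None)    # stays None, never a made-up number
```

Returning `None` when no energy is available mirrors the null values in the last table row.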
Sample receipt: unmeasured (illustrative; this build env has no GPU)
SAMPLE / illustrative. Produced on a CPU-only box. Because NVML is unavailable, energy is honestly
unmeasured and joules is null, which is exactly the honest-degrade behavior. No energy number is invented.
{
  "seq": 0,
  "model": "my-llm-7b",
  "tokens_in": 2,
  "tokens_out": 7,
  "mode": "unmeasured",
  "joules": null,
  "wall_seconds": 0.004182,
  "tokens_per_joule": null,
  "policy_decision": "allow",
  "policy_reason": "default allow_all gate (no policy configured)",
  "prev": "0000000000000000000000000000000000000000000000000000000000000000",
  "digest": "<sha256 of the canonical body>",
  "ts": 1750000000.0
}
On a real NVIDIA GPU the same call would carry e.g.
"mode": "measured-energy", a positive "joules", and a computed
"tokens_per_joule". We do not print example GPU numbers here because this
build environment cannot measure them, and inventing them would violate the
honesty doctrine. Run gim.selfcheck() on your own hardware to see your numbers.
Self-test
import governed_inference_meter as gim
print(gim.selfcheck()) # functional check (NOT a benchmark); no fabricated energy
selfcheck() runs a metered allow call, a denied call (verifying it does not
execute), checks tokens/joule honesty, verifies the hash chain, and confirms
that mutating a past record is detected. It requires no GPU.
What's in the repo
build.toml # Kernel Hub universal-kernel manifest
build/torch-universal/governed_inference_meter/
__init__.py # meter() / metered() wrappers, selfcheck(), accessors
_energy.py # NVML energy + power-integral measurement, honest degrade
_receipt.py # SHA-256 hash-chained, tamper-evident receipts
_policy.py # advisory policy gate (allow_all default, fail-closed)
metadata.json
pyproject.toml # also pip-installable from source
tests/test_meter.py # runs on CPU, no GPU needed
LICENSE # Apache-2.0
Doctrine & honesty disclaimer
SZL Holdings · governed, energy-metered inference receipts · MEASURED only with NVML · the policy gate is advisory (host-enforced) · the governance trust quantity is Conjecture 1 (advisory, not a theorem) · trust never 100% · honesty over checklist. This is a metering + receipt utility, not a safety guarantee. No fabricated benchmarks; energy is reported only when physically measured.
License: Apache-2.0 · Maintainer: Stephen Lutar stephenlutar2@gmail.com · Platform: a11oy.net, the governed-inference substrate for hard missions.