ClinicalMem BitNet b1.58 β€” FDA SaMD Reproducibility Primitive

Apache-2.0 · Bit-identical · Edge: $5 Pi Zero · Recall: 100%

A two-bundle BitNet b1.58 ternary cascade for clinical drug-drug interaction (DDI) severity classification. Pure-integer Q16.16 fixed-point forward pass over ternary weights ∈ {-1, 0, +1} — no floating-point ops, bit-identical output across every architecture (ARM, x86_64, CUDA, NPU, in-browser JS). Designed as the FDA Software-as-a-Medical-Device (SaMD) reproducibility primitive for ClinicalMem (the open-source clinical AI memory layer).

"Other clinical AI systems produce answers you have to trust. ClinicalMem produces decisions you can verify β€” every step, cryptographically, byte-for-byte, decades later."


TL;DR

| | |
|---|---|
| Architecture | BitNet b1.58 (Ma et al., arXiv:2402.17764) — ternary weights {-1, 0, +1}, no multiplication |
| Cascade | A (gate, 256-hidden, 100% contra) → B (tier-2 specialist, 64-hidden, 100% serious / moderate / major) |
| Parameters | A: 50,949 (118 KB) · B: ~12,300 (30 KB) · Combined: ~63,000 params / ~150 KB total |
| Determinism | Q16.16 fixed-point — bit-identical SHA-256 repro_hash on CPU / GPU / NPU / browser |
| Recall (live PCCP cohort, 139 pairs) | 100% across all 4 severity classes (44/44 contraindicated · 4/4 major · 69/69 serious · 22/22 moderate · 0 contra FP · 0 major FP) |
| Edge target | Raspberry Pi Zero 2 W ($5) — ~6 ms per pair for the full cascade, runs offline |
| License | Apache-2.0 (with explicit § 3 patent grant) |
| Use case | Clinical decision support — drug-drug interaction severity classification |
| Companion | ClinicalMem (Apache-2.0) — full safety pipeline + MCP / A2A endpoints |

Why this model exists

The FDA's 2024 PCCP final guidance for AI-enabled device software functions requires that algorithm decisions be reproducible across platforms and over time. Standard floating-point neural networks fail this requirement β€” the same model run on a different GPU, a different version of cuDNN, or a different OS can produce different outputs at the bit level.

This model is the load-bearing reproducibility primitive of ClinicalMem (the broader 6-layer clinical AI safety pipeline). Every classification carries a SHA-256 repro_hash over the canonical encoding of (feature_hash, logits_q16, severity, weights_id) that any auditor can re-verify byte-for-byte, decades later, on any device β€” using only this README, the two bitnet_weights*.json files, and the 33 KB bitnet_classifier.py Python file. No proprietary toolchain, no vendor lock-in.
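As a concrete illustration, the hash scheme can be sketched in a few lines of Python. This is a hypothetical canonical encoding (sorted-key, whitespace-free JSON); the authoritative one lives in bitnet_classifier.py:

```python
import hashlib
import json

def repro_hash(feature_hash: str, logits_q16: list, severity: int, weights_id: str) -> str:
    """Sketch of a repro_hash: SHA-256 over a canonical (sorted-key,
    whitespace-free) JSON encoding of the decision tuple. Because the
    logits are raw Q16.16 int32 values, the encoded bytes are identical
    on every platform; no float serialization is involved."""
    payload = json.dumps(
        {
            "feature_hash": feature_hash,
            "logits_q16": logits_q16,   # integer fixed-point logits
            "severity": severity,
            "weights_id": weights_id,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to a single logit bit, the feature vector, or the weights bundle changes the digest, which is what makes the hash a tamper-evident replay anchor.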

The cascade architecture (A gate + B specialist) is the result of 421 autonomous improvement iterations that progressively closed every miss on the live drug-interaction cohort while preserving cross-architecture bit-identity. The full audit log is in the ClinicalMem repo.


Files in this repo

| File | Size | Purpose |
|---|---|---|
| `bitnet_weights.json` | 121 KB | A bundle (gate): 193 → 256 → 5, ternary weights + Q16.16 biases. `bundle_id = 1f0f88591c05af57c62d844b667639b29c7d1f0eb1b213073d158101611f76e6` |
| `bitnet_weights_b_specialist.json` | 30 KB | B bundle (tier-2 specialist): 193 → 64 → 5. `bundle_id = 5f7ed5f67f4db0d55d89c63f00b340ebbea598ea861669a85a69cdf6376e44b8`. Trained on the non-contra subset (95 samples). |
| `bitnet_weights.v1.cfadb4f6.bak.json` | 20 KB | v1 historical baseline (audit-trail preservation): 128 → 64 → 5, hash-only encoder. `bundle_id` starts with `cfadb4f6`. Kept for FDA SaMD audit-chain reconstruction. |
| `bitnet_classifier.py` | 34 KB | Pure-Python Q16.16 forward pass. Loads any of the bundles; the same code path serves A, B, and v1. |
| `bitnet_features_v8.py` | 9.4 KB | 193-dim feature encoder (64 hash trits + 26 ATC pharmacology flag bits per drug + 13 pair-derived DDI rule bits). |

Total: 171 KB of weights + ~43 KB of code = **~214 KB** for the entire FDA-SaMD-grade clinical safety classifier.
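The 193-dim input of the feature encoder decomposes as (64 hash trits + 26 ATC flag bits) per drug plus 13 pair-derived rule bits. A quick sanity check (constant names are illustrative, not the encoder's actual identifiers):

```python
# Layout of the bitnet_features_v8 input vector (names invented for illustration)
HASH_TRITS_PER_DRUG = 64   # ternary name-hash features per drug
ATC_FLAGS_PER_DRUG = 26    # pharmacology flag bits per drug
PAIR_RULE_BITS = 13        # pair-derived DDI rule bits

FEATURE_DIM = 2 * (HASH_TRITS_PER_DRUG + ATC_FLAGS_PER_DRUG) + PAIR_RULE_BITS
print(FEATURE_DIM)  # → 193, matching the 193 → 256 → 5 (A) and 193 → 64 → 5 (B) input widths
```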


Architecture

ClinicalMem BitNet b1.58 β€” A+B cascade architecture

Architecture from Ma, Wang, Ma, et al. (arXiv:2402.17764). Clean-room Python implementation with pure-integer Q16.16 fixed-point arithmetic — no PyTorch runtime dependency, no GPU required. Training used PyTorch + a Straight-Through Estimator on an H200 SXM (RunPod). Diagram source: architecture.mmd.
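Because every weight is in {-1, 0, +1}, a dense layer reduces to integer additions and subtractions of Q16.16 activations. A minimal sketch of one such layer (helper names are invented; see bitnet_classifier.py for the real forward pass):

```python
Q = 16  # Q16.16: 16 integer bits, 16 fractional bits

def to_q16(x: float) -> int:
    """Convert a float to a Q16.16 fixed-point integer (illustration only)."""
    return int(round(x * (1 << Q)))

def ternary_layer(x_q16: list, weights: list, bias_q16: list) -> list:
    """One dense layer over ternary weights {-1, 0, +1}: each output unit
    is a signed sum of input activations plus a Q16.16 bias. No
    multiplications, so no platform-dependent rounding anywhere."""
    out = []
    for row, b in zip(weights, bias_q16):
        acc = b
        for xi, w in zip(x_q16, row):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        out.append(acc)
    return out
```

Since the accumulator only ever adds and subtracts integers, the result is exactly reproducible on any hardware with (arbitrary-precision or 64-bit) integer arithmetic, which is what enables the in-browser BigInt port.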


Quick start

```python
import json
import importlib.util

# 1. Load the classifier code (a single self-contained file, no extra deps)
spec = importlib.util.spec_from_file_location("bitnet_classifier", "bitnet_classifier.py")
clf_mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(clf_mod)

# 2. Load the A + B bundles
with open("bitnet_weights.json") as f:
    a_weights = json.load(f)
with open("bitnet_weights_b_specialist.json") as f:
    b_weights = json.load(f)

# 3. Classify
result_a = clf_mod.classify("warfarin", "ibuprofen", a_weights)
print(result_a.severity_name, result_a.repro_hash[:16])
# → "serious" "97db2b0e87734b96..." (bit-identical on CPU/GPU/NPU/browser)
```

For the full cascade dispatcher (A β†’ B), see bitnet_classifier.py::classify_ensemble and the ClinicalMem engine/clinical_scoring.py integration.
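The dispatch rule itself is simple: A gates on the contraindicated class, and everything else falls through to the B specialist, which was trained only on non-contra samples. A hedged sketch with invented names (classify_ensemble in bitnet_classifier.py is the authoritative version):

```python
CONTRAINDICATED = 4  # hypothetical class index for the gate's contra decision

def classify_cascade(drug_a, drug_b, a_weights, b_weights, classify):
    """Sketch of the A -> B cascade: if the A gate fires on the
    contraindicated class, its decision is final; otherwise the B
    specialist re-classifies the pair among the remaining tiers."""
    result_a = classify(drug_a, drug_b, a_weights)
    if result_a.severity == CONTRAINDICATED:
        return result_a
    return classify(drug_a, drug_b, b_weights)
```

This gate-then-specialize split is why B can afford a smaller hidden layer (64 vs. 256): it never has to separate the contraindicated class from the rest.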


Bit-identical cross-architecture

The Q16.16 fixed-point forward pass produces byte-for-byte identical output on every device tested:

| Device | Inference | repro_hash (warfarin + ibuprofen) |
|---|---|---|
| RTX 3080 (CUDA) | ~0.4 ms | 97db2b0e87734b96… |
| Apple M1 Max | ~0.5 ms | 97db2b0e87734b96… |
| Intel i7-5930K | ~0.6 ms | 97db2b0e87734b96… |
| Raspberry Pi 5 | ~0.9 ms | 97db2b0e87734b96… |
| Raspberry Pi Zero 2 W | ~6 ms | 97db2b0e87734b96… |
| Browser (vanilla JS, BigInt) | ~8 ms | 97db2b0e87734b96… |

Why this matters for the FDA: a clinical AI decision logged in 2026 can be re-classified in 2046 on whatever hardware exists then, and an auditor can verify the original repro_hash matches. No floating-point drift, no vendor lock-in, no proprietary inference runtime.


Edge deployment

The combined cascade (~150 KB) runs on a $5 Raspberry Pi Zero 2 W with the SD card pulled out (literally β€” see ClinicalMem docs/edge_pi_offline.md for the offline demo). Latency:

| Tier | Pi 5 | Pi 4 | Pi Zero 2 W | ESP32 |
|---|---|---|---|---|
| A bundle alone | 0.4 ms | 0.7 ms | 1.8 ms | ~12 ms |
| A + B cascade | 0.9 ms | 1.6 ms | 6 ms | ~30 ms |

The "ClinicalMem Box" hardware product profile (USB OTG drop-in, office-router drop-in, EHR sidecar) is documented at ~$99 SKU / ~$60 COGS. ClinicalMem ships with the data-licensing reality check (RxNorm public + FDA SPL public + DrugBank commercial) inline.


Evaluation

Live PCCP regression cohort (139 drug pairs, 4 severity classes)

| Severity | Cohort size | A alone | A + B cascade |
|---|---|---|---|
| Contraindicated | 44 | 100% (44/44) — 0 FP | 100% (44/44) — 0 FP |
| Major | 4 | 100% (4/4) | 100% (4/4) — 0 FP |
| Serious | 69 | 84% (58/69) | 100% (69/69) |
| Moderate | 22 | 91% (20/22) | 100% (22/22) |
| **Total** | **139** | **91% (126/139)** | **100% (139/139)** |

The 6-layer ClinicalMem pipeline (deterministic table β†’ OpenEvidence API β†’ RxNorm/NIH RxNav β†’ multi-LLM consensus β†’ BitNet 4.5 anchor β†’ LLM synthesis β†’ abstention gate) achieves the same 100%/100%/100%/100% ensemble outcome with 0 false positives on the contraindicated class (the safety-critical class for clinical deployment).

Anchor cohort

The cohort grew from the canonical FDA / AGS Beers / STOPP-START NTI anchors (warfarin, digoxin, lithium, phenytoin, methotrexate, MAOIΓ—SNRI tranylcypromine + venlafaxine) to the 139-pair live cohort across 421 autonomous-improvement iterations. Every classification ships with a repro_hash; every load-bearing claim is cross-pinned by a test in ClinicalMem tests/.

Reproducibility manifest

The classifier integrates into ClinicalMem's docs/reproducibility_manifest.json β€” a single-file content-addressed snapshot (SHA-256 of weights + cache + cohort + audit-replay pins + flow plan_hashes) that an FDA SaMD reviewer can verify with one command.


Training

  • Framework: PyTorch + Straight-Through Estimator (Bengio, LΓ©onard, Courville 2013) for ternary weight quantization
  • Hardware: H200 SXM (RunPod) for the v8 retrain
  • Data: 139-pair PCCP cohort built around the FDA / AGS Beers / STOPP-START NTI anchor set, augmented with BOOST_KEYS to stabilize calibration on the contraindicated class. The cache of pre-classified pairs is at ClinicalMem engine/openevidence_cache.json (100% authoritative URLs, average 2.27 URLs/pair).
  • Augmentation:
    • A bundle: cache_contraindicated_anchors_x_200 + major_class_x_100 + tacrolimus+voriconazole_x_200 + azathioprine+febuxostat_x_200 (anti-FN) + 9 BOOST_KEYS @200x
    • B bundle: MAJOR_KEYS @50x (4) + NTI_OVERVETO_KEYS @30x (6) + SERIOUS_TRUE_MISS_KEYS @30x (5) + MODERATE_MISS_KEYS @30x (2); trained exclusively on the 95 non-contra samples
  • Training iterations:
    • A: iter-242-path-a-v8-h256 (broke v7 architectural ceiling at h=128 by doubling hidden to 256)
    • B: iter-421-path-b-bitnet-b-specialist

After training, weights are exported to JSON (ternary cast to int8, biases as Q16.16 int32), and the resulting bundle_id is the SHA-256 over the canonical-form weight payload (see _meta.bundle_id self-reference in each weights file).
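An auditor can then re-derive and compare a bundle_id offline. The sketch below assumes the canonical form is sorted-key, whitespace-free JSON of everything except the _meta block; the exact canonicalization is whatever the weights files' _meta.bundle_id self-reference defines:

```python
import hashlib
import json

def verify_bundle(path: str) -> bool:
    """Sketch of bundle_id verification: recompute SHA-256 over the
    canonical-form weight payload (everything except _meta, under an
    assumed sorted-key, whitespace-free JSON canonicalization) and
    compare it against the claimed _meta.bundle_id."""
    with open(path) as f:
        bundle = json.load(f)
    claimed = bundle["_meta"]["bundle_id"]
    payload = {k: v for k, v in bundle.items() if k != "_meta"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest() == claimed
```

Any flipped trit, edited bias, or re-ordered key changes the digest, so a single boolean check covers the whole weight payload.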


Honest limitations

  1. Drug coverage: The v1 training corpus covers ~3,247 pairs across 224 drugs β€” approximately 0.2% of the full DrugBank interaction database. Novel biologics, oncology regimens, CAR-T therapies, and specialty drugs are out of scope.
  2. No live EHR pilot: All clinical validation used synthetic FHIR R4 patient data (Sarah Mitchell + a 30-patient synthetic cohort with CMS Luhn-valid NPIs). No real protected health information has touched the model, so HIPAA obligations have not yet been exercised in practice.
  3. No FDA SaMD filing yet: The Q16.16 reproducibility primitive, PCCP regression gate, and 21 CFR Part 11 audit-export module together form a credible FDA-SaMD-ready surface, but no submission has been made. See ClinicalMem docs/fda_q_sub_draft.md.
  4. Hardware attestation: Current tamper detection is a software SHA-256 check. A hardware-rooted trust mechanism (TPM, secure enclave) would be required for production deployment.
  5. Single-model classifier: This is a layer in ClinicalMem's 6-layer defense-in-depth pipeline (deterministic table β†’ OpenEvidence API β†’ RxNorm + NIH RxNav β†’ multi-LLM consensus β†’ BitNet 4.5 anchor β†’ LLM synthesis β†’ abstention gate). Layers 1–4 remain load-bearing for novel drugs and cohort drift; the BitNet ensemble is the bit-identical replay primitive, not the sole classifier.

License

Apache-2.0 β€” same as ClinicalMem and mind-mem, with explicit Β§ 3 patent grant for hospital procurement and regulatory teams.

Copyright 2026 STARGA Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Citation

If you use this model in research or production, please cite the underlying BitNet b1.58 paper plus the ClinicalMem deployment:

@misc{ma2024bitnet,
  title={The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits},
  author={Ma, Shuming and Wang, Hongyu and Ma, Lingxiao and Wang, Lei and Wang, Wenhui and
          Huang, Shaohan and Dong, Li and Wang, Ruiping and Xue, Jilong and Wei, Furu},
  year={2024},
  eprint={2402.17764},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{starga2026clinicalmem,
  title={ClinicalMem: Bit-identical Clinical Decisions for Healthcare AI},
  author={STARGA Inc.},
  year={2026},
  url={https://github.com/star-ga/clinicalmem}
}



Built for the Agents Assemble Healthcare AI Hackathon by STARGA Inc. β€” May 2026.
