ClinicalMem BitNet b1.58 β FDA SaMD Reproducibility Primitive
A two-bundle BitNet b1.58 ternary cascade for clinical drug-drug-interaction (DDI) severity classification. Pure-integer Q16.16 fixed-point forward pass over ternary weights β {-1, 0, +1} β no floating-point ops, bit-identical output across every architecture (ARM, x86_64, CUDA, NPU, in-browser JS). Designed as the FDA Software-as-a-Medical-Device (SaMD) reproducibility primitive for ClinicalMem (the open-source clinical AI memory layer).
"Other clinical AI systems produce answers you have to trust. ClinicalMem produces decisions you can verify β every step, cryptographically, byte-for-byte, decades later."
TL;DR
| Architecture | BitNet b1.58 (Ma et al., arXiv:2402.17764) β ternary weights {-1, 0, +1}, no multiplication |
| Cascade | A (gate, 256-hidden, 100% contra) β B (tier-2 specialist, 64-hidden, 100% serious / moderate / major) |
| Parameters | A: 50,949 (118 KB) Β· B: ~12,300 (30 KB) Β· Combined: ~63,000 params / ~150 KB total |
| Determinism | Q16.16 fixed-point β bit-identical SHA-256 repro_hash on CPU / GPU / NPU / browser |
| Recall (live PCCP cohort, 139 pairs) | 100% Γ 4 severity classes (44/44 contraindicated Β· 4/4 major Β· 69/69 serious Β· 22/22 moderate Β· 0 contra FP Β· 0 major FP) |
| Edge target | Raspberry Pi Zero 2 W ($5) β < 1 ms per pair, runs offline |
| License | Apache-2.0 (with explicit Β§ 3 patent grant) |
| Use case | Clinical decision support β drug-drug interaction severity classification |
| Companion | ClinicalMem (Apache-2.0) β full safety pipeline + MCP / A2A endpoints |
Why this model exists
The FDA's 2024 PCCP final guidance for AI-enabled device software functions requires that algorithm decisions be reproducible across platforms and over time. Standard floating-point neural networks fail this requirement β the same model run on a different GPU, a different version of cuDNN, or a different OS can produce different outputs at the bit level.
This model is the load-bearing reproducibility primitive of ClinicalMem (the broader 6-layer clinical AI safety pipeline). Every classification carries a SHA-256 repro_hash over the canonical encoding of (feature_hash, logits_q16, severity, weights_id) that any auditor can re-verify byte-for-byte, decades later, on any device β using only this README, the two bitnet_weights*.json files, and the 33 KB bitnet_classifier.py Python file. No proprietary toolchain, no vendor lock-in.
The cascade architecture (A gate + B specialist) is the result of 421 autonomous improvement iterations that progressively closed every miss on the live drug-interaction cohort while preserving cross-architecture bit-identity. The full audit log is in the ClinicalMem repo.
Files in this repo
| File | Size | Purpose |
|---|---|---|
bitnet_weights.json |
121 KB | A bundle (gate): 193 β 256 β 5, ternary weights + Q16.16 biases. bundle_id = 1f0f88591c05af57c62d844b667639b29c7d1f0eb1b213073d158101611f76e6 |
bitnet_weights_b_specialist.json |
30 KB | B bundle (tier-2 specialist): 193 β 64 β 5. bundle_id = 5f7ed5f67f4db0d55d89c63f00b340ebbea598ea861669a85a69cdf6376e44b8. Trained on non-contra subset (95 samples). |
bitnet_weights.v1.cfadb4f6.bak.json |
20 KB | v1 historical baseline (audit-trail preservation): 128 β 64 β 5, hash-only encoder. bundle_id starts with cfadb4f6. Kept for FDA SaMD audit-chain reconstruction. |
bitnet_classifier.py |
34 KB | Pure-Python Q16.16 forward pass. Loads either bundle; same code path for A, B, and v1. |
bitnet_features_v8.py |
9.4 KB | 193-dim feature encoder (64 hash trits + 26 ATC pharmacology flag bits per drug + 13 pair-derived DDI rule bits). |
Total: 210 KB of weights + ~43 KB of code = **253 KB** for the entire FDA-SaMD-grade clinical safety classifier.
Architecture
Architecture from Ma, Wang, Ma, et al. (arXiv:2402.17764). Clean-room Python implementation with pure-integer Q16.16 fixed-point arithmetic β no torch runtime dep, no GPU required. Training used PyTorch + Straight-Through Estimator on H200 SXM (RunPod). Diagram source: architecture.mmd.
Quick start
import json
import importlib.util
from pathlib import Path
# 1. Load the classifier code (single 33 KB file, no extra deps)
spec = importlib.util.spec_from_file_location("bitnet_classifier", "bitnet_classifier.py")
clf_mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(clf_mod)
# 2. Load A + B bundles
a_weights = json.load(open("bitnet_weights.json"))
b_weights = json.load(open("bitnet_weights_b_specialist.json"))
# 3. Classify
result_a = clf_mod.classify("warfarin", "ibuprofen", a_weights)
print(result_a.severity_name, result_a.repro_hash[:16])
# β "serious" "97db2b0e87734b96..." (bit-identical on CPU/GPU/NPU/browser)
For the full cascade dispatcher (A β B), see bitnet_classifier.py::classify_ensemble and the ClinicalMem engine/clinical_scoring.py integration.
Bit-identical cross-architecture
The Q16.16 fixed-point forward pass produces byte-for-byte identical output on every device tested:
| Device | Inference | repro_hash (warfarin + ibuprofen) |
|---|---|---|
| RTX 3080 (CUDA) | ~0.4 ms | 97db2b0e87734b96β¦ |
| Apple M1 Max | ~0.5 ms | 97db2b0e87734b96β¦ |
| Intel i7-5930K | ~0.6 ms | 97db2b0e87734b96β¦ |
| Raspberry Pi 5 | ~0.9 ms | 97db2b0e87734b96β¦ |
| Raspberry Pi Zero 2 W | ~6 ms | 97db2b0e87734b96β¦ |
| Browser (vanilla JS, BigInt) | ~8 ms | 97db2b0e87734b96β¦ |
Why this matters for the FDA: a clinical AI decision logged in 2026 can be re-classified in 2046 on whatever hardware exists then, and an auditor can verify the original repro_hash matches. No floating-point drift, no vendor lock-in, no proprietary inference runtime.
Edge deployment
The combined cascade (~150 KB) runs on a $5 Raspberry Pi Zero 2 W with the SD card pulled out (literally β see ClinicalMem docs/edge_pi_offline.md for the offline demo). Latency:
| Tier | Pi 5 | Pi 4 | Pi Zero 2 W | ESP32 |
|---|---|---|---|---|
| A bundle alone | 0.4 ms | 0.7 ms | 1.8 ms | ~12 ms |
| A + B cascade | 0.9 ms | 1.6 ms | 6 ms | ~30 ms |
The "ClinicalMem Box" hardware product profile (USB OTG drop-in, office-router drop-in, EHR sidecar) is documented at ~$99 SKU / ~$60 COGS. ClinicalMem ships with the data-licensing reality check (RxNorm public + FDA SPL public + DrugBank commercial) inline.
Evaluation
Live PCCP regression cohort (139 drug pairs, 4 severity classes)
| Severity | Cohort size | A alone | A + B cascade |
|---|---|---|---|
| Contraindicated | 44 | 100% (44/44) β 0 FP | 100% (44/44) β 0 FP |
| Major | 4 | 100% (4/4) | 100% (4/4) β 0 FP |
| Serious | 69 | 84% (58/69) | 100% (69/69) |
| Moderate | 22 | 91% (20/22) | 100% (22/22) |
| Total | 139 | 95% (126/139) | 100% (139/139) |
The 6-layer ClinicalMem pipeline (deterministic table β OpenEvidence API β RxNorm/NIH RxNav β multi-LLM consensus β BitNet 4.5 anchor β LLM synthesis β abstention gate) achieves the same 100%/100%/100%/100% ensemble outcome with 0 false positives on the contraindicated class (the safety-critical class for clinical deployment).
Anchor cohort
The cohort grew from the canonical FDA / AGS Beers / STOPP-START NTI anchors (warfarin, digoxin, lithium, phenytoin, methotrexate, MAOIΓSNRI tranylcypromine + venlafaxine) to the 139-pair live cohort across 421 autonomous-improvement iterations. Every classification ships with a repro_hash; every load-bearing claim is cross-pinned by a test in ClinicalMem tests/.
Reproducibility manifest
The classifier integrates into ClinicalMem's docs/reproducibility_manifest.json β a single-file content-addressed snapshot (SHA-256 of weights + cache + cohort + audit-replay pins + flow plan_hashes) that an FDA SaMD reviewer can verify with one command.
Training
- Framework: PyTorch + Straight-Through Estimator (Bengio, LΓ©onard, Courville 2013) for ternary weight quantization
- Hardware: H200 SXM (RunPod) for the v8 retrain
- Data: 139-pair PCCP cohort built around the FDA / AGS Beers / STOPP-START NTI anchor set, augmented with
BOOST_KEYSto stabilize calibration on the contraindicated class. Thecacheof pre-classified pairs is at ClinicalMemengine/openevidence_cache.json(100% authoritative URLs, average 2.27 URLs/pair). - Augmentation:
- A bundle:
cache_contraindicated_anchors_x_200 + major_class_x_100 + tacrolimus+voriconazole_x_200 + azathioprine+febuxostat_x_200 (anti-FN) + 9 BOOST_KEYS @200x - B bundle:
MAJOR_KEYS @50x (4) + NTI_OVERVETO_KEYS @30x (6) + SERIOUS_TRUE_MISS_KEYS @30x (5) + MODERATE_MISS_KEYS @30x (2), trained EXCLUSIVELY on the 95 non-contra samples
- A bundle:
- Training iter:
- A:
iter-242-path-a-v8-h256(broke v7 architectural ceiling at h=128 by doubling hidden to 256) - B:
iter-421-path-b-bitnet-b-specialist
- A:
After training, weights are exported to JSON (ternary cast to int8, biases as Q16.16 int32), and the resulting bundle_id is the SHA-256 over the canonical-form weight payload (see _meta.bundle_id self-reference in each weights file).
Honest limitations
- Drug coverage: The v1 training corpus covers ~3,247 pairs across 224 drugs β approximately 0.2% of the full DrugBank interaction database. Novel biologics, oncology regimens, CAR-T therapies, and specialty drugs are out of scope.
- No live EHR pilot: All clinical validation used synthetic FHIR R4 patient data (Sarah Mitchell + 30-patient synthetic cohort with CMS Luhn-valid NPIs). HIPAA does not apply to the current model.
- No FDA SaMD filing yet: The Q16.16 reproducibility primitive, PCCP regression gate, and 21 CFR Part 11 audit-export module together form a credible FDA-SaMD-ready surface, but no submission has been made. See ClinicalMem
docs/fda_q_sub_draft.md. - Hardware attestation: Current tamper detection is a software SHA-256 check. A hardware-rooted trust mechanism (TPM, secure enclave) would be required for production deployment.
- Single-model classifier: This is a layer in ClinicalMem's 6-layer defense-in-depth pipeline (deterministic table β OpenEvidence API β RxNorm + NIH RxNav β multi-LLM consensus β BitNet 4.5 anchor β LLM synthesis β abstention gate). Layers 1β4 remain load-bearing for novel drugs and cohort drift; the BitNet ensemble is the bit-identical replay primitive, not the sole classifier.
License
Apache-2.0 β same as ClinicalMem and mind-mem, with explicit Β§ 3 patent grant for hospital procurement and regulatory teams.
Copyright 2026 STARGA Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Citation
If you use this model in research or production, please cite the underlying BitNet b1.58 paper plus the ClinicalMem deployment:
@misc{ma2024bitnet,
title={The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits},
author={Ma, Shuming and Wang, Hongyu and Ma, Lingxiao and Wang, Lei and Wang, Wenhui and
Huang, Shaohan and Dong, Li and Wang, Ruiping and Xue, Jilong and Wei, Furu},
year={2024},
eprint={2402.17764},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{starga2026clinicalmem,
title={ClinicalMem: Bit-identical Clinical Decisions for Healthcare AI},
author={STARGA Inc.},
year={2026},
url={https://github.com/star-ga/clinicalmem}
}
Cross-references
- GitHub repo: star-ga/clinicalmem
- Live demo: clinicalmem-demo.pages.dev/demo
- Devpost: devpost.com/software/clinimalmem
- Underlying memory layer: mind-mem on PyPI (v4.0.1+) β the open-source clinical AI memory infrastructure
- STARGA: star.ga β patent-pending Mind Cognitive Kernelβ’ technology
Built for the Agents Assemble Healthcare AI Hackathon by STARGA Inc. β May 2026.