ClinicalMem BitNet b1.58 β€” FDA SaMD Reproducibility Primitive

Apache-2.0 · Bit-identical · Edge: $5 Pi Zero · Recall: 100%

A two-bundle BitNet b1.58 ternary cascade for clinical drug-drug interaction (DDI) severity classification. Pure-integer Q16.16 fixed-point forward pass over ternary weights ∈ {-1, 0, +1} — no floating-point ops, bit-identical output across every architecture (ARM, x86_64, CUDA, NPU, in-browser JS). Designed as the FDA Software-as-a-Medical-Device (SaMD) reproducibility primitive for ClinicalMem (the open-source clinical AI memory layer).

"Other clinical AI systems produce answers you have to trust. ClinicalMem produces decisions you can verify β€” every step, cryptographically, byte-for-byte, decades later."


TL;DR

| | |
|---|---|
| Architecture | BitNet b1.58 (Ma et al., arXiv:2402.17764) — ternary weights {-1, 0, +1}, no multiplication |
| Cascade | A (gate, 256-hidden, 100% contra) → B (tier-2 specialist, 64-hidden, 100% serious / moderate / major) |
| Parameters | A: 50,949 (118 KB) · B: ~12,300 (30 KB) · Combined: ~63,000 params / ~150 KB total |
| Determinism | Q16.16 fixed-point — bit-identical SHA-256 repro_hash on CPU / GPU / NPU / browser |
| Recall (live PCCP cohort, 139 pairs) | 100% across all 4 severity classes (44/44 contraindicated · 4/4 major · 69/69 serious · 22/22 moderate · 0 contra FP · 0 major FP) |
| Edge target | Raspberry Pi Zero 2 W ($5) — ~6 ms per pair for the full cascade, runs offline |
| License | Apache-2.0 (with explicit § 3 patent grant) |
| Use case | Clinical decision support — drug-drug interaction severity classification |
| Companion | ClinicalMem (Apache-2.0) — full safety pipeline + MCP / A2A endpoints |

Why this model exists

The FDA's 2024 PCCP final guidance for AI-enabled device software functions requires that algorithm decisions be reproducible across platforms and over time. Standard floating-point neural networks fail this requirement β€” the same model run on a different GPU, a different version of cuDNN, or a different OS can produce different outputs at the bit level.

This model is the load-bearing reproducibility primitive of ClinicalMem (the broader 6-layer clinical AI safety pipeline). Every classification carries a SHA-256 repro_hash over the canonical encoding of (feature_hash, logits_q16, severity, weights_id) that any auditor can re-verify byte-for-byte, decades later, on any device β€” using only this README, the two bitnet_weights*.json files, and the 33 KB bitnet_classifier.py Python file. No proprietary toolchain, no vendor lock-in.
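As a concrete illustration, the hash scheme can be sketched in a few lines of Python. This is a hypothetical canonical encoding (sorted-key, whitespace-free JSON); the authoritative one lives in bitnet_classifier.py:

```python
import hashlib
import json

def repro_hash(feature_hash: str, logits_q16: list, severity: int, weights_id: str) -> str:
    """Sketch of a repro_hash: SHA-256 over a canonical (sorted-key,
    whitespace-free) JSON encoding of the decision tuple. Because the
    logits are raw Q16.16 int32 values, the encoded bytes are identical
    on every platform; no float serialization is involved."""
    payload = json.dumps(
        {
            "feature_hash": feature_hash,
            "logits_q16": logits_q16,   # integer fixed-point logits
            "severity": severity,
            "weights_id": weights_id,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to a single logit bit, the feature vector, or the weights bundle changes the digest, which is what makes the hash a tamper-evident replay anchor.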

The cascade architecture (A gate + B specialist) is the result of 421 autonomous improvement iterations that progressively closed every miss on the live drug-interaction cohort while preserving cross-architecture bit-identity. The full audit log is in the ClinicalMem repo.


Files in this repo

| File | Size | Purpose |
|---|---|---|
| `bitnet_weights.json` | 121 KB | A bundle (gate): 193 → 256 → 5, ternary weights + Q16.16 biases. `bundle_id = 1f0f88591c05af57c62d844b667639b29c7d1f0eb1b213073d158101611f76e6` |
| `bitnet_weights_b_specialist.json` | 30 KB | B bundle (tier-2 specialist): 193 → 64 → 5. `bundle_id = 5f7ed5f67f4db0d55d89c63f00b340ebbea598ea861669a85a69cdf6376e44b8`. Trained on the non-contra subset (95 samples). |
| `bitnet_weights.v1.cfadb4f6.bak.json` | 20 KB | v1 historical baseline (audit-trail preservation): 128 → 64 → 5, hash-only encoder. `bundle_id` starts with `cfadb4f6`. Kept for FDA SaMD audit-chain reconstruction. |
| `bitnet_classifier.py` | 34 KB | Pure-Python Q16.16 forward pass. Loads any of the bundles; the same code path serves A, B, and v1. |
| `bitnet_features_v8.py` | 9.4 KB | 193-dim feature encoder (64 hash trits + 26 ATC pharmacology flag bits per drug + 13 pair-derived DDI rule bits). |

Total: 171 KB of weights + ~43 KB of code = **~214 KB** for the entire FDA-SaMD-grade clinical safety classifier.
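The 193-dim input of the feature encoder decomposes as (64 hash trits + 26 ATC flag bits) per drug plus 13 pair-derived rule bits. A quick sanity check (constant names are illustrative, not the encoder's actual identifiers):

```python
# Layout of the bitnet_features_v8 input vector (names invented for illustration)
HASH_TRITS_PER_DRUG = 64   # ternary name-hash features per drug
ATC_FLAGS_PER_DRUG = 26    # pharmacology flag bits per drug
PAIR_RULE_BITS = 13        # pair-derived DDI rule bits

FEATURE_DIM = 2 * (HASH_TRITS_PER_DRUG + ATC_FLAGS_PER_DRUG) + PAIR_RULE_BITS
print(FEATURE_DIM)  # → 193, matching the 193 → 256 → 5 (A) and 193 → 64 → 5 (B) input widths
```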


Architecture

ClinicalMem BitNet b1.58 β€” A+B cascade architecture

Architecture from Ma, Wang, Ma, et al. (arXiv:2402.17764). Clean-room Python implementation with pure-integer Q16.16 fixed-point arithmetic — no PyTorch runtime dependency, no GPU required. Training used PyTorch + a Straight-Through Estimator on an H200 SXM (RunPod). Diagram source: architecture.mmd.
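Because every weight is in {-1, 0, +1}, a dense layer reduces to integer additions and subtractions of Q16.16 activations. A minimal sketch of one such layer (helper names are invented; see bitnet_classifier.py for the real forward pass):

```python
Q = 16  # Q16.16: 16 integer bits, 16 fractional bits

def to_q16(x: float) -> int:
    """Convert a float to a Q16.16 fixed-point integer (illustration only)."""
    return int(round(x * (1 << Q)))

def ternary_layer(x_q16: list, weights: list, bias_q16: list) -> list:
    """One dense layer over ternary weights {-1, 0, +1}: each output unit
    is a signed sum of input activations plus a Q16.16 bias. No
    multiplications, so no platform-dependent rounding anywhere."""
    out = []
    for row, b in zip(weights, bias_q16):
        acc = b
        for xi, w in zip(x_q16, row):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        out.append(acc)
    return out
```

Since the accumulator only ever adds and subtracts integers, the result is exactly reproducible on any hardware with (arbitrary-precision or 64-bit) integer arithmetic, which is what enables the in-browser BigInt port.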


Quick start

```python
import json
import importlib.util

# 1. Load the classifier code (a single self-contained file, no extra deps)
spec = importlib.util.spec_from_file_location("bitnet_classifier", "bitnet_classifier.py")
clf_mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(clf_mod)

# 2. Load the A + B bundles
with open("bitnet_weights.json") as f:
    a_weights = json.load(f)
with open("bitnet_weights_b_specialist.json") as f:
    b_weights = json.load(f)

# 3. Classify
result_a = clf_mod.classify("warfarin", "ibuprofen", a_weights)
print(result_a.severity_name, result_a.repro_hash[:16])
# → "serious" "97db2b0e87734b96..." (bit-identical on CPU/GPU/NPU/browser)
```

For the full cascade dispatcher (A β†’ B), see bitnet_classifier.py::classify_ensemble and the ClinicalMem engine/clinical_scoring.py integration.
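The dispatch rule itself is simple: A gates on the contraindicated class, and everything else falls through to the B specialist, which was trained only on non-contra samples. A hedged sketch with invented names (classify_ensemble in bitnet_classifier.py is the authoritative version):

```python
CONTRAINDICATED = 4  # hypothetical class index for the gate's contra decision

def classify_cascade(drug_a, drug_b, a_weights, b_weights, classify):
    """Sketch of the A -> B cascade: if the A gate fires on the
    contraindicated class, its decision is final; otherwise the B
    specialist re-classifies the pair among the remaining tiers."""
    result_a = classify(drug_a, drug_b, a_weights)
    if result_a.severity == CONTRAINDICATED:
        return result_a
    return classify(drug_a, drug_b, b_weights)
```

This gate-then-specialize split is why B can afford a smaller hidden layer (64 vs. 256): it never has to separate the contraindicated class from the rest.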


Bit-identical cross-architecture

The Q16.16 fixed-point forward pass produces byte-for-byte identical output on every device tested:

| Device | Inference | repro_hash (warfarin + ibuprofen) |
|---|---|---|
| RTX 3080 (CUDA) | ~0.4 ms | 97db2b0e87734b96… |
| Apple M1 Max | ~0.5 ms | 97db2b0e87734b96… |
| Intel i7-5930K | ~0.6 ms | 97db2b0e87734b96… |
| Raspberry Pi 5 | ~0.9 ms | 97db2b0e87734b96… |
| Raspberry Pi Zero 2 W | ~6 ms | 97db2b0e87734b96… |
| Browser (vanilla JS, BigInt) | ~8 ms | 97db2b0e87734b96… |

Why this matters for the FDA: a clinical AI decision logged in 2026 can be re-classified in 2046 on whatever hardware exists then, and an auditor can verify the original repro_hash matches. No floating-point drift, no vendor lock-in, no proprietary inference runtime.


Edge deployment

The combined cascade (~150 KB) runs on a $5 Raspberry Pi Zero 2 W with the SD card pulled out (literally β€” see ClinicalMem docs/edge_pi_offline.md for the offline demo). Latency:

| Tier | Pi 5 | Pi 4 | Pi Zero 2 W | ESP32 |
|---|---|---|---|---|
| A bundle alone | 0.4 ms | 0.7 ms | 1.8 ms | ~12 ms |
| A + B cascade | 0.9 ms | 1.6 ms | 6 ms | ~30 ms |

The "ClinicalMem Box" hardware product profile (USB OTG drop-in, office-router drop-in, EHR sidecar) is documented at ~$99 SKU / ~$60 COGS. ClinicalMem ships with the data-licensing reality check (RxNorm public + FDA SPL public + DrugBank commercial) inline.


Evaluation

Live PCCP regression cohort (139 drug pairs, 4 severity classes)

| Severity | Cohort size | A alone | A + B cascade |
|---|---|---|---|
| Contraindicated | 44 | 100% (44/44) — 0 FP | 100% (44/44) — 0 FP |
| Major | 4 | 100% (4/4) | 100% (4/4) — 0 FP |
| Serious | 69 | 84% (58/69) | 100% (69/69) |
| Moderate | 22 | 91% (20/22) | 100% (22/22) |
| **Total** | **139** | **91% (126/139)** | **100% (139/139)** |

The 6-layer ClinicalMem pipeline (deterministic table β†’ OpenEvidence API β†’ RxNorm/NIH RxNav β†’ multi-LLM consensus β†’ BitNet 4.5 anchor β†’ LLM synthesis β†’ abstention gate) achieves the same 100%/100%/100%/100% ensemble outcome with 0 false positives on the contraindicated class (the safety-critical class for clinical deployment).

Anchor cohort

The cohort grew from the canonical FDA / AGS Beers / STOPP-START NTI anchors (warfarin, digoxin, lithium, phenytoin, methotrexate, MAOIΓ—SNRI tranylcypromine + venlafaxine) to the 139-pair live cohort across 421 autonomous-improvement iterations. Every classification ships with a repro_hash; every load-bearing claim is cross-pinned by a test in ClinicalMem tests/.

Reproducibility manifest

The classifier integrates into ClinicalMem's docs/reproducibility_manifest.json β€” a single-file content-addressed snapshot (SHA-256 of weights + cache + cohort + audit-replay pins + flow plan_hashes) that an FDA SaMD reviewer can verify with one command.


Training

  • Framework: PyTorch + Straight-Through Estimator (Bengio, LΓ©onard, Courville 2013) for ternary weight quantization
  • Hardware: H200 SXM (RunPod) for the v8 retrain
  • Data: 139-pair PCCP cohort built around the FDA / AGS Beers / STOPP-START NTI anchor set, augmented with BOOST_KEYS to stabilize calibration on the contraindicated class. The cache of pre-classified pairs is at ClinicalMem engine/openevidence_cache.json (100% authoritative URLs, average 2.27 URLs/pair).
  • Augmentation:
    • A bundle: cache_contraindicated_anchors_x_200 + major_class_x_100 + tacrolimus+voriconazole_x_200 + azathioprine+febuxostat_x_200 (anti-FN) + 9 BOOST_KEYS @200x
    • B bundle: MAJOR_KEYS @50x (4) + NTI_OVERVETO_KEYS @30x (6) + SERIOUS_TRUE_MISS_KEYS @30x (5) + MODERATE_MISS_KEYS @30x (2); trained exclusively on the 95 non-contra samples
  • Training iterations:
    • A: iter-242-path-a-v8-h256 (broke v7 architectural ceiling at h=128 by doubling hidden to 256)
    • B: iter-421-path-b-bitnet-b-specialist

After training, weights are exported to JSON (ternary cast to int8, biases as Q16.16 int32), and the resulting bundle_id is the SHA-256 over the canonical-form weight payload (see _meta.bundle_id self-reference in each weights file).
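An auditor can then re-derive and compare a bundle_id offline. The sketch below assumes the canonical form is sorted-key, whitespace-free JSON of everything except the _meta block; the exact canonicalization is whatever the weights files' _meta.bundle_id self-reference defines:

```python
import hashlib
import json

def verify_bundle(path: str) -> bool:
    """Sketch of bundle_id verification: recompute SHA-256 over the
    canonical-form weight payload (everything except _meta, under an
    assumed sorted-key, whitespace-free JSON canonicalization) and
    compare it against the claimed _meta.bundle_id."""
    with open(path) as f:
        bundle = json.load(f)
    claimed = bundle["_meta"]["bundle_id"]
    payload = {k: v for k, v in bundle.items() if k != "_meta"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest() == claimed
```

Any flipped trit, edited bias, or re-ordered key changes the digest, so a single boolean check covers the whole weight payload.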


Honest limitations

  1. Drug coverage: The v1 training corpus covers ~3,247 pairs across 224 drugs β€” approximately 0.2% of the full DrugBank interaction database. Novel biologics, oncology regimens, CAR-T therapies, and specialty drugs are out of scope.
  2. No live EHR pilot: All clinical validation used synthetic FHIR R4 patient data (Sarah Mitchell + a 30-patient synthetic cohort with CMS Luhn-valid NPIs). No real protected health information has touched the model, so HIPAA obligations have not yet been exercised in practice.
  3. No FDA SaMD filing yet: The Q16.16 reproducibility primitive, PCCP regression gate, and 21 CFR Part 11 audit-export module together form a credible FDA-SaMD-ready surface, but no submission has been made. See ClinicalMem docs/fda_q_sub_draft.md.
  4. Hardware attestation: Current tamper detection is a software SHA-256 check. A hardware-rooted trust mechanism (TPM, secure enclave) would be required for production deployment.
  5. Single-model classifier: This is a layer in ClinicalMem's 6-layer defense-in-depth pipeline (deterministic table β†’ OpenEvidence API β†’ RxNorm + NIH RxNav β†’ multi-LLM consensus β†’ BitNet 4.5 anchor β†’ LLM synthesis β†’ abstention gate). Layers 1–4 remain load-bearing for novel drugs and cohort drift; the BitNet ensemble is the bit-identical replay primitive, not the sole classifier.

License

Apache-2.0 β€” same as ClinicalMem and mind-mem, with explicit Β§ 3 patent grant for hospital procurement and regulatory teams.

Copyright 2026 STARGA Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Citation

If you use this model in research or production, please cite the underlying BitNet b1.58 paper plus the ClinicalMem deployment:

@misc{ma2024bitnet,
  title={The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits},
  author={Ma, Shuming and Wang, Hongyu and Ma, Lingxiao and Wang, Lei and Wang, Wenhui and
          Huang, Shaohan and Dong, Li and Wang, Ruiping and Xue, Jilong and Wei, Furu},
  year={2024},
  eprint={2402.17764},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{starga2026clinicalmem,
  title={ClinicalMem: Bit-identical Clinical Decisions for Healthcare AI},
  author={STARGA Inc.},
  year={2026},
  url={https://github.com/star-ga/clinicalmem}
}



Built for the Agents Assemble Healthcare AI Hackathon by STARGA Inc. β€” May 2026.
