star-ga committed · Commit 4bbaa3f · verified · 1 Parent(s): d76dce2

docs(model card): replace ASCII architecture with Mermaid flowchart (HF renders natively)

Files changed (1)
  1. README.md +28 -54
README.md CHANGED
@@ -77,62 +77,36 @@ The cascade architecture (A gate + B specialist) is the result of **421 autonomo
 
  ## Architecture
 
- ```
- ┌─────────────────────────────────────────────────────────────────────┐
- │  INPUT: (drug_a, drug_b)                                            │
- │  e.g. ("warfarin", "ibuprofen")                                     │
- └─────────────────────────────────────────────────────────────────────┘
-                                    │
-                                    ▼
- ┌─────────────────────────────────────────────────────────────────────┐
- │  encode_pair() → 193-dim ternary feature vector                     │
- │    • 64 BLAKE2b-128 hash trits per drug (×2 = 128 hash bits)        │
- │    • 26 ATC pharmacology flag bits per drug (×2 = 52 flag bits)     │
- │    • 13 pair-derived DDI rule bits (CYP3A4 inhib×substrate,         │
- │      OATP1B1×statin, P-gp inhib×substrate, CYP2C9×anticoag,         │
- │      MAOI×serotonergic, PDE5×nitrate, contrast×metformin,           │
- │      CYP1A2 inhib×substrate, XO×thiopurine, folate-antagonist,      │
- │      tetracycline×retinoid, ACE×neprilysin, metformin×renal-state)  │
- └─────────────────────────────────────────────────────────────────────┘
-                                    │
-                                    ▼
- ┌──────────────────────────────┐   ┌──────────────────────────────────┐
- │ A BUNDLE (gate, 256h)        │   │ B BUNDLE (specialist, 64h)       │
- │ 193 → 256 → 5                │   │ 193 → 64 → 5                     │
- │ ternary {-1, 0, +1}          │   │ ternary {-1, 0, +1}              │
- │ Q16.16 biases                │   │ Q16.16 biases                    │
- │ bundle_id: 1f0f8859…         │   │ bundle_id: 5f7ed5f6…             │
- │ ~50,949 params · 118 KB      │   │ ~12,300 params · 30 KB           │
- │                              │   │ trained on non-contra (95)       │
- │ 100% recall: contra (44/44)  │   │ 100% recall:                     │
- │              major (4/4)     │   │   serious (69/69)                │
- │ 0 contra FP                  │   │   moderate (22/22)               │
- │ 0 major FP                   │   │   major (4/4 within non-contra)  │
- └──────────────────────────────┘   └──────────────────────────────────┘
-                                    │
-                                    ▼
- ┌─────────────────────────────────────────────────────────────────────┐
- │  CASCADE DISPATCHER                                                 │
- │    if A predicts "contraindicated" → return "contraindicated"       │
- │    else → return B's constrained argmax over                        │
- │      {moderate, serious, major}                                     │
- │  composite weights_id = "{a_id}+{b_id}" (129 chars)                 │
- └─────────────────────────────────────────────────────────────────────┘
-                                    │
-                                    ▼
- ┌─────────────────────────────────────────────────────────────────────┐
- │  OUTPUT: BitNetResult(                                              │
- │   severity_name ∈ {none, moderate, serious, major, contraindicated},│
- │   logits_q16    : 5×Q16.16 fixed-point logits,                      │
- │   feature_hash  : SHA-256 over canonical 193-dim feature vector,    │
- │   repro_hash    : SHA-256 over (feature_hash, logits_q16, severity, │
- │                   weights_id), the audit primitive,                 │
- │   weights_id    : composite "{a_id}+{b_id}",                        │
- │  )                                                                  │
- └─────────────────────────────────────────────────────────────────────┘
+ ```mermaid
+ flowchart TB
+     INPUT["**Input**<br/>(drug_a, drug_b)<br/>e.g. (warfarin, ibuprofen)"]:::input
+
+     ENCODE["**encode_pair()** → 193-dim ternary feature vector<br/>• 64 BLAKE2b-128 hash trits per drug (×2 = 128 bits)<br/>• 26 ATC pharmacology flag bits per drug (×2 = 52 bits)<br/>• 13 pair-derived DDI rule bits<br/>(CYP3A4 inhib×substrate, OATP1B1×statin, P-gp×substrate,<br/>CYP2C9×anticoag, MAOI×serotonergic, PDE5×nitrate,<br/>contrast×metformin, CYP1A2×substrate, XO×thiopurine,<br/>folate-antagonist, tetracycline×retinoid, ACE×neprilysin,<br/>metformin×renal-state)"]:::encoder
+
+     A["🔴 **A Bundle** &nbsp;·&nbsp; gate &nbsp;·&nbsp; 256-hidden<br/>193 → 256 → 5 &nbsp;·&nbsp; ternary {-1, 0, +1} &nbsp;·&nbsp; Q16.16 biases<br/>bundle_id: <code>1f0f8859…</code> &nbsp;·&nbsp; 50,949 params &nbsp;·&nbsp; 118 KB<br/><br/>**100% recall**: contraindicated (44/44) &nbsp;·&nbsp; major (4/4)<br/>**0 false positives** on contra and major"]:::gate
+
+     B["🔵 **B Bundle** &nbsp;·&nbsp; tier-2 specialist &nbsp;·&nbsp; 64-hidden<br/>193 → 64 → 5 &nbsp;·&nbsp; ternary {-1, 0, +1} &nbsp;·&nbsp; Q16.16 biases<br/>bundle_id: <code>5f7ed5f6…</code> &nbsp;·&nbsp; ~12,300 params &nbsp;·&nbsp; 30 KB<br/>trained on non-contra subset (95 samples)<br/><br/>**100% recall**: serious (69/69) &nbsp;·&nbsp; moderate (22/22)<br/>major (4/4 within non-contra)"]:::specialist
+
+     DISPATCH["⚖️ **Cascade Dispatcher**<br/>if A predicts <strong>contraindicated</strong> → return contraindicated<br/>else → return B's constrained argmax over<br/>{moderate, serious, major}<br/><br/>composite weights_id = <code>{a_id}+{b_id}</code> (129 chars)"]:::dispatch
+
+     OUT["✅ **BitNetResult**<br/>severity_name ∈ {none, moderate, serious, major, contraindicated}<br/>logits_q16 : 5×Q16.16 fixed-point logits<br/>feature_hash : SHA-256 over canonical 193-dim feature vector<br/>repro_hash : SHA-256 over (feature_hash, logits_q16, severity, weights_id)<br/>weights_id : composite <code>{a_id}+{b_id}</code><br/><br/>↓ <strong>bit-identical replay primitive, verifiable decades later, on any chip</strong> ↓"]:::output
+
+     INPUT --> ENCODE
+     ENCODE --> A
+     ENCODE --> B
+     A --> DISPATCH
+     B --> DISPATCH
+     DISPATCH --> OUT
+
+     classDef input fill:#EFF6FF,stroke:#2563eb,color:#1e3a8a,stroke-width:2px
+     classDef encoder fill:#F0FDFA,stroke:#0F766E,color:#134E4A,stroke-width:2px
+     classDef gate fill:#FEF2F2,stroke:#dc2626,color:#7f1d1d,stroke-width:2px
+     classDef specialist fill:#EFF6FF,stroke:#2563eb,color:#1e3a8a,stroke-width:2px
+     classDef dispatch fill:#FEF3C7,stroke:#d97706,color:#7c2d12,stroke-width:2px
+     classDef output fill:#F0FDF4,stroke:#16a34a,color:#14532d,stroke-width:2px
  ```
 
- Source: BitNet b1.58 architecture from Ma, Wang, Ma, et al. ([arXiv:2402.17764](https://arxiv.org/abs/2402.17764)). This is a clean-room Python implementation with **pure-integer Q16.16 fixed-point arithmetic**: no `torch` runtime dep, no GPU required. Training used PyTorch + Straight-Through Estimator on H200 SXM (RunPod).
+ > **Source**: BitNet b1.58 architecture from Ma, Wang, Ma, et al. ([arXiv:2402.17764](https://arxiv.org/abs/2402.17764)). This is a clean-room Python implementation with **pure-integer Q16.16 fixed-point arithmetic**: no `torch` runtime dep, no GPU required. Training used PyTorch + Straight-Through Estimator on H200 SXM (RunPod).
 
  ---
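
The dispatch rule in the diagram can be sketched in Python. This is a minimal illustration, not the repository's code: the names `SEVERITIES`, `B_ALLOWED`, `argmax_over`, and `cascade` are hypothetical, and the logit order is assumed to follow the order in which `severity_name` lists the classes.

```python
from typing import Sequence

# Assumed class order: the order in which the diagram lists severity_name values.
SEVERITIES = ("none", "moderate", "serious", "major", "contraindicated")

# Classes the B specialist may return, per the dispatcher box.
B_ALLOWED = ("moderate", "serious", "major")


def argmax_over(logits: Sequence[int], allowed: Sequence[str]) -> str:
    """Return the allowed severity with the largest logit.

    Ties resolve to the class that appears first in `allowed`.
    """
    return max(allowed, key=lambda name: logits[SEVERITIES.index(name)])


def cascade(a_logits: Sequence[int], b_logits: Sequence[int]) -> str:
    """Hypothetical cascade: A gates 'contraindicated', B decides the rest."""
    a_prediction = argmax_over(a_logits, SEVERITIES)
    if a_prediction == "contraindicated":
        return "contraindicated"
    # B's argmax is constrained to the three non-contraindicated tiers.
    return argmax_over(b_logits, B_ALLOWED)
```

Under these assumptions, `cascade([0, 0, 0, 0, 5], [9, 1, 2, 3, 0])` returns `"contraindicated"` regardless of B, while `cascade([5, 0, 0, 0, 0], [9, 1, 2, 3, 9])` returns `"major"` because B's `none` and `contraindicated` logits are excluded from the constrained argmax.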
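
The audit fields in `BitNetResult` can be illustrated the same way. The sketch below assumes a byte layout that the diagram does not specify (one signed byte per trit for `feature_hash`, big-endian 32-bit integers for `logits_q16`, fields concatenated without separators), so its digests will not match the real implementation, and all helper names are hypothetical. It is meant only to show why integer-only Q16.16 values make the hash reproducible from one machine to the next.

```python
import hashlib
import struct
from typing import Sequence

Q_BITS = 16
ONE = 1 << Q_BITS  # 1.0 in Q16.16 is 65536

# Feature layout from the diagram: 64 hash trits and 26 flags per drug, plus 13 pair rules.
FEATURE_DIM = 2 * 64 + 2 * 26 + 13


def to_q16(x: float) -> int:
    """Convert a float to Q16.16 (used here only to build example inputs)."""
    return int(round(x * ONE))


def q16_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values with integers only.

    The arithmetic right shift rounds toward negative infinity.
    """
    return (a * b) >> Q_BITS


def feature_hash(features: Sequence[int]) -> str:
    """SHA-256 over the feature vector, assuming one signed byte per trit."""
    if len(features) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} features, got {len(features)}")
    if any(t not in (-1, 0, 1) for t in features):
        raise ValueError("features must be ternary")
    return hashlib.sha256(struct.pack(f"{FEATURE_DIM}b", *features)).hexdigest()


def repro_hash(feature_hash_hex: str, logits_q16: Sequence[int],
               severity: str, weights_id: str) -> str:
    """SHA-256 over (feature_hash, logits_q16, severity, weights_id).

    The serialization order and encoding are assumptions for illustration.
    """
    h = hashlib.sha256()
    h.update(bytes.fromhex(feature_hash_hex))
    h.update(struct.pack(">5i", *logits_q16))  # five signed 32-bit logits
    h.update(severity.encode("utf-8"))
    h.update(weights_id.encode("ascii"))
    return h.hexdigest()
```

Because every input to `repro_hash` is an integer or a fixed string, recomputing it never depends on floating-point rounding. A composite `weights_id` of 129 characters is consistent with two 64-character hex IDs joined by `+`, although the diagram shows only truncated IDs.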