Update README.md
README.md
CHANGED
|
@@ -14,15 +14,42 @@ pipeline_tag: text-generation
|
|
| 14 |
|
| 15 |
# SMB-v1-1.7B-Structure
|
| 16 |
|
| 17 |
-
| 18 |
|
| 19 |
## Model Details
|
| 20 |
|
| 21 |
- **LLM Backbone**: Qwen3-1.7B
|
| 22 |
-
- **Vision Encoder**: None (text-only)
|
| 23 |
- **Connector**: identity
|
| 24 |
- **Model Family**: SMB-v1
|
| 25 |
-
- **Modalities**: EHR Only
|
| 26 |
- **Training Method**: SFT + JEPA Multi-objective
|
| 27 |
|
| 28 |
## Special Tokens
|
|
|
|
| 14 |
|
| 15 |
# SMB-v1-1.7B-Structure
|
| 16 |
|
| 17 |
+
## Model Type
|
| 18 |
+
Multimodal Longitudinal Oncology Foundation Model
|
| 19 |
+
|
| 20 |
+
## Model Description
|
| 21 |
+
SMB-v1-1.7B-Structure is the initial release of the SMB-v1 family, engineered to model the complex, time-varying dynamics of cancer biology through structured clinical signals. It treats structured clinical data as a multimodal environment, fusing heterogeneous data streams into a unified patient-state representation.
|
| 22 |
+
|
| 23 |
+
It is designed to ingest and synthesize diverse structured modalities across the patient journey, including:
|
| 24 |
+
|
| 25 |
+
- Temporal Physiological Signals: Modeling continuous longitudinal trajectories of laboratory values, vital signs, and functional status markers to capture disease progression and physiological drift over time.
|
| 26 |
+
|
| 27 |
+
- Clinical Events & Phenotypes: Encoding discrete, high-cardinality sequences of diagnosis codes (ICD), procedure events (CPT), and adverse events to reconstruct the semantic history of the patient's care.
|
| 28 |
+
|
| 29 |
+
- Therapeutic Interventions: Integrating complex treatment histories, including systemic therapies (chemotherapy, immunotherapy), radiation dosing schedules, and surgical interventions, to understand causal treatment-response dynamics.
|
| 30 |
+
|
| 31 |
+
- Molecular & Genomic Profiles: Embedding high-dimensional static and dynamic biomarker panels, including somatic mutations, gene expression signatures, and proteomic markers, directly alongside clinical phenotypes.
|
| 32 |
+
|
| 33 |
+
- Oncologic Staging & Outcomes: Processing structured tumor staging (TNM), histology classifications, and survival endpoints to anchor representations in ground-truth biological states.
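
The modalities listed above share one property: each record is an event on a patient timeline. The sketch below shows one way such heterogeneous events could be merged into a single time-ordered text sequence before being passed to a text-only backbone. This model card does not document the actual input format, so the `ClinicalEvent` class, the `serialize_timeline` function, and the line layout are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClinicalEvent:
    """Hypothetical container for one structured event (not the model's real schema)."""
    when: date
    modality: str    # e.g. "lab", "diagnosis", "therapy"
    code: str        # e.g. an ICD code, a drug name, or a lab name
    value: str = ""  # optional measurement, e.g. "11.2 g/dL"


def serialize_timeline(events):
    """Sort events chronologically and render one line per event.

    Each line carries the number of days elapsed since the earliest event,
    so the relative timing of events is explicit in the text.
    """
    ordered = sorted(events, key=lambda e: (e.when, e.modality, e.code))
    if not ordered:
        return ""
    start = ordered[0].when
    lines = []
    for event in ordered:
        day = (event.when - start).days
        suffix = f"={event.value}" if event.value else ""
        lines.append(f"[day {day}] {event.modality}: {event.code}{suffix}")
    return "\n".join(lines)


# Illustrative, made-up events given out of chronological order.
events = [
    ClinicalEvent(date(2024, 3, 1), "therapy", "carboplatin"),
    ClinicalEvent(date(2024, 1, 10), "diagnosis", "C34.90"),
    ClinicalEvent(date(2024, 2, 5), "lab", "hemoglobin", "11.2 g/dL"),
]
timeline = serialize_timeline(events)
print(timeline)
```

Encoding elapsed days rather than absolute dates is one possible design choice: it keeps the sequence free of calendar dates while preserving the spacing between events.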
|
| 34 |
+
|
| 35 |
+
## Intended Use Cases
|
| 36 |
+
This model is optimized for downstream tasks requiring a deep understanding of longitudinal patient history, such as:
|
| 37 |
+
|
| 38 |
+
- Predictive Risk Stratification: Forecasting adverse events, toxicity, or rapid progression based on historical trajectories.
|
| 39 |
+
|
| 40 |
+
- Treatment Response Modeling: Simulating potential patient outcomes under different therapeutic regimens.
|
| 41 |
+
|
| 42 |
+
- Patient Similarity Search: Identifying cohorts with similar biological and clinical progressions for real-world evidence generation.
|
| 43 |
+
|
| 44 |
+
- Clinical Trial Matching: Aligning complex patient states with structured eligibility criteria.
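
As an illustration of the patient similarity use case above, the following sketch ranks a small cohort by cosine similarity to a query patient. It assumes each patient has already been reduced to a fixed-length embedding vector. How such vectors would be extracted from this model is not described in the README, so the vectors, the patient identifiers, and the `most_similar` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, cohort, k=2):
    """Return the k (patient_id, score) pairs with the highest similarity to the query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in cohort.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional embeddings; real patient-state vectors would be much larger.
cohort = {
    "patient_a": [0.9, 0.1, 0.0],
    "patient_b": [0.0, 1.0, 0.0],
    "patient_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
top = most_similar(query, cohort, k=2)
print(top)
```

For cohorts beyond a few thousand patients, an approximate nearest-neighbor index would likely replace this exhaustive scan.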
|
| 45 |
+
|
| 46 |
+
Note: While the full `SMB-v1-1.7B` will introduce unstructured modalities, this `-Structure` variant establishes the foundation using the highest-fidelity structured signals available in modern oncology data warehouses.
|
| 47 |
|
| 48 |
## Model Details
|
| 49 |
|
| 50 |
- **LLM Backbone**: Qwen3-1.7B
|
|
|
|
| 51 |
- **Connector**: identity
|
| 52 |
- **Model Family**: SMB-v1
|
|
|
|
| 53 |
- **Training Method**: SFT + JEPA Multi-objective
|
| 54 |
|
| 55 |
## Special Tokens
|