---
license: cc-by-nc-4.0
language:
- en
tags:
- medical
- text-generation
- language-model
- biopan
- jepa
library_name: transformers
pipeline_tag: text-generation
---
# SMB-v1-1.7B-Structure
## Documentation & Quickstart
For a comprehensive guide on getting started, architecture details, and advanced usage, please visit our official documentation: [**📖 SMB-v1 Quickstart Guide**](https://docs.standardmodel.bio/get-started/quickstart)
## Model Details
* **Model Name:** SMB-v1-1.7B-Structure
* **Organization:** Standard Model Biomedicine
* **Model Family:** SMB-v1 (Biomedical Foundational Models)
* **LLM Backbone:** Qwen3-1.7B
* **Training Method:** SFT + JEPA Multi-objective
* **License:** Apache 2.0
## Model Description
**SMB-v1-1.7B-Structure** is the initial release of the SMB-v1 family, specifically engineered to model the complex, time-varying dynamics of cancer biology through structured clinical signals. It treats structured clinical data as a multimodal environment, fusing heterogeneous data streams into a unified patient state representation.
Unlike general-purpose models, SMB-v1 is designed to ingest and synthesize diverse structured modalities across the patient journey, including:
* **Temporal Physiological Signals:** Modeling continuous longitudinal trajectories of laboratory values, vital signs, and functional status markers to capture disease progression and physiological drift over time.
* **Clinical Events & Phenotypes:** Encoding discrete, high-cardinality sequences of diagnosis codes (ICD), procedure events (CPT), and adverse events to reconstruct the semantic history of the patient's care.
* **Therapeutic Interventions:** Integrating complex treatment histories—including systemic therapies (chemotherapy, immunotherapy), radiation dosing schedules, and surgical interventions—to understand causal treatment-response dynamics.
* **Molecular & Genomic Profiles:** Embedding high-dimensional static and dynamic biomarker panels (somatic mutations, gene expression signatures, proteomic markers) directly alongside clinical phenotypes.
* **Oncologic Staging & Outcomes:** Processing structured tumor staging (TNM), histology classifications, and survival endpoints to anchor representations in ground-truth biological states.
> **Note:** While the full `SMB-v1` family will introduce unstructured modalities, this **-Structure** variant establishes the foundation using the highest-fidelity structured signals available in modern oncology data warehouses.
## Intended Use Cases
This model is optimized for downstream tasks requiring a deep understanding of longitudinal patient history:
1. **Predictive Risk Stratification:** Forecasting adverse events, toxicity, or rapid progression based on historical trajectories.
2. **Treatment Response Modeling:** Simulating potential patient outcomes under different therapeutic regimens.
3. **Patient Similarity Search:** Identifying cohorts with similar biological and clinical progressions for real-world evidence generation.
4. **Clinical Trial Matching:** Aligning complex patient states with structured eligibility criteria.
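As a rough illustration of the patient similarity use case, the sketch below ranks patients by cosine similarity between fixed-length embedding vectors. The vectors and patient IDs are toy values, and `cosine_similarity` and `rank_similar` are hypothetical helpers, not part of any SMB package. In practice the vectors would be derived from the model's hidden states.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query, cohort, top_k=2):
    """Return the top_k (patient_id, score) pairs, most similar first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in cohort.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical values, 3 dimensions for readability)
cohort = {
    "patient_a": [0.9, 0.1, 0.0],
    "patient_b": [0.0, 1.0, 0.0],
    "patient_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

top = rank_similar(query, cohort)
print(top)
```

For large cohorts, an approximate nearest-neighbor index would likely replace this exhaustive scan.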
## Usage
To use this model effectively, your input data must be in the [**MEDS**](https://medical-event-data-standard.github.io/docs/intro_pages/what_is_MEDS) (Medical Event Data Standard) format and processed using the `smb_biopan_utils` package. This ensures that patient event timelines are correctly serialized into the structured text format the model expects.
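For orientation, a MEDS-style event table is a long-format table with one row per event. The sketch below builds a tiny in-memory example with pandas. The column names `subject_id`, `time`, `code`, and `numeric_value` follow the core MEDS schema; the codes and values are invented for illustration, and the string subject ID simply mirrors the inference example in this card. The exact columns that `process_ehr_info` requires (for example `table`) should be checked against the `smb_biopan_utils` documentation.

```python
import pandas as pd

# Illustrative MEDS-style events for one hypothetical patient.
# Codes and values are made up; they do not come from a real record.
df_example = pd.DataFrame(
    {
        "subject_id": ["patient_123"] * 4,
        "time": pd.to_datetime(
            ["2023-03-01", "2023-03-01", "2023-06-15", "2024-02-10"]
        ),
        "code": [
            "ICD10//C34.90",
            "LAB//hemoglobin",
            "LAB//hemoglobin",
            "LAB//hemoglobin",
        ],
        "numeric_value": [None, 13.2, 11.8, 10.9],
    }
)

# Keep only events up to a prediction timepoint. This is an assumption about
# how a cutoff might be applied, not the behavior of process_ehr_info itself.
cutoff = pd.Timestamp("2024-01-01")
history = df_example[df_example["time"] <= cutoff].sort_values("time")
print(history)
```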
### 1. Installation
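Install `transformers`, `pandas`, and the data utility package. The inference example below also needs PyTorch, and `device_map="auto"` requires `accelerate`:
```bash
pip install transformers pandas torch accelerate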
pip install git+https://github.com/standardmodelbio/smb-biopan-utils.git
```
### 2. Inference Example
The following example demonstrates how to load the model, process raw MEDS data using `process_ehr_info`, and generate a patient representation.
```python
import pandas as pd
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from smb_biopan_utils import process_ehr_info

# 1. Load Model and Tokenizer
model_id = "standardmodelbio/SMB-v1-1.7B-Structure"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
)
model.eval()  # Inference only: disable dropout

# 2. Load Patient Data (MEDS Format)
# Ensure your dataframe contains columns for 'time', 'code', 'table', etc.
df_meds = pd.read_parquet("path/to/patient_data.parquet")

# 3. Format Data for Inference
# This utility converts the DataFrame into the structured text format
# (e.g., <conditions>...</conditions>) expected by SMB-v1.
input_text = process_ehr_info(
    df=df_meds,
    subject_id="patient_123",  # Specify the subject to process
    end_time=pd.Timestamp("2024-01-01"),  # Prediction timepoint
)

# 4. Generate Representation
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
with torch.no_grad():  # No gradients are needed for inference
    outputs = model(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        output_hidden_states=True,
        return_dict=True,
    )

# Extract the last hidden state as the patient representation.
# Shape: (batch_size, sequence_length, hidden_size), one vector per token.
patient_embedding = outputs.hidden_states[-1]
print(f"Patient Representation Shape: {patient_embedding.shape}")
```
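The last hidden state holds one vector per token. If a single vector per patient is needed, one common option is masked mean pooling over the token axis. The sketch below shows the arithmetic with NumPy on toy arrays; `masked_mean_pool` is a hypothetical helper, and this card does not say whether mean pooling, last-token pooling, or another strategy is recommended for SMB-v1, so treat the choice as an assumption.

```python
import numpy as np


def masked_mean_pool(hidden, attention_mask):
    """Average token vectors, ignoring positions where attention_mask == 0.

    hidden:         (batch, seq_len, hidden_size) float array
    attention_mask: (batch, seq_len) array of 0/1
    returns:        (batch, hidden_size) float array
    """
    mask = attention_mask[..., None].astype(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # Avoid division by zero
    return summed / counts


# Toy input: one sequence of three tokens, the last of which is padding
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = np.array([[1, 1, 0]])

pooled = masked_mean_pool(hidden, attention_mask)
print(pooled.shape)
```

With PyTorch tensors the same steps apply, using `unsqueeze(-1)` for the mask and `clamp(min=1)` for the counts.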
## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{biopan_omni,
  author    = {standardmodelbio},
  title     = {SMB-v1-1.7B-Structure},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/standardmodelbio/SMB-v1-1.7B-Structure}
}
```