A 381M parameter transformer language model pre-trained on curated medical text from PubMed abstracts, PMC full-text articles, and clinical guidelines.
MedSLM uses a modern GPT-style transformer with several architectural improvements over the standard GPT-2 design:
| Component | Detail |
|---|---|
| Normalization | RMSNorm (cheaper than LayerNorm because it skips mean-centering; used in LLaMA/Mistral) |
| Positional Encoding | Rotary Positional Embeddings (RoPE), for better length generalization than learned absolute positions |
| Feed-Forward | SwiGLU activation (gated FFN; typically outperforms a GELU FFN) |
| Attention | Grouped-Query Attention (GQA): query heads share KV heads, which shrinks the KV cache |
| Layers | 24 transformer blocks |
| Attention Heads | 16 query heads, 8 KV heads |
| Embedding Dim | 1024 |
| Context Length | 1024 tokens |
| Vocab Size | 50,257 (GPT-2 BPE tokenizer) |
| Parameters | 381,373,440 (~381M) |
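
The attention rows above imply a head dimension of 1024 / 16 = 64, so with 8 KV heads the key and value projections are half as wide as the query projection. The sketch below estimates what that means for the inference-time KV cache. It assumes the head dimension equals the embedding dimension divided by the number of query heads and that cached values are stored in 16-bit precision; neither is stated in the card, and `kv_cache_bytes` is an illustrative helper, not part of the MedSLM code.

```python
# Back-of-the-envelope KV-cache size for the configuration in the table.
# Assumptions (not stated in the card): head_dim = embed_dim // n_query_heads,
# and cached keys/values are stored in 16-bit precision (2 bytes each).

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one full-length sequence."""
    # The factor 2 covers the key tensor and the value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

EMBED_DIM, N_Q_HEADS, N_KV_HEADS = 1024, 16, 8
N_LAYERS, CONTEXT = 24, 1024
HEAD_DIM = EMBED_DIM // N_Q_HEADS  # 64

gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT)
mha = kv_cache_bytes(N_LAYERS, N_Q_HEADS, HEAD_DIM, CONTEXT)  # one KV head per query head

print(f"GQA: {gqa / 2**20:.0f} MiB, MHA: {mha / 2**20:.0f} MiB")
# prints "GQA: 48 MiB, MHA: 96 MiB"
```

Under these assumptions, sharing each KV head between two query heads halves the cache for a full 1024-token context.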
Dataset: Saminx22/medical_data_for_slm (~44M tokens)

```python
import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# MedSLM and MedSLMConfig must already be defined or imported;
# the class definitions are not stored in the weights file.

# Load config
with open("config.json") as f:
    config_dict = json.load(f)

# Reconstruct model, keeping only the keys MedSLMConfig declares
config = MedSLMConfig(**{k: v for k, v in config_dict.items()
                         if k in MedSLMConfig.__dataclass_fields__})
model = MedSLM(config)

# Load weights
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("tokenizer/")
```
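
The config-loading step filters `config.json` down to the fields the config dataclass declares, so extra keys in the file do not raise a `TypeError` at construction time. A minimal sketch of that pattern follows. `ToyConfig` and its field names are hypothetical stand-ins for `MedSLMConfig`, whose actual fields are not shown in this card; `dataclasses.fields` is the public counterpart of `__dataclass_fields__`.

```python
from dataclasses import dataclass, fields

@dataclass
class ToyConfig:  # hypothetical stand-in for MedSLMConfig
    n_layer: int = 24
    n_head: int = 16
    n_embd: int = 1024

def config_from_dict(cls, raw):
    """Build a dataclass instance, ignoring keys it does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})

# "model_type" is not a ToyConfig field and is dropped silently;
# "n_head" is absent from the dict and falls back to its default.
raw = {"n_layer": 12, "n_embd": 768, "model_type": "toy"}
cfg = config_from_dict(ToyConfig, raw)
print(cfg)
```

The trade-off of filtering silently is that a misspelled key in the JSON file is ignored rather than reported, so it can be worth logging the dropped keys.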
```python
# Generate a continuation from a prompt
prompt = "The patient presented with acute myocardial infarction"
input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
with torch.no_grad():  # inference only, no gradients needed
    output = model.generate(input_ids, max_new_tokens=200, temperature=0.8, top_k=50, top_p=0.9)
print(tokenizer.decode(output.squeeze().tolist()))
```
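
The `temperature`, `top_k`, and `top_p` arguments control how each next token is sampled. The sketch below shows one common way to combine them (temperature scaling, then top-k, then nucleus filtering) in plain Python. It illustrates the general technique only; `sample_next_token` is a hypothetical helper, and MedSLM's own `generate` method may order or implement these steps differently.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.9, rng=random):
    """Sample a token index from raw logits with temperature, top-k and top-p."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring token indices, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the surviving tokens (subtract the max for stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for idx, p in zip(ranked, probs):
        kept.append((idx, p))
        mass += p
        if mass >= top_p:
            break
    # random.choices treats the weights as relative, so no renormalizing is needed.
    indices, kept_probs = zip(*kept)
    return rng.choices(indices, weights=kept_probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0, -3.0]
token = sample_next_token(logits, rng=random.Random(0))
print(token)
```

With these toy logits and the default settings, only the three highest-scoring tokens survive the nucleus cut, so the two least likely tokens can never be drawn.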
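```python
# Load optimizer state to resume training.
# `optimizer` must already be constructed over model.parameters()
# with the same parameter groups as the original run.
optimizer_state = torch.load("optimizer.pt", map_location="cpu")
optimizer.load_state_dict(optimizer_state)
```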
| File | Description |
|---|---|
| `model.safetensors` | Model weights (safetensors format) |
| `optimizer.pt` | Optimizer state dict for resuming training |
| `config.json` | Model architecture configuration |
| `training_config.json` | Training hyperparameters and loss history |
| `tokenizer/` | GPT-2 tokenizer files |
| `loss_curves.png` | Training/validation loss plot |
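
Before running the loading code, it can help to confirm that a downloaded checkpoint directory contains everything listed above. A small sketch, assuming all files sit directly in one directory; `missing_files` is an illustrative helper, not part of the repository.

```python
from pathlib import Path

# File and directory names taken from the table above.
EXPECTED = ["model.safetensors", "optimizer.pt", "config.json",
            "training_config.json", "tokenizer", "loss_curves.png"]

def missing_files(checkpoint_dir, expected=EXPECTED):
    """Return the expected entries that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).exists()]

# An empty list means every expected file and directory was found.
print(missing_files("."))
```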
This model is intended for research purposes in medical NLP.
License: Apache 2.0