# VitalLM-25M: Biomedical Small Language Model
VitalLM-25M is a custom decoder-only Transformer optimized for biomedical text generation. It was trained from scratch on a 178M-token dataset synthesized from clinical dialogues (ChatDoctor) and medical literature (PubMed).
## Model Details
- Architecture: Custom Transformer with SwiGLU activation and learned positional embeddings.
- Parameters: ~25 million.
- Context Length: 256 tokens.
- Tokenizer: Byte-Level BPE (vocab size: 16k).
- Training Compute: Single NVIDIA P100 GPU.
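For readers unfamiliar with the activation, the SwiGLU feed-forward block gates one linear projection with the SiLU of another. The sketch below shows the general pattern; the dimensions and class/attribute names are illustrative, not the checkpoint's actual values or the `model.py` implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """SwiGLU FFN: (SiLU(x @ W_gate) * (x @ W_up)) @ W_down."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gating of the "up" projection by SiLU(gate)
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Illustrative dimensions only (batch=2, seq=16, d_model=128)
ffn = SwiGLUFeedForward(d_model=128, d_hidden=512)
y = ffn(torch.randn(2, 16, 128))
print(y.shape)  # torch.Size([2, 16, 128])
```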
## Performance Metrics
- Final Validation Loss: 3.76
- Perplexity: ~43
- Training Duration: 22,000 Iterations
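The two quality numbers above are consistent with each other: perplexity is the exponential of the cross-entropy validation loss.

```python
import math

val_loss = 3.76          # final validation loss reported above
perplexity = math.exp(val_loss)
print(round(perplexity, 1))  # 42.9, matching the reported ~43
```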
## How to Use
Because this model uses a custom architecture, you must have the `model.py` file (included in this repo) in your working directory:
```python
import torch
import json
from tokenizers import ByteLevelBPETokenizer
from model import SLM, SLMConfig  # Requires model.py in the same folder

# 1. Load Configuration
with open("config.json", "r") as f:
    config_dict = json.load(f)
config = SLMConfig(**config_dict)

# 2. Initialize Model Architecture & Load Weights
model = SLM(config)
# Ensure the filename matches the checkpoint in this repo (vital_lm_25m_swiglu_best.pt)
model.load_state_dict(torch.load("vital_lm_25m_swiglu_best.pt", map_location="cpu"))
model.eval()

# 3. Load Tokenizer (Crucial Step!)
# The tokenizer files (updated_vocab.json, updated_merges.txt) are in this repo
tokenizer = ByteLevelBPETokenizer(
    "updated_vocab.json",
    "updated_merges.txt",
)

# 4. Chat Function
def chat(text, max_new_tokens=50):
    # Encode input
    ids = tokenizer.encode(text).ids
    idx = torch.tensor(ids).unsqueeze(0)
    # Generate
    with torch.no_grad():
        out = model.generate(idx, max_new_tokens=max_new_tokens, temperature=0.3, top_k=40)
    # Decode output
    print(tokenizer.decode(out[0].tolist()))

# Example test
chat("Patient: I have a severe headache and sensitivity to light. Doctor:")
```
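Because the context length is 256 tokens, long prompts should be truncated before generation so that prompt plus new tokens fits in the window. A minimal sketch of that bookkeeping (the helper name and constant are illustrative, not part of `model.py`):

```python
MAX_CONTEXT = 256  # model's context length, per the Model Details above

def truncate_ids(ids, max_new_tokens, max_context=MAX_CONTEXT):
    """Keep only the most recent tokens so prompt + generation fits the window."""
    budget = max_context - max_new_tokens
    return ids[-budget:] if len(ids) > budget else ids

# A 300-token prompt with 50 new tokens keeps only the last 206 ids
print(len(truncate_ids(list(range(300)), max_new_tokens=50)))  # 206
```

Passing the truncated ids to `chat` (or applying this inside it before building `idx`) avoids exceeding the positional-embedding table, which learned positional embeddings cannot extrapolate past.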