🩺 MediGPT: A Domain-Specific Clinical Small Language Model

MediGPT is a decoder-only GPT-style Transformer trained entirely from scratch on medical and biomedical text corpora.

Rather than fine-tuning a large pretrained model, MediGPT was built as a research-oriented educational project to explore domain-specific language modeling, Transformer architectures, and clinical text generation.


🚀 Key Features

  • Built entirely from scratch using PyTorch
  • Decoder-only GPT architecture
  • Trained on medical and biomedical datasets
  • GPT-2 BPE tokenization via tiktoken
  • End-to-end implementation including:
    • Data preprocessing
    • Corpus analysis
    • Transformer implementation
    • Training pipeline
    • Evaluation metrics
    • Text generation


🏗️ Model Architecture

Component          Value
Architecture       GPT Decoder
Layers             8
Attention Heads    8
Hidden Size        512
Context Length     512
Dropout            0.1
Tokenizer          GPT-2 (tiktoken)
Framework          PyTorch

Approximate Parameter Count: ~90M
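
As a rough cross-check, the parameter count can be estimated from the table values. The sketch below assumes a standard GPT-2 style layout that this card does not confirm: learned positional embeddings, a 4x MLP expansion, biases on all linear layers, two LayerNorms per block plus a final one, and the GPT-2 BPE vocabulary of 50,257 tokens. The function name and its arguments are illustrative, not taken from the repository.

```python
def gpt_param_count(vocab_size, n_layer, d_model, context_len,
                    mlp_ratio=4, tie_embeddings=True):
    """Estimate parameters of a GPT-2 style decoder (assumed layout)."""
    d_ff = mlp_ratio * d_model
    token_emb = vocab_size * d_model
    pos_emb = context_len * d_model            # learned positional embeddings
    layer_norm = 2 * d_model                   # scale + shift
    # Fused QKV projection plus output projection, both with biases.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two-layer feed-forward network with biases.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    block = 2 * layer_norm + attn + mlp
    total = token_emb + pos_emb + n_layer * block + layer_norm  # + final LayerNorm
    if not tie_embeddings:
        total += vocab_size * d_model          # separate output head, no bias
    return total

tied = gpt_param_count(50257, 8, 512, 512)
untied = gpt_param_count(50257, 8, 512, 512, tie_embeddings=False)
print(f"tied: {tied / 1e6:.1f}M, untied: {untied / 1e6:.1f}M")
```

Under these assumptions the estimate comes out at roughly 51M parameters with tied input/output embeddings and 77M with an untied output head. The ~90M figure above therefore presumably reflects implementation details that the table does not list.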


📚 Training Data

MediGPT was trained on a combination of medical datasets:

MedQuAD

Medical question-answer pairs covering diseases, symptoms, diagnostics, treatments, and healthcare information.

PubMedQA

Biomedical question-answer data derived from scientific literature and PubMed abstracts.

The resulting corpus exposes the model to:

  • Clinical terminology
  • Biomedical vocabulary
  • Medical QA patterns
  • Research-style writing
  • Healthcare discourse

🎯 Training Objective

The model was trained using autoregressive next-token prediction.

Given a sequence such as:

Symptoms of diabetes include ...

the model learns to assign a probability to each possible next token, conditioned on all of the tokens before it.
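
A minimal sketch of how this objective is usually set up, written in plain Python so it runs without PyTorch. It assumes the common formulation where the targets are the inputs shifted by one position and the loss is cross-entropy. The token IDs and logits are made-up placeholders, and the helper names are illustrative rather than taken from the MediGPT code.

```python
import math

def make_training_pair(token_ids):
    """Inputs are all tokens but the last; targets are the sequence shifted left by one."""
    return token_ids[:-1], token_ids[1:]

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Hypothetical token IDs standing in for a tokenized sentence.
tokens = [11, 42, 7, 99, 3]
inputs, targets = make_training_pair(tokens)
# At position i the model sees inputs[: i + 1] and is scored on targets[i].

# Toy logits over a 4-token vocabulary: a uniform prediction costs log(4) nats.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
```

In a PyTorch training loop the same computation is typically done over a whole batch at once, with the per-token losses averaged before backpropagation.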


📈 Evaluation

Evaluation included:

  • Validation Loss
  • Perplexity
  • Corpus Statistics
  • Attention Analysis
  • Qualitative Text Generation
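
Validation loss and perplexity are directly related: perplexity is the exponential of the mean per-token cross-entropy. The sketch below assumes the loss is measured in nats (natural logarithm), which is the PyTorch default. The loss values are invented for illustration and are not MediGPT results.

```python
import math

def perplexity(token_losses):
    """Perplexity = exp(mean per-token cross-entropy, in nats)."""
    if not token_losses:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_losses) / len(token_losses))

# Hypothetical per-token validation losses, for illustration only.
losses = [2.0, 3.0, 4.0]
ppl = perplexity(losses)  # exp of the mean loss, here exp(3.0)
```

A perplexity of 1.0 would mean the model predicts every held-out token with certainty, so lower is better.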

Results indicate successful learning of:

  • Clinical vocabulary
  • Biomedical terminology
  • Medical writing style
  • Research-oriented sentence structures


🔬 Example Prompt

Input:

Symptoms of diabetes include

Generated continuation:

Symptoms of diabetes include increased thirst, frequent urination, fatigue, blurred vision and other complications associated with impaired glucose regulation.
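
A continuation like this is produced one token at a time. The sketch below illustrates the general autoregressive decoding loop. It is not the repository's generation code, and the sampling settings MediGPT actually uses are not stated here. `next_token_logits` and `toy_model` are hypothetical stand-ins for a forward pass of the trained model; only the 512-token context length comes from the architecture table.

```python
import math
import random

CONTEXT_LENGTH = 512  # from the architecture table

def sample_next(logits, temperature=1.0, rng=random):
    """Pick the next token: greedy if temperature is 0, otherwise sample."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]  # unnormalized softmax
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def generate(next_token_logits, prompt_ids, max_new_tokens,
             temperature=1.0, rng=random):
    """Autoregressive decoding: append one chosen token per step."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-CONTEXT_LENGTH:]      # crop to the model's window
        logits = next_token_logits(context)  # stand-in for a forward pass
        ids.append(sample_next(logits, temperature, rng))
    return ids

def toy_model(context):
    """Toy 5-token 'model' that strongly prefers (last token + 1) % 5."""
    preferred = (context[-1] + 1) % 5
    return [50.0 if i == preferred else 0.0 for i in range(5)]

greedy = generate(toy_model, [0], max_new_tokens=4, temperature=0)
sampled = generate(toy_model, [0], max_new_tokens=4, rng=random.Random(0))
```

With a real model, the prompt would first be encoded to token IDs with the tokenizer and the generated IDs decoded back to text afterwards.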


⚠️ Limitations

  • Small-scale model compared to modern foundation models
  • Limited medical reasoning capability
  • May generate inaccurate information
  • Not suitable for clinical deployment
  • Intended primarily for research and educational purposes

🩺 Medical Disclaimer

This model is not a medical device and must not be used for:

  • Medical diagnosis
  • Treatment recommendations
  • Clinical decision making
  • Emergency healthcare situations

All outputs should be verified by qualified healthcare professionals.


👨‍💻 Author

Rishabh Shenoy

Developed as a research-oriented educational project exploring domain-specific language model development and biomedical NLP.


📜 Citation

If you use this work, please cite the repository and model page.
