Instructions for using sriksven/MedSage-7B with libraries and local apps.

Libraries

How to use sriksven/MedSage-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sriksven/MedSage-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sriksven/MedSage-7B")
model = AutoModelForCausalLM.from_pretrained("sriksven/MedSage-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use sriksven/MedSage-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sriksven/MedSage-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/MedSage-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
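
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `chat` are illustrative, and the sketch assumes that the server started above is listening on localhost port 8000 and that it returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request

MODEL_ID = "sriksven/MedSage-7B"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-compatible response body (choices[0].message.content).
    """
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's reply. Passing `base_url="http://localhost:30000"` would target a server on the port used in the SGLang commands instead.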

Use Docker

docker model run hf.co/sriksven/MedSage-7B

SGLang

How to use sriksven/MedSage-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sriksven/MedSage-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/MedSage-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sriksven/MedSage-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/MedSage-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use sriksven/MedSage-7B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sriksven/MedSage-7B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sriksven/MedSage-7B to start chatting

Use Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sriksven/MedSage-7B to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="sriksven/MedSage-7B",
    max_seq_length=2048,
)

Docker Model Runner
How to use sriksven/MedSage-7B with Docker Model Runner:
```
docker model run hf.co/sriksven/MedSage-7B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

MedSage-7B

A fine-tune of Qwen2.5-7B-Instruct specialized for medical question answering and clinical knowledge. It was trained on a mix of medical flashcards, clinical wiki articles, and medical Q&A pairs, with the aim of giving detailed, accurate medical answers.

Disclaimer: This model is for educational and research purposes only. It is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider for medical decisions.

Key Details


Base model	Qwen/Qwen2.5-7B-Instruct
Method	QLoRA (4-bit NF4, rank 16, alpha 16)
Library	Unsloth + TRL SFTTrainer
Datasets	medalpaca flashcards (5K) + wikidoc (5K) + MedQuad (3K)
Total examples	13,000
Hardware	NVIDIA RTX A5000 (24GB VRAM) on RunPod
Training time	~2.75 hours (500 steps)
Final loss	1.006
Parameters trained	40.4M of 7.66B (0.53%)
Format	ChatML
Output	Merged 16-bit safetensors
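
The trainable-parameter figure can be sanity-checked by hand. A rank-r LoRA adapter on a linear layer with d_in inputs and d_out outputs adds r × (d_in + d_out) weights. The sketch below assumes adapters on all seven attention and MLP projections in every layer (the card does not list the target modules) and takes the layer shapes from the public Qwen2.5-7B configuration. Both are assumptions, not details confirmed by the training scripts.

```python
RANK = 16             # LoRA rank from the table above
HIDDEN = 3584         # assumed: Qwen2.5-7B hidden size
INTERMEDIATE = 18944  # assumed: MLP intermediate size
KV_DIM = 4 * 128      # assumed: 4 key/value heads x head dimension 128
NUM_LAYERS = 28       # assumed: number of transformer layers

# (d_in, d_out) for each adapted projection in one transformer layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, rank=RANK):
    """Weights added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in PROJECTIONS.values())
total = per_layer * NUM_LAYERS

print(f"{total:,} trainable parameters")       # 40,370,176
print(f"{100 * total / 7.66e9:.2f}% of 7.66B")  # 0.53%
```

Under these assumptions the count matches the 40.4M (0.53%) reported above, which suggests, but does not prove, that all linear projections were adapted.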

Dataset Composition

Three complementary medical data sources:

Medical Flashcards (5,000 examples) — concise Q&A pairs covering anatomy, pharmacology, pathology, physiology, and clinical medicine. Teaches the model to give focused, factual answers.
WikiDoc Medical Articles (5,000 examples) — longer-form medical knowledge from WikiDoc covering diseases, conditions, treatments, and diagnostic criteria. Gives the model depth on clinical topics.
MedQuad (3,000 examples) — consumer health questions and expert answers from NIH sources covering drugs, diseases, procedures, and general health topics. Teaches the model to answer patient-facing medical questions.

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("sriksven/MedSage-7B")
tokenizer = AutoTokenizer.from_pretrained("sriksven/MedSage-7B")

messages = [
    {
        "role": "system",
        "content": "You are a medical knowledge assistant. Provide accurate, detailed medical information based on established medical science. Always note that users should consult healthcare professionals for personal medical decisions.",
    },
    {
        "role": "user",
        "content": "What are the common symptoms and first-line treatments for Type 2 diabetes?",
    },
]

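inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids together with the attention mask
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))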
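
The card lists ChatML as the prompt format, so it can help to see roughly what the chat template produces before tokenization. The `to_chatml` helper below is a hypothetical, simplified rendering for illustration. The tokenizer's own template remains authoritative and may differ in details, for example by injecting a default system prompt.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render chat messages as ChatML turns (role, newline, content, end marker)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from this point.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are a medical knowledge assistant."},
    {"role": "user", "content": "What is hypertension?"},
]
print(to_chatml(example))
```

Each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers, and the trailing open assistant turn is what `add_generation_prompt=True` adds.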

Unsloth (faster inference)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sriksven/MedSage-7B",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
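
Besides speed, `load_in_4bit=True` mainly affects memory. The rough, weights-only estimate below uses the 7.66B parameter count from Key Details. It ignores quantization constants, any layers kept in higher precision, the KV cache, and activations, so real usage will be higher.

```python
PARAMS = 7.66e9  # total parameter count listed in Key Details


def weight_gb(params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


bf16_gb = weight_gb(PARAMS, 16)
four_bit_gb = weight_gb(PARAMS, 4)

print(f"BF16 weights:  ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{four_bit_gb:.1f} GB")
```

By this estimate the 16-bit weights take about 15 GB and the 4-bit weights just under 4 GB, which is why the quantized load leaves much more headroom on a 24 GB card.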

Topic Coverage

Diseases & Conditions — symptoms, pathophysiology, diagnostic criteria, staging
Pharmacology — drug mechanisms, indications, contraindications, side effects
Anatomy & Physiology — organ systems, cellular biology, biochemistry
Clinical Medicine — differential diagnosis, treatment protocols, patient management
Public Health — epidemiology, screening, prevention, vaccination
Consumer Health — plain-language explanations of medical topics

Intended Use

Medical education and study aids
Clinical knowledge reference systems
Healthcare chatbot prototyping
Research on domain-specific LLM fine-tuning in biomedicine
Medical NLP research and benchmarking

Limitations

NOT for clinical decision-making — this model should never be used to make real medical decisions
Higher final loss (1.006) compared to other models in this suite, reflecting the complexity and diversity of medical language
May hallucinate medical facts, drug names, or dosages
Trained on English medical text only
Knowledge is limited to training data patterns and does not reflect the latest medical research
Does not have access to patient records, lab results, or imaging
Not evaluated against established medical NLP benchmarks (MedQA, PubMedQA, etc.)

Training Metrics

Step	Loss	Epoch
10	2.342	0.12
100	1.069	1.24
250	0.972	3.09
400	0.898	4.94
500	0.867	6.17
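
The step and epoch columns imply how many optimizer steps make up one epoch. Combined with the effective batch size of 16 listed under Training Infrastructure, they also give a rough idea of how many packed sequences the 13,000 raw examples collapse into. This is a back-of-the-envelope sketch. It assumes that the epoch column is a rounded fraction and that every step consumes a full batch.

```python
EFFECTIVE_BATCH = 16   # 4 per device x 4 gradient-accumulation steps
RAW_EXAMPLES = 13_000  # total examples from Key Details

# Final row of the table above
step, epoch = 500, 6.17

steps_per_epoch = step / epoch
packed_sequences = steps_per_epoch * EFFECTIVE_BATCH
examples_per_sequence = RAW_EXAMPLES / packed_sequences

print(f"steps per epoch: ~{steps_per_epoch:.0f}")
print(f"packed sequences per epoch: ~{packed_sequences:.0f}")
print(f"raw examples per packed sequence: ~{examples_per_sequence:.1f}")
```

This works out to about 81 steps per epoch, or roughly 1,300 packed sequences holding about ten raw examples each. That is consistent with packing being enabled on short Q&A pairs.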

Training Infrastructure


GPU	NVIDIA RTX A5000 24GB
Cloud	RunPod ($0.27/hr)
Framework	Unsloth 2026.5.2 + TRL + Transformers 5.5.0
Precision	BF16 training, 4-bit NF4 base quantization
Optimizer	AdamW 8-bit
Learning rate	2e-4, linear decay
Batch size	16 effective (4 per device × 4 accumulation)
Packing	Enabled
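
To make "2e-4, linear decay" concrete, the sketch below computes the learning rate at a given step. It assumes that the rate decays to exactly zero at step 500 and that there is no warmup. The card does not state a warmup length, so `warmup_steps` is a hypothetical parameter that defaults to 0.

```python
PEAK_LR = 2e-4     # learning rate from the table above
TOTAL_STEPS = 500  # training steps from Key Details


def linear_decay_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500):
    print(step, f"{linear_decay_lr(step):.1e}")
```

Under these assumptions the rate is 2e-4 at the start, 1e-4 halfway through, and 0 at the final step.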

Source Code

Training scripts: github.com/sriksven/LLM-FineTune-Suite

License

Apache 2.0

Safetensors

Model size	8B params
Tensor type	BF16

Model tree for sriksven/MedSage-7B

Base model	Qwen/Qwen2.5-7B
Finetuned	Qwen/Qwen2.5-7B-Instruct