Built with Axolotl

Axolotl config (version: 0.16.0.dev0):

base_model: Intelligent-Internet/II-Medical-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: tokenizer_default

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  # --- Standard Alpaca Datasets (No mapping needed) ---
  - path: ruslanmv/HealthCareMagic-100k
    type: alpaca
  - path: medalpaca/medical_meadow_mediqa
    type: alpaca
  - path: medalpaca/medical_meadow_medical_flashcards
    type: alpaca

  # --- Custom Mapped Hugging Face Datasets ---
  - path: ruslanmv/icliniq-7k
    type:
      system_prompt: "You are a helpful medical assistant."
      field_instruction: input
      field_output: answer_icliniq
      format: "{instruction}"
      no_input_format: "{instruction}"

  - path: keivalya/MedQuad-MedicalQnADataset
    type:
      system_prompt: "You are a helpful medical assistant."
      field_instruction: Question
      field_output: Answer
      format: "{instruction}"
      no_input_format: "{instruction}"

  - path: mohammad2928git/complete_medical_symptom_dataset
    type:
      system_prompt: "You are a helpful medical diagnostic assistant. Based on the patient's symptoms, identify the most likely condition."
      field_instruction: text
      field_output: Name
      format: "{instruction}"
      no_input_format: "{instruction}"

  - path: gamino/wiki_medical_terms
    type: completion
    field: page_text

dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./medical-llm-out

sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

# --- NVIDIA B200 Optimizations (Maximum Speed) ---
gradient_accumulation_steps: 1      # No need to accumulate, the GPU can handle it raw
micro_batch_size: 16                # Massively increased to saturate the 180GB VRAM
eval_batch_size: 8                  # Faster evaluations
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 2e-4

train_on_inputs: false
group_by_length: false
bf16: true                          # Blackwell thrives on bfloat16
fp16: false
tf32: true                          

gradient_checkpointing: true
logging_steps: 1
flash_attention: true               # Extremely fast on Blackwell

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
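To make the custom dataset mappings in the config above concrete, here is an illustrative sketch (not Axolotl's internal code) of how the keivalya/MedQuad-MedicalQnADataset entry — `field_instruction: Question`, `field_output: Answer`, `format: "{instruction}"` — turns one dataset row into a system/prompt/completion training example:

```python
# Illustrative sketch only -- not Axolotl's internal implementation.
# Mirrors the MedQuad mapping from the config above:
#   system_prompt, field_instruction: Question, field_output: Answer,
#   format: "{instruction}" (the instruction is used verbatim, no wrapping).
SYSTEM = "You are a helpful medical assistant."

def render_medquad_row(row: dict) -> dict:
    # format: "{instruction}" -- substitute the instruction field as-is
    prompt = "{instruction}".format(instruction=row["Question"])
    # field_output: Answer is the completion the model is trained to produce
    return {"system": SYSTEM, "prompt": prompt, "completion": row["Answer"]}

example = render_medquad_row({
    "Question": "What are the symptoms of anemia?",
    "Answer": "Common symptoms include fatigue, pallor, and shortness of breath.",
})
print(example["prompt"])  # -> What are the symptoms of anemia?
```

The same pattern applies to the icliniq-7k and symptom-dataset entries, just with different field names.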

medical-llm-out

This model is a fine-tuned version of Intelligent-Internet/II-Medical-8B on the ruslanmv/HealthCareMagic-100k, medalpaca/medical_meadow_mediqa, medalpaca/medical_meadow_medical_flashcards, ruslanmv/icliniq-7k, keivalya/MedQuad-MedicalQnADataset, mohammad2928git/complete_medical_symptom_dataset, and gamino/wiki_medical_terms datasets. It achieves the following results on the evaluation set:

  • Loss: 1.4660
  • Ppl: 4.3319
  • Memory / max active (GiB): 75.35
  • Memory / max allocated (GiB): 75.35
  • Memory / device reserved (GiB): 169.19
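As a sanity check, the reported Ppl is simply the exponential of the evaluation loss:

```python
import math

eval_loss = 1.4660           # final evaluation loss reported above
ppl = math.exp(eval_loss)    # perplexity = exp(cross-entropy loss)
print(round(ppl, 4))         # ~4.3319, matching the reported Ppl
```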

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: paged_adamw_32bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 21090

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Ppl     | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|---------------|--------|-------|-----------------|---------|--------------|-----------------|----------------|
| No log        | 0      | 0     | 3.0292          | 20.6798 | 75.32        | 75.32           | 82.41          |
| 1.0604        | 0.2501 | 1758  | 1.6494          | 5.2037  | 75.35        | 75.35           | 138.78         |
| 1.6010        | 0.5001 | 3516  | 1.5858          | 4.8834  | 75.35        | 75.35           | 172.28         |
| 1.5152        | 0.7502 | 5274  | 1.5469          | 4.6968  | 75.35        | 75.35           | 163.01         |
| 1.5167        | 1.0003 | 7032  | 1.5192          | 4.5687  | 75.35        | 75.35           | 170.67         |
| 1.3191        | 1.2504 | 8790  | 1.5054          | 4.5060  | 75.35        | 75.35           | 129.50         |
| 1.4320        | 1.5004 | 10548 | 1.4885          | 4.4306  | 75.35        | 75.35           | 163.71         |
| 1.5285        | 1.7505 | 12306 | 1.4749          | 4.3708  | 75.35        | 75.35           | 138.78         |
| 1.5745        | 2.0006 | 14064 | 1.4639          | 4.3228  | 75.35        | 75.35           | 163.01         |
| 1.3795        | 2.2506 | 15822 | 1.4719          | 4.3577  | 75.35        | 75.35           | 157.60         |
| 1.5165        | 2.5007 | 17580 | 1.4682          | 4.3413  | 75.35        | 75.35           | 108.64         |
| 1.0412        | 2.7508 | 19338 | 1.4660          | 4.3319  | 75.35        | 75.35           | 169.19         |

Framework versions

  • PEFT 0.18.1
  • Transformers 5.3.0
  • PyTorch 2.9.1+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.2
Adapter repository: AnmolSharma21/II-Medical-8B-Finetuned (LoRA adapter for Intelligent-Internet/II-Medical-8B).