openmed-community/granite-4.0-micro-OpenMed
Granite 4.0 Micro (≈3B) tuned for medical education & instruction following.
Recipe: JEPA-LLM SFT on medmcqa-hard + personas augmentation → GRPO on medmcqa-hard; finalized with Arcee Fusion merge back into the IBM base.
⚠️ Medical safety
This model is not a clinician and may hallucinate. Do not use for diagnosis or treatment. Use under qualified medical supervision only.
TL;DR
- Base: `ibm-granite/granite-4.0-micro`, a 3B long-context instruct model (Apache-2.0). Includes a structured chat template and tool-calling examples.
- Training (high-level):
  - JEPA-LLM SFT (400 steps, bs=64) on `mkurman/medmcqa-hard` plus instruction-following personas from `allenai/tulu-3-sft-personas-instruction-following`.
  - GRPO (Group Relative Policy Optimization) on `mkurman/medmcqa-hard`, bs 64/128, 8 generations per item (critic-free RL optimizing verifiable correctness).
  - Model merge: Arcee MergeKit with `merge_method: arcee_fusion` to preserve base calibration while keeping domain gains.
- Infra: Trained/evaluated on AMD Instinct MI300X via Hot Aisle credits. Thanks!
What’s inside
1) JEPA-LLM stage (supervised)
- The JEPA-LLM objective (see repo: mkurman/jepa-llm) is used as an auxiliary signal during SFT to bias training toward stable, representation-level learning rather than pure next-token fitting.
- Run for 400 steps on MedMCQA-hard with personas augmentation from Tulu-3 personas, which adds constraint-following behaviors and improves coverage of IFEval-style requirements.
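The card does not spell out the exact loss, so the following is only a rough sketch of what a JEPA-style auxiliary term can look like, under the assumption that it is a cosine distance between a predicted embedding and the embedding of a paired target view, added to the usual next-token loss. The function names, the `weight` parameter, and the toy vectors are hypothetical; the actual implementation lives in mkurman/jepa-llm and may differ.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def jepa_style_loss(next_token_loss, predicted_emb, target_emb, weight=1.0):
    """Combine the next-token loss with a representation-space penalty.

    Assumption: the auxiliary term is a weighted cosine distance between
    the predicted and target embeddings (illustrative, not the repo's code).
    """
    return next_token_loss + weight * cosine_distance(predicted_emb, target_emb)


# Toy values: a cross-entropy of 2.0 and two nearly aligned embeddings.
total = jepa_style_loss(2.0, [0.9, 0.1, 0.0], [1.0, 0.0, 0.0], weight=0.5)
print(f"combined loss: {total:.4f}")
```

When the two embeddings point the same way the penalty vanishes and only the next-token loss remains, which is the sense in which the term acts as an auxiliary signal rather than a replacement objective.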
2) GRPO stage (reinforcement learning)
- GRPO replaces the critic with group baselines, enabling efficient multi-sample training; we generate 8 candidates per item and reward answer correctness / format checks.
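The group-baseline idea can be sketched in a few lines. This is a minimal illustration, not the training code: the `Answer: <letter>` output format, the binary reward, and the helper names `score` and `group_advantages` are assumptions made for the example.

```python
import re
from statistics import mean, pstdev

# Hypothetical output format: the completion must contain "Answer: <A-D>".
ANSWER_RE = re.compile(r"Answer:\s*([A-D])\b")


def score(completion, gold):
    """Reward 1.0 only if the format check passes and the letter is correct."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # failed the format check
    return 1.0 if match.group(1) == gold else 0.0


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: the group mean is the baseline."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight sampled completions for a single item whose gold answer is "B".
completions = [
    "Answer: B", "Answer: C", "I think it is B", "Answer: B",
    "Answer: A", "Answer: B", "Answer: D", "Answer: B",
]
rewards = [score(c, "B") for c in completions]
advantages = group_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Because the baseline is the mean reward of the group itself, no learned value model is needed; a group in which every completion gets the same reward yields zero advantage for all of them.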
3) Merge & finalize
- Arcee Fusion in MergeKit to selectively fuse with the original Granite 4.0 Micro (avoids over-averaging from naive merges and tends to keep base calibration).
Intended use & limitations
Intended: medical research, concept review, exam-style Q&A, instruction-following research, and tool-augmented demos.
Out of scope: autonomous clinical decisions, prescription generation, or guideline updates without retrieval/RAG.
Results
| Metric | granite-4.0-micro-OpenMed | granite-4.0-micro |
|---|---|---|
| mmlu | 63.17 | 62.48 |
| leaderboard_mmlu_pro | 33.06 | 32.78 |

| leaderboard_ifeval | granite-4.0-micro-OpenMed | granite-4.0-micro |
|---|---|---|
| inst_level_loose_acc | 85.97 | 85.25 |
| inst_level_strict_acc | 84.05 | 82.97 |
| prompt_level_loose_acc | 79.67 | 78.74 |
| prompt_level_strict_acc | 77.45 | 76.16 |
Author’s harness notes: EleutherAI lm-evaluation-harness with Granite’s chat template and batch size 8.
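To make the size of the gains concrete, the per-metric differences can be computed directly from the reported scores. A small sketch; the dictionary layout and the `deltas` helper are illustrative, and the numbers are copied from the results tables.

```python
# Scores copied from the results tables: (granite-4.0-micro-OpenMed, granite-4.0-micro).
scores = {
    "mmlu": (63.17, 62.48),
    "leaderboard_mmlu_pro": (33.06, 32.78),
    "ifeval_inst_level_loose_acc": (85.97, 85.25),
    "ifeval_inst_level_strict_acc": (84.05, 82.97),
    "ifeval_prompt_level_loose_acc": (79.67, 78.74),
    "ifeval_prompt_level_strict_acc": (77.45, 76.16),
}


def deltas(table):
    """Absolute gain (in points) of the tuned model over the base model."""
    return {name: round(tuned - base, 2) for name, (tuned, base) in table.items()}


gains = deltas(scores)
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```

Every metric moves in the same direction, with gains ranging from 0.28 points (MMLU-Pro) to 1.29 points (IFEval prompt-level strict accuracy).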
Quickstart (Transformers)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openmed-community/granite-4.0-micro-OpenMed"

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a careful medical assistant. Cite sources and warn this is not medical advice."},
    {"role": "user", "content": "Cellulitis vs erysipelas: give 3 bullet differences and 1 caution."},
]

# Render the chat template to a prompt string, then tokenize it.
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the output reproducible.
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[-1]:]
print(tok.decode(new_tokens, skip_special_tokens=True))
```
Tool-calling: the Granite base model card includes function-calling examples.
Reproduce key evals (example)
```bash
# Classic MMLU (5-shot typical)
lm_eval --model hf \
  --model_args pretrained=openmed-community/granite-4.0-micro-OpenMed,parallelize=True \
  --tasks mmlu --batch_size 8 --apply_chat_template

# MMLU-Pro (10-choice, harder)
lm_eval --model hf \
  --model_args pretrained=openmed-community/granite-4.0-micro-OpenMed,parallelize=True \
  --tasks leaderboard_mmlu_pro --batch_size 8 --apply_chat_template

# IFEval (verifiable instruction following)
lm_eval --model hf \
  --model_args pretrained=openmed-community/granite-4.0-micro-OpenMed,parallelize=True \
  --tasks leaderboard_ifeval --batch_size 8 --apply_chat_template
```
Data & training notes
- MedMCQA-Hard (train split) for domain supervision and RL rewards.
- Tulu-3 personas for instruction-following, with a constraint taxonomy inspired by IFEval.
- JEPA-LLM: based on the emerging LLM-JEPA objective (representation-space training). See the paper for context and motivation.
- GRPO: uses group baselines instead of a critic, which keeps multi-sample training efficient.
- Privacy: no PHI to the best of our knowledge; please report issues.
Commentary on results
Why gains are modest: Granite-4.0-Micro is already a well-calibrated, strongly aligned 3B instruct model with robust instruction-following and tool-use out of the box. In that regime, headroom on popular benchmarks is limited, and naive tuning often degrades base behaviors (calibration, safety, IF). The combination used here—JEPA-LLM (to stabilize representations), personas SFT (to preserve IF constraints), GRPO with verifiable rewards, and Arcee Fusion—appears to nudge the model to measurable improvements without sacrificing base calibration, but the effect sizes remain small, which is consistent with Granite’s strong baseline. In short: we’re operating near the model’s alignment ceiling; targeted gains are possible, sweeping jumps are unlikely without larger capacity or richer supervision.
Acknowledgments
- IBM Granite team for the base model & docs (Apache-2.0).
- AllenAI Tulu-3 for personas datasets.
- Arcee for MergeKit and Arcee Fusion.
- Hot Aisle for MI300X credits ❤️ (https://hotaisle.xyz/).
Citation