Instructions for using NurseCitizenDeveloper/NurseGemma-2B-Merged with libraries and local apps.

Libraries

How to use NurseCitizenDeveloper/NurseGemma-2B-Merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="NurseCitizenDeveloper/NurseGemma-2B-Merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("NurseCitizenDeveloper/NurseGemma-2B-Merged")
model = AutoModelForCausalLM.from_pretrained("NurseCitizenDeveloper/NurseGemma-2B-Merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use NurseCitizenDeveloper/NurseGemma-2B-Merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NurseCitizenDeveloper/NurseGemma-2B-Merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NurseCitizenDeveloper/NurseGemma-2B-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
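
The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_chat_request` and `send_chat_request` are illustrative, and the response parsing assumes the usual OpenAI-style shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "NurseCitizenDeveloper/NurseGemma-2B-Merged"


def build_chat_request(user_text, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first reply.

    Assumption: the server answers with an OpenAI-style body, i.e.
    {"choices": [{"message": {"content": ...}}]}.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(send_chat_request(build_chat_request("What is the capital of France?")))
```

For a server listening on another port, such as 30000, pass `base_url="http://localhost:30000"`.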

Use Docker

docker model run hf.co/NurseCitizenDeveloper/NurseGemma-2B-Merged

SGLang

How to use NurseCitizenDeveloper/NurseGemma-2B-Merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NurseCitizenDeveloper/NurseGemma-2B-Merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NurseCitizenDeveloper/NurseGemma-2B-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "NurseCitizenDeveloper/NurseGemma-2B-Merged" \
        --host 0.0.0.0 \
        --port 30000

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.
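
Because access is gated, downloads generally need to be authenticated as an account that has accepted the conditions. A minimal sketch, assuming the access token is stored in the `HF_TOKEN` environment variable; the helper names `resolve_hf_token` and `load_gated_model` are illustrative, not part of any library.

```python
import os


def resolve_hf_token(env=None):
    """Return the access token from HF_TOKEN, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; create a token in your Hugging Face account "
            "settings and accept the model's access conditions first."
        )
    return token


def load_gated_model(model_id="NurseCitizenDeveloper/NurseGemma-2B-Merged"):
    """Load the tokenizer and model, passing the token explicitly."""
    # Imported here so resolve_hf_token works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    token = resolve_hf_token()
    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
    model = AutoModelForCausalLM.from_pretrained(model_id, token=token)
    return tokenizer, model
```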

🩺 NurseGemma-2B-Merged (Ready to Run)

NurseGemma is a specialised nursing AI assistant fine-tuned on 7,500+ nursing care plans.
This is the merged version: the LoRA adapter has been merged into the base model, so it is ready for direct inference and API use.

It is designed to "think like a nurse", following the ADPIE nursing process (Assessment, Diagnosis, Planning, Implementation, Evaluation).
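
ADPIE stands for Assessment, Diagnosis, Planning, Implementation, and Evaluation. When reviewing a generated care plan, a simple completeness check can flag stages that are missing. The sketch below is illustrative only: it assumes each stage name appears somewhere in the text, which is not a documented output format of this model, and `missing_adpie_stages` is a hypothetical helper.

```python
import re

ADPIE_STAGES = ("Assessment", "Diagnosis", "Planning", "Implementation", "Evaluation")


def missing_adpie_stages(care_plan_text):
    """Return the ADPIE stage names that do not appear in the text.

    Matching is case-insensitive on whole words; this is a rough heuristic,
    not a check of clinical quality.
    """
    missing = []
    for stage in ADPIE_STAGES:
        if not re.search(rf"\b{stage}\b", care_plan_text, flags=re.IGNORECASE):
            missing.append(stage)
    return missing
```

An empty list means every stage name was found; it says nothing about whether the content of each stage is correct.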

🚀 How to Use

1. In Python (Transformers)

No need for peft or unsloth: the model loads like any standard Transformers model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NurseCitizenDeveloper/NurseGemma-2B-Merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = """<start_of_turn>user
You are a nurse. Create a care plan for a patient with pneumonia.<end_of_turn>
<start_of_turn>model
"""

# Send inputs to wherever device_map="auto" placed the model (GPU or CPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
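
The hand-written prompt above uses Gemma-style turn markers. A small helper can build the same string for any user message. `build_gemma_prompt` is a hypothetical name, and the template is copied from the example above rather than read from the tokenizer's own chat template.

```python
def build_gemma_prompt(user_message):
    """Wrap a single user message in the turn markers used in the example above."""
    return (
        "<start_of_turn>user\n"
        f"{user_message.strip()}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


prompt = build_gemma_prompt(
    "You are a nurse. Create a care plan for a patient with pneumonia."
)
print(prompt)
```

Where the tokenizer ships a chat template, `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` is likely the more robust way to produce this string.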

2. Hugging Face Inference API (Free)

You can use this model directly with the free API (Serverless Inference).

API URL: https://router.huggingface.co/models/NurseCitizenDeveloper/NurseGemma-2B-Merged

🎯 Intended Use

Education: Helping nursing students verify their care plans.
Case Generation: Creating clinical vignettes for practice.
Simulation: Roleplaying patient scenarios.

⚠️ Clinical Warning: This model is for educational and research purposes only. It is NOT a clinical decision support tool.

🧠 Training Data

Fine-tuned on NurseReason-7k (PubMedQA vignettes converted to Nursing Care Plans).

Built with ❤️ by nurses, for nurses.

Model size: 3B params (Safetensors)
Tensor type: BF16

Model tree for NurseCitizenDeveloper/NurseGemma-2B-Merged

Base model: google/gemma-2-2b
Finetuned: google/gemma-2-2b-it
Finetuned: this model