Instructions for using cmcmaster/rheum-gemma-2-2b-it with libraries and local apps.

Libraries

How to use cmcmaster/rheum-gemma-2-2b-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cmcmaster/rheum-gemma-2-2b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# The pipeline applies the chat template; print the result so it is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cmcmaster/rheum-gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained("cmcmaster/rheum-gemma-2-2b-it", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt and special tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
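
For orientation, the prompt string that a chat template produces can be sketched by hand. The snippet below assumes the standard Gemma turn format (`<start_of_turn>`, `<end_of_turn>`, with the assistant role named `model`); the template actually shipped with this repository's tokenizer is authoritative and may differ. The helper `format_gemma_chat` is hypothetical and exists only for illustration.

```python
def format_gemma_chat(messages, add_generation_prompt=True):
    """Render chat messages in the Gemma-style turn format (assumed, not read from the repo)."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma names the assistant role "model"
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues from here
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = format_gemma_chat([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

In practice `tokenizer.apply_chat_template` should be preferred, since it reads the template from the tokenizer configuration.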

Local Apps

vLLM

How to use cmcmaster/rheum-gemma-2-2b-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cmcmaster/rheum-gemma-2-2b-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cmcmaster/rheum-gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

SGLang

How to use cmcmaster/rheum-gemma-2-2b-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cmcmaster/rheum-gemma-2-2b-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cmcmaster/rheum-gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cmcmaster/rheum-gemma-2-2b-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cmcmaster/rheum-gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use cmcmaster/rheum-gemma-2-2b-it with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for cmcmaster/rheum-gemma-2-2b-it to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for cmcmaster/rheum-gemma-2-2b-it to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for cmcmaster/rheum-gemma-2-2b-it to start chatting

Load model with FastModel

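# Install first with: pip install unsloth
from unsloth import FastModel

# Load the model and tokenizer; 2048 matches the max sequence length used in training
model, tokenizer = FastModel.from_pretrained(
    model_name="cmcmaster/rheum-gemma-2-2b-it",
    max_seq_length=2048,
)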

Docker Model Runner
How to use cmcmaster/rheum-gemma-2-2b-it with Docker Model Runner:
```
docker model run hf.co/cmcmaster/rheum-gemma-2-2b-it
```
Quantized versions of this model can be used with llama.cpp, Ollama, LM Studio, or any compatible app.

Model Card for cmcmaster/rheum-gemma-2-2b-it

Model Details

Model Description

This model is a fine-tuned version of the Gemma 2 2B model, specifically adapted for rheumatology-related tasks. It combines the base knowledge of the Gemma model with specialized rheumatology information.

Developed by: cmcmaster
Model type: Language Model
Language(s) (NLP): English (primarily)
License: [More Information Needed]
Finetuned from model: unsloth/gemma-2-2b-bnb-4bit, merged with unsloth/gemma-2-2b-it

Model Sources

Repository: https://huggingface.co/cmcmaster/rheum-gemma-2-2b-it

Uses

Direct Use

This model can be used for rheumatology-related natural language processing tasks, such as question answering, information retrieval, or text generation in the domain of rheumatology.

Out-of-Scope Use

This model should not be used as a substitute for professional medical advice, diagnosis, or treatment. It is not intended to be used for making clinical decisions without the involvement of qualified healthcare professionals.

Training Details

Training Data

The model was trained on the cmcmaster/rheum_texts dataset.

Training Procedure

The model was fine-tuned with the unsloth library, which supports memory-efficient fine-tuning of large language models. The key details of the training procedure are:

Base Model: unsloth/gemma-2-2b-bnb-4bit
Max Sequence Length: 2048
Quantization: 4-bit quantization
LoRA Configuration:
- r = 128
- target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- lora_alpha = 32
- lora_dropout = 0
- use_rslora = True (Rank Stabilized LoRA)
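
The effect of `use_rslora = True` on the adapter scale can be worked out from the values above. This is a small arithmetic sketch, not the author's training script: classic LoRA scales the low-rank update by `lora_alpha / r`, while rank-stabilized LoRA uses `lora_alpha / sqrt(r)`. The `lora_kwargs` dictionary simply restates the listed settings; its keys mirror the argument names of PEFT's `LoraConfig`, but how the author actually passed them is an assumption.

```python
import math

# Settings restated from the LoRA configuration listed above
lora_kwargs = {
    "r": 128,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "lora_alpha": 32,
    "lora_dropout": 0,
    "use_rslora": True,
}

r = lora_kwargs["r"]
alpha = lora_kwargs["lora_alpha"]

standard_scale = alpha / r            # classic LoRA scaling: alpha / r
rslora_scale = alpha / math.sqrt(r)   # rank-stabilized scaling: alpha / sqrt(r)

print(f"classic LoRA scale: {standard_scale:.4f}")
print(f"rsLoRA scale:       {rslora_scale:.4f}")
```

At rank 128 the rank-stabilized scale is roughly eleven times larger than the classic one, which is why rsLoRA is often paired with high ranks.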

Training Hyperparameters

Batch Size: 4 per device
Gradient Accumulation Steps: 8
Learning Rate: 2e-4
Warmup Ratio: 0.03
Number of Epochs: 1
Optimizer: AdamW (8-bit)
Weight Decay: 0.00
LR Scheduler: Cosine
Random Seed: 3407
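
The hyperparameters above imply an effective batch size and a learning-rate curve that can be sketched in a few lines. The number of devices and the total number of optimizer steps are not stated in this card, so `num_devices` and `total_steps` below are hypothetical placeholders. The schedule shown is the common linear-warmup-then-cosine-decay form and is an approximation of what a trainer would do, not a copy of its code.

```python
import math

# Values from the hyperparameter list above
per_device_batch_size = 4
gradient_accumulation_steps = 8
learning_rate = 2e-4
warmup_ratio = 0.03

# Assumptions: neither value is stated in the model card
num_devices = 1
total_steps = 1000

# Sequences contributing to each optimizer update
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step):
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```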

Post-Training Procedure

After training, the LoRA adapter was merged into the instruction-tuned version of Gemma (unsloth/gemma-2-2b-it) rather than the base model it was trained on. This approach aims to combine the rheumatology knowledge gained during fine-tuning with the instruction-following capabilities of the instruction-tuned model.
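
Conceptually, merging a LoRA adapter adds a scaled low-rank update to each target weight matrix, W' = W + scale * (B @ A). The toy example below illustrates that arithmetic with plain Python lists and made-up 2x2 values; it is not the author's merge code, and `merge_lora` is a hypothetical helper. Because the update was learned against the base model, adding it to the instruction-tuned weights is a heuristic rather than an exact operation. With PEFT, a merge like this is commonly performed by loading the adapter onto the target model with `PeftModel.from_pretrained` and calling `merge_and_unload()`.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, lora_a, lora_b, scale):
    """Return weight + scale * (lora_b @ lora_a)."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy values for illustration only
instruct_weight = [[1.0, 0.0],
                   [0.0, 1.0]]   # stands in for an instruction-tuned weight matrix
lora_a = [[0.5, -0.5]]           # shape (r, in_features) with r = 1
lora_b = [[1.0], [2.0]]          # shape (out_features, r)
scale = 2.0                      # stands in for the adapter scaling factor

merged = merge_lora(instruct_weight, lora_a, lora_b, scale)
print(merged)
```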

Limitations and Biases

While this model has been fine-tuned on rheumatology-related data, it may still contain biases present in the original Gemma model or introduced through the training data. Users should be aware that the model's outputs may not always be accurate or complete, especially for complex medical topics.

Safetensors

Model size: 3B params
Tensor type: F16

Model tree for cmcmaster/rheum-gemma-2-2b-it

Base model: unsloth/gemma-2-2b-it
Quantizations: 1 model