GRamma-12B

GRamma-12B is a 12-billion-parameter instruction-tuned language model specialized for the Greek medical domain. It is built on top of Gemma 3 12B Instruct and adapted through parameter-efficient fine-tuning on a collection of Greek and bilingual medical question-answering data.

The goal of GRamma-12B is to combine the strong multilingual reasoning capabilities of the Gemma 3 base model with clinical knowledge and Greek-language medical understanding, extending general-purpose Greek language modeling toward the specialized requirements of the clinical domain.

Model Information

Base model: Gemma 3 12B Instruct (google/gemma-3-12b-it)
Parameters: 12B
Languages: Greek (primary), English
Domain: Medical / Clinical
Context length: Inherited from Gemma 3 12B
License: Gemma Terms of Use

Training Data

GRamma-12B was trained on a mixture of synthetically generated Greek medical MCQA data and curated open-source datasets translated into Greek, organized into three broad categories: medical MCQA, free-form medical QA, and general instruction-following data.

Synthetic Medical MCQA

A synthetic data generation pipeline produced multiple-choice question–answer (MCQA) pairs in the medical domain, conditioned on a hierarchical medical taxonomy comprising:

51 subjects covering the primary medical domains,
301 categories (each uniquely assigned to a subject),
1,458 subcategories (each uniquely associated with a category),

forming a strict subject → category → subcategory hierarchy used to control topical coverage and reduce redundancy.
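A minimal sketch of how such a strict hierarchy could be represented and checked, assuming each child stores exactly one parent. The `Taxonomy` class and the example labels are hypothetical and are not taken from the actual GRamma-12B pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Strict subject -> category -> subcategory hierarchy (illustrative only)."""

    # Each child maps to exactly one parent, which is what makes the tree strict.
    category_to_subject: dict[str, str] = field(default_factory=dict)
    subcategory_to_category: dict[str, str] = field(default_factory=dict)

    def add(self, subject: str, category: str, subcategory: str) -> None:
        # Reject a category or subcategory that already has a different parent.
        if self.category_to_subject.get(category, subject) != subject:
            raise ValueError(f"category {category!r} already belongs to another subject")
        if self.subcategory_to_category.get(subcategory, category) != category:
            raise ValueError(f"subcategory {subcategory!r} already belongs to another category")
        self.category_to_subject[category] = subject
        self.subcategory_to_category[subcategory] = category

    def path(self, subcategory: str) -> tuple[str, str, str]:
        # Walk upward from the leaf to recover the full topical path.
        category = self.subcategory_to_category[subcategory]
        return self.category_to_subject[category], category, subcategory

    def counts(self) -> tuple[int, int, int]:
        # (number of subjects, categories, subcategories)
        subjects = set(self.category_to_subject.values())
        return len(subjects), len(self.category_to_subject), len(self.subcategory_to_category)


# Made-up entries, used only to exercise the structure.
taxonomy = Taxonomy()
taxonomy.add("Cardiology", "Arrhythmias", "Atrial fibrillation")
taxonomy.add("Cardiology", "Arrhythmias", "Ventricular tachycardia")
print(taxonomy.path("Atrial fibrillation"))
```

With the full taxonomy loaded, `counts()` would be expected to report the 51 / 301 / 1,458 sizes listed above, and the `path` of a sampled subcategory could be used to condition a generation prompt.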

Two complementary generation methodologies were used:

GPT-4.1-based pipeline: a two-stage process separating question-stem generation from answer/explanation generation, optimized for token efficiency and reduced duplication.
openai/gpt-oss-20b pipeline (on-premises): a per-instance generation strategy with explicit question-type conditioning (Single Best Answer, SBA with All/None, True/False, select-combination, term-properties completion, matching-matrix, negative/exception, calculation, and best-next-step), difficulty conditioning (5 levels), deterministic seed-based diversity control, and a multi-stage validation layer (schema/field integrity, option-structure checks, correct-answer consistency, and distractor-rationale validation).
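The validation stages named above can be illustrated with a small checker. This is a sketch under assumptions: the field names (`question`, `options`, `answer`, `rationales`) and the specific rules are hypothetical, since the actual schema used by the pipeline is not described here.

```python
def validate_mcqa(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item passed every check."""
    # 1. Schema / field integrity: all expected fields must be present.
    required = ("question", "options", "answer", "rationales")
    missing = [name for name in required if name not in item]
    if missing:
        return [f"missing fields: {missing}"]

    errors = []
    options = item["options"]

    # 2. Option structure: a mapping of label -> text with at least two
    #    non-empty, mutually distinct options.
    if not isinstance(options, dict) or len(options) < 2:
        return ["options must be a mapping with at least two entries"]
    texts = [str(text).strip() for text in options.values()]
    if any(not text for text in texts):
        errors.append("empty option text")
    if len(set(texts)) != len(texts):
        errors.append("duplicate option text")

    # 3. Correct-answer consistency: the answer label must exist among the options.
    if item["answer"] not in options:
        errors.append("answer label is not among the options")

    # 4. Distractor rationale: every wrong option needs a non-empty explanation.
    rationales = item["rationales"] if isinstance(item["rationales"], dict) else {}
    lacking = [
        label for label in options
        if label != item["answer"] and not str(rationales.get(label, "")).strip()
    ]
    if lacking:
        errors.append(f"distractors without rationale: {lacking}")
    return errors


# Hypothetical item used only to exercise the checks.
good_item = {
    "question": "Which electrolyte disturbance is most associated with peaked T waves?",
    "options": {"A": "Hyperkalemia", "B": "Hypokalemia", "C": "Hyponatremia"},
    "answer": "A",
    "rationales": {"B": "Causes flattened T waves.", "C": "No typical T-wave change."},
}
print(validate_mcqa(good_item))
```

Returning a list of messages rather than raising on the first failure makes it easy to log why a generated item was rejected.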

Open-Source Datasets

To complement the synthetic data, several widely used public datasets were incorporated and translated into Greek through a multi-stage iterative translation pipeline (initial translation, plus two corrective refinement phases targeting untranslated terminology and medical abbreviations):

Medical MCQA

MedMCQA - Medical entrance exams from AIIMS & NEET-PG.
MedQA-USMLE - Clinical-vignette questions from the USMLE.

Medical QA

MedicationQA - Open-ended pharmacology / medication questions.
MedQuAD - Biomedical QA pairs from trusted NIH public health sources.

General Instruction-Following / QA

OASST1 - Human-annotated multi-turn instruction-following data.
OpenOrca - Instruction-tuning data for reasoning and structured responses.
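The three-phase translation flow described above (initial translation, then corrective passes for untranslated terminology and for medical abbreviations) could be organized roughly as follows. This is only a sketch: the `translate` and `refine` callables, the regex heuristics, and the tiny glossary are all assumptions made for illustration, not the actual pipeline.

```python
import re
from typing import Callable

LATIN_WORD = re.compile(r"[A-Za-z]{2,}")       # Latin-script tokens left in the Greek draft
ABBREVIATION = re.compile(r"\b[A-Z]{2,6}\b")    # short all-caps tokens such as "ECG"


def translate_iteratively(
    source: str,
    translate: Callable[[str], str],
    refine: Callable[[str, list[str]], str],
) -> str:
    # Phase 1: initial translation.
    draft = translate(source)

    # Phase 2: corrective pass for untranslated terminology
    # (Latin-script words that are not abbreviations).
    terms = [w for w in LATIN_WORD.findall(draft) if not ABBREVIATION.fullmatch(w)]
    if terms:
        draft = refine(draft, terms)

    # Phase 3: corrective pass for medical abbreviations.
    abbreviations = ABBREVIATION.findall(draft)
    if abbreviations:
        draft = refine(draft, abbreviations)
    return draft


# Stand-ins for the real translation and refinement models (hypothetical).
GLOSSARY = {"hypertension": "υπέρταση", "ECG": "ΗΚΓ"}


def fake_translate(text: str) -> str:
    return "Ο ασθενής έχει hypertension και ECG φυσιολογικό"


def fake_refine(draft: str, items: list[str]) -> str:
    for item in items:
        draft = draft.replace(item, GLOSSARY.get(item, item))
    return draft


result = translate_iteratively(
    "The patient has hypertension and a normal ECG", fake_translate, fake_refine
)
print(result)
```

Passing the models in as callables keeps the control flow testable without committing to any particular translation system.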

How to Use

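from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "iti-visual-analytics/GRamma-12B"

# Load the tokenizer and the model in bfloat16, placing weights on the available device(s).
# Note: recent transformers releases accept `dtype`; older ones expect `torch_dtype`.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)

# Greek prompt: "What is the first-line treatment for arterial hypertension?"
messages = [
    {"role": "user", "content": "Ποια είναι η πρώτη γραμμή θεραπείας για την αρτηριακή υπέρταση;"},
]

# Apply the chat template and tokenize the conversation.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Greedy decoding; print only the newly generated tokens, without special tokens.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))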

Evaluation

Evaluation was performed using a fork of Lighteval extended to support Greek downstream tasks and target-language prompts. Benchmarks are reported with the number of few-shot examples used in the evaluation prompt.

The suite covers both general and medical capabilities. Greek-language benchmarks are machine-translated datasets (Medical MCQA Greek, MMLU Greek, HellaSwag Greek, TruthfulQA Greek, and ARC Greek) plus the multilingual Belebele dataset. English-language benchmarks are Belebele, Winogrande, HellaSwag, ARC-Challenge, TruthfulQA, and MMLU.

Greek Language Tasks

Evaluation Dataset	GRamma-12B	Gemma-3-Base	Meltemi-7B-v1.5	Krikri-8B-Base
Medical MCQA EL (15-shot)	66.92%	60.30%	42.20%	53.80%
Belebele EL (5-shot)	90.33%	90.22%	61.00%	82.70%
TruthfulQA MC1 EL (0-shot)	36.47%	39.78%	N/A	N/A
TruthfulQA MC2 EL (0-shot)	56.36%	59.19%	49.00%	54.20%
ARC-Challenge EL (25-shot)	55.05%	57.79%	40.00%	49.40%
HellaSwag EL (10-shot)	64.77%	65.40%	53.80%	64.60%
MMLU EL (5-shot)	64.19%	59.88%	41.20%	52.00%
Average	62.01%	61.80%	47.87%*	59.45%*

* Averages for Meltemi and Krikri are computed over the 6 available tasks (excluding TruthfulQA MC1).
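The averaging rule in the footnote can be reproduced from the table values. The snippet below is a small check, with `None` standing in for the N/A cells; the variable and function names are illustrative.

```python
from statistics import mean

# Greek-task scores in table order; None marks an unavailable (N/A) result.
greek_scores = {
    "GRamma-12B": [66.92, 90.33, 36.47, 56.36, 55.05, 64.77, 64.19],
    "Meltemi-7B-v1.5": [42.20, 61.00, None, 49.00, 40.00, 53.80, 41.20],
    "Krikri-8B-Base": [53.80, 82.70, None, 54.20, 49.40, 64.60, 52.00],
}


def average_available(scores: list) -> float:
    """Average only the tasks that have a reported score."""
    available = [s for s in scores if s is not None]
    return round(mean(available), 2)


averages = {model: average_available(s) for model, s in greek_scores.items()}
for model, value in averages.items():
    print(f"{model}: {value:.2f}")
```

Because the starred averages omit TruthfulQA MC1, they are not strictly comparable with the seven-task averages in the other columns.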

English Language Tasks

Evaluation Dataset	GRamma-12B	Gemma-3-Base	Meltemi-7B-v1.5	Krikri-8B-Base
Belebele (5-shot)	93.22%	93.11%	77.70%	79.80%
Winogrande (5-shot)	74.27%	74.74%	73.40%	72.60%
HellaSwag (10-shot)	82.77%	83.67%	79.60%	80.70%
ARC-Challenge (25-shot)	71.42%	71.33%	54.10%	57.80%
TruthfulQA MC1 (0-shot)	37.33%	40.27%	N/A	N/A
TruthfulQA MC2 (0-shot)	55.11%	57.95%	40.50%	44.80%
MMLU (5-shot)	74.29%	73.26%	56.90%	65.10%
Average	69.77%	70.62%	63.70%*	66.80%*

* Averages for Meltemi and Krikri are computed over the 6 available tasks (excluding TruthfulQA MC1).

GRamma-12B improves on the Gemma 3 baseline by 6.6 percentage points on Medical MCQA EL and 4.3 percentage points on MMLU EL. The Greek average rises slightly (62.01% vs. 61.80%), while the English average stays within one point of the baseline (69.77% vs. 70.62%), indicating successful medical and Greek specialization with minimal loss of general capability.
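The gaps quoted above are differences between two accuracy percentages, so they are best read as percentage points. The short calculation below, using values copied from the tables, shows both the absolute gap and the corresponding relative change; the names are illustrative.

```python
# (GRamma-12B, Gemma 3 baseline) accuracies in percent, copied from the tables.
scores = {
    "Medical MCQA EL": (66.92, 60.30),
    "MMLU EL": (64.19, 59.88),
    "Greek average": (62.01, 61.80),
    "English average": (69.77, 70.62),
}

# Absolute difference in percentage points.
deltas = {task: round(tuned - base, 2) for task, (tuned, base) in scores.items()}

# Relative change with respect to the baseline, in percent.
relative = {
    task: round(100 * (tuned - base) / base, 1) for task, (tuned, base) in scores.items()
}

for task in scores:
    print(f"{task}: {deltas[task]:+.2f} points ({relative[task]:+.1f}% relative)")
```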

Intended Use & Limitations

GRamma-12B is intended for research and educational purposes in Greek medical question answering and clinical reasoning. It is not approved for clinical use and must not be relied upon for actual diagnosis, treatment decisions, or any form of patient care, nor should it be used as a substitute for advice from qualified healthcare professionals. As with any language model, its outputs may contain factual errors or hallucinations and should always be verified by a domain expert before use. The model inherits the biases and limitations of its base model (Gemma 3 12B) and of the synthetic and translated training data.

Acknowledgements

Developed at the Centre for Research and Technology Hellas (CERTH), Information Technologies Institute (ITI), Thessaloniki, Greece. Built on top of Google's Gemma 3. Evaluation was carried out using a Greek-extended fork of Lighteval.

License

Model:

GRamma-12B is a derivative model based on Google's Gemma 3 12B Instruct. The model weights, usage, and distribution are subject to the official Gemma Terms of Use.

Users must comply with all applicable restrictions and requirements defined by the Gemma license agreement.

Training Data and Derived Resources:

The training pipeline includes synthetic data generated with OpenAI models; use of that data is subject to the applicable OpenAI Terms of Use. The translated open-source data remains subject to the licenses and terms of the original datasets, including MedQA, MedMCQA, OASST1, and OpenOrca.
