GRamma-12B
GRamma-12B is a 12-billion-parameter instruction-tuned language model specialized for the Greek medical domain. It is built on top of Gemma 3 12B Instruct and adapted through parameter-efficient fine-tuning on a collection of Greek and bilingual medical question-answering data.
The goal of GRamma-12B is to combine the strong multilingual reasoning capabilities of the Gemma 3 base model with clinical knowledge and Greek-language medical understanding, extending general-purpose Greek language modeling toward the specialized requirements of the clinical domain.
Model Information
- Base model: Gemma 3 12B Instruct (google/gemma-3-12b-it)
- Parameters: 12B
- Languages: Greek (primary), English
- Domain: Medical / Clinical
- Context length: Inherited from Gemma 3 12B (128K tokens)
- License: Gemma Terms of Use
Training Data
GRamma-12B was trained on a mixture of synthetically generated Greek medical MCQA data and curated open-source datasets translated into Greek, organized into three broad categories: medical MCQA, free-form medical QA, and general instruction-following data.
Synthetic Medical MCQA
A synthetic data generation pipeline produced multiple-choice question–answer (MCQA) pairs in the medical domain, conditioned on a hierarchical medical taxonomy comprising:
- 51 subjects covering the primary medical domains,
- 301 categories (each uniquely assigned to a subject),
- 1,458 subcategories (each uniquely associated with a category),
forming a strict subject → category → subcategory hierarchy used to control topical coverage and reduce redundancy.
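The strict hierarchy can be pictured as two parent maps, one per level. The sketch below is illustrative only: the `Taxonomy` class, its method names, and the example entries are assumptions, not the authors' actual data structure or taxonomy content.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Strict subject -> category -> subcategory tree (hypothetical sketch)."""
    # category name -> its single parent subject
    category_parent: dict[str, str] = field(default_factory=dict)
    # subcategory name -> its single parent category
    subcategory_parent: dict[str, str] = field(default_factory=dict)

    def add(self, subject: str, category: str, subcategory: str) -> None:
        # Check both uniqueness constraints before changing any state.
        cat_owner = self.category_parent.get(category, subject)
        if cat_owner != subject:
            raise ValueError(f"{category!r} already belongs to {cat_owner!r}")
        sub_owner = self.subcategory_parent.get(subcategory, category)
        if sub_owner != category:
            raise ValueError(f"{subcategory!r} already belongs to {sub_owner!r}")
        self.category_parent[category] = subject
        self.subcategory_parent[subcategory] = category

    def path(self, subcategory: str) -> tuple[str, str, str]:
        """Return the full (subject, category, subcategory) path."""
        category = self.subcategory_parent[subcategory]
        return self.category_parent[category], category, subcategory

    def counts(self) -> tuple[int, int, int]:
        """Return (number of subjects, categories, subcategories)."""
        subjects = set(self.category_parent.values())
        return len(subjects), len(self.category_parent), len(self.subcategory_parent)


# Made-up entries for illustration; the real taxonomy has 51 / 301 / 1,458 nodes.
taxonomy = Taxonomy()
taxonomy.add("Cardiology", "Hypertension", "Essential hypertension")
taxonomy.add("Cardiology", "Hypertension", "Secondary hypertension")
taxonomy.add("Cardiology", "Arrhythmias", "Atrial fibrillation")
```

Because every subcategory resolves to exactly one path, a generator can iterate over subcategories and condition each prompt on the full path without producing the same topic twice under different parents.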
Two complementary generation methodologies were used:
- GPT-4.1-based pipeline: a two-stage process separating question-stem generation from answer/explanation generation, optimized for token efficiency and reduced duplication.
- openai/gpt-oss-20b pipeline (on-premises): a per-instance generation strategy with explicit question-type conditioning (Single Best Answer, SBA with All/None, True/False, select-combination, term-properties completion, matching-matrix, negative/exception, calculation, and best-next-step), difficulty conditioning (5 levels), deterministic seed-based diversity control, and a multi-stage validation layer (schema/field integrity, option-structure checks, correct-answer consistency, and distractor-rationale validation).
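To make the seed-based diversity control and the validation layer of the second pipeline more concrete, here is a minimal sketch. The item schema (`question`, `options`, `answer`, `rationales`), the function names, and the exact checks are assumptions chosen for illustration; the actual pipeline's fields and rules are not published in this card.

```python
import hashlib

# Hypothetical schema; the real pipeline's field names are not documented here.
REQUIRED_FIELDS = ("question", "options", "answer", "rationales")


def instance_seed(subcategory: str, question_type: str, difficulty: int, index: int) -> int:
    """Derive a reproducible 32-bit seed from the conditioning values."""
    key = f"{subcategory}|{question_type}|{difficulty}|{index}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def validate_item(item: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the item passes."""
    # 1. Schema / field integrity: every required field is present and non-empty.
    missing = [name for name in REQUIRED_FIELDS if not item.get(name)]
    if missing:
        return [f"missing or empty field: {name}" for name in missing]

    errors = []
    options = item["options"]  # assumed shape: {"A": "...", "B": "...", ...}
    # 2. Option structure: at least two options and no duplicated option text.
    if len(options) < 2:
        errors.append("fewer than two options")
    if len(set(options.values())) != len(options):
        errors.append("duplicate option text")
    # 3. Correct-answer consistency: the answer key must name an existing option.
    if item["answer"] not in options:
        errors.append("answer key not among options")
    # 4. Distractor rationales: every incorrect option needs an explanation.
    for key in options:
        if key != item["answer"] and not item["rationales"].get(key):
            errors.append(f"no rationale for distractor {key}")
    return errors


item = {
    "question": "Ποιο φάρμακο είναι αναστολέας ΜΕΑ;",
    "options": {"A": "Εναλαπρίλη", "B": "Αμλοδιπίνη"},
    "answer": "A",
    "rationales": {"B": "Η αμλοδιπίνη είναι αποκλειστής διαύλων ασβεστίου."},
}
errors = validate_item(item)
seed = instance_seed("Essential hypertension", "Single Best Answer", 3, 0)
```

Hashing the conditioning values, rather than drawing from a global random generator, means the same (subcategory, question type, difficulty, index) combination always maps to the same seed, so a failed item can be regenerated or inspected in isolation.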
Open-Source Datasets
To complement the synthetic data, several widely used public datasets were incorporated and translated into Greek through a multi-stage iterative translation pipeline (initial translation, plus two corrective refinement phases targeting untranslated terminology and medical abbreviations):
Medical MCQA
- MedMCQA - Medical entrance exams from AIIMS & NEET-PG.
- MedQA-USMLE - Clinical-vignette questions from the USMLE.
Medical QA
- MedicationQA - Open-ended pharmacology / medication questions.
- MedQuAD - Biomedical QA pairs from trusted NIH public health sources.
General Instruction-Following / QA
- OASST1 - Human-annotated multi-turn instruction-following data.
- OpenOrca - Instruction-tuning data for reasoning and structured responses.
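The multi-stage translation pipeline described above (an initial translation followed by two corrective passes) can be sketched as a simple chain of stages. Everything below is a toy stand-in: in practice each stage would presumably be a model call, and the stage names, the one-pass-per-target split, and the tiny glossaries are assumptions made for illustration.

```python
from typing import Callable

# A stage receives the source text and the current translation, and returns a new translation.
Stage = Callable[[str, str], str]


def translate_with_refinement(source: str, stages: list[Stage]) -> str:
    """Run the initial translation, then each corrective pass in order."""
    translation = ""
    for stage in stages:
        translation = stage(source, translation)
    return translation


def initial_translation(source: str, _current: str) -> str:
    # Toy stand-in for a translation model: handles only two fixed phrases.
    return source.replace("The patient has", "Ο ασθενής έχει").replace(" and ", " και ")


def fix_untranslated_terms(_source: str, current: str) -> str:
    # Corrective pass 1: replace terminology that was left in English.
    glossary = {"hypertension": "υπέρταση"}
    for english, greek in glossary.items():
        current = current.replace(english, greek)
    return current


def fix_abbreviations(_source: str, current: str) -> str:
    # Corrective pass 2: localize medical abbreviations.
    abbreviations = {"COPD": "ΧΑΠ"}
    for english, greek in abbreviations.items():
        current = current.replace(english, greek)
    return current


result = translate_with_refinement(
    "The patient has hypertension and COPD.",
    [initial_translation, fix_untranslated_terms, fix_abbreviations],
)
```

Passing the source text to every stage lets a corrective pass compare the draft against the original, which is what a refinement prompt would typically need.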
How to Use
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    model_id = "iti-visual-analytics/GRamma-12B"

    # Load the tokenizer and the model in bfloat16, placed automatically on the available devices.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        dtype=torch.bfloat16,
        device_map="auto",
    )

    # Prompt: "What is the first-line treatment for arterial hypertension?"
    messages = [
        {"role": "user", "content": "Ποια είναι η πρώτη γραμμή θεραπείας για την αρτηριακή υπέρταση;"},
    ]

    # Apply the chat template and move the tensors to the model's device.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    # Greedy decoding; print only the newly generated tokens.
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
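Since much of the training data is multiple-choice, it may be convenient to wrap MCQA questions in a small helper before applying the chat template. The sketch below is an assumption, not the prompt format used during training or evaluation: the helper name `build_mcqa_messages`, the Latin option letters, and the Greek instruction line ("Answer only with the letter of the correct option.") are all illustrative choices.

```python
def build_mcqa_messages(question: str, options: list[str]) -> list[dict]:
    """Format a multiple-choice question as a single-turn chat message list."""
    letters = "ABCDEFGH"
    if not 2 <= len(options) <= len(letters):
        raise ValueError("expected between 2 and 8 options")
    lines = [question, ""]
    # One line per option, labelled A., B., C., ...
    lines += [f"{letter}. {text}" for letter, text in zip(letters, options)]
    # Hypothetical instruction line; adjust to whatever format works best in practice.
    lines += ["", "Απάντησε μόνο με το γράμμα της σωστής επιλογής."]
    return [{"role": "user", "content": "\n".join(lines)}]


messages = build_mcqa_messages(
    "Ποιο φάρμακο είναι αναστολέας ΜΕΑ;",
    ["Εναλαπρίλη", "Αμλοδιπίνη", "Μετοπρολόλη"],
)
```

The returned list has the same shape as `messages` in the example above, so it can be passed directly to `tokenizer.apply_chat_template`.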
Evaluation
Evaluation was performed using a fork of Lighteval extended to support Greek downstream tasks and target-language prompts. Benchmarks are reported with the number of few-shot examples used in the evaluation prompt.
The evaluation suite covers both general and medical domains. Greek-language benchmarks are machine-translated datasets (Medical MCQA Greek, MMLU Greek, HellaSwag Greek, TruthfulQA Greek, and ARC Greek) plus the Greek portion of the multilingual Belebele dataset. English-language benchmarks are Belebele, Winogrande, HellaSwag, ARC-Challenge, TruthfulQA, and MMLU.
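For readers unfamiliar with the "k-shot" notation used in the tables, the sketch below shows the general idea: k solved examples are prepended to the unanswered question, all in the target language. It is a generic illustration and not Lighteval's actual prompt format; the function name and the Greek "Ερώτηση"/"Απάντηση" (question/answer) labels are assumptions.

```python
def build_few_shot_prompt(shots: list[tuple[str, str]], query: str, k: int) -> str:
    """Prepend the first k solved (question, answer) pairs to the unanswered query."""
    if k < 0 or k > len(shots):
        raise ValueError("k must be between 0 and the number of available examples")
    # Each solved example becomes one question/answer block.
    blocks = [f"Ερώτηση: {question}\nΑπάντηση: {answer}" for question, answer in shots[:k]]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Ερώτηση: {query}\nΑπάντηση:")
    return "\n\n".join(blocks)


shots = [
    ("Ποιο όργανο παράγει ινσουλίνη;", "Το πάγκρεας"),
    ("Ποια βιταμίνη συντίθεται στο δέρμα;", "Η βιταμίνη D"),
]
zero_shot = build_few_shot_prompt(shots, "Ποιο όργανο φιλτράρει το αίμα;", k=0)
two_shot = build_few_shot_prompt(shots, "Ποιο όργανο φιλτράρει το αίμα;", k=2)
```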
Greek Language Tasks
| Evaluation Dataset | GRamma-12B | Gemma-3-Base | Meltemi-7B-v1.5 | Krikri-8B-Base |
|---|---|---|---|---|
| Medical MCQA EL (15-shot) | 66.92% | 60.30% | 42.20% | 53.80% |
| Belebele EL (5-shot) | 90.33% | 90.22% | 61.00% | 82.70% |
| TruthfulQA MC1 EL (0-shot) | 36.47% | 39.78% | N/A | N/A |
| TruthfulQA MC2 EL (0-shot) | 56.36% | 59.19% | 49.00% | 54.20% |
| ARC-Challenge EL (25-shot) | 55.05% | 57.79% | 40.00% | 49.40% |
| HellaSwag EL (10-shot) | 64.77% | 65.40% | 53.80% | 64.60% |
| MMLU EL (5-shot) | 64.19% | 59.88% | 41.20% | 52.00% |
| Average | 62.01% | 61.80% | 47.87%* | 59.45%* |
* Averages for Meltemi and Krikri are computed over the 6 available tasks (excluding TruthfulQA MC1).
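The averaging rule in the footnote can be reproduced directly from the table. The snippet below copies the Greek scores, marks N/A entries as `None`, and averages over the available tasks only; the variable and function names are illustrative.

```python
from statistics import mean

# Scores copied from the Greek table, in row order; None marks an N/A entry.
greek_scores = {
    "GRamma-12B":      [66.92, 90.33, 36.47, 56.36, 55.05, 64.77, 64.19],
    "Meltemi-7B-v1.5": [42.20, 61.00, None, 49.00, 40.00, 53.80, 41.20],
    "Krikri-8B-Base":  [53.80, 82.70, None, 54.20, 49.40, 64.60, 52.00],
}


def average_available(scores: list) -> float:
    """Mean over the tasks that have a score, rounded to two decimals."""
    available = [score for score in scores if score is not None]
    return round(mean(available), 2)


averages = {model: average_available(scores) for model, scores in greek_scores.items()}

# Like-for-like comparison: drop TruthfulQA MC1 (index 2) for GRamma-12B as well.
gramma_six_tasks = average_available(
    [score for i, score in enumerate(greek_scores["GRamma-12B"]) if i != 2]
)
```

This yields 62.01 for GRamma-12B, 47.87 for Meltemi, and 59.45 for Krikri, matching the table. Because the starred averages cover six tasks rather than seven, a like-for-like comparison restricts GRamma-12B to the same six tasks, which gives 66.27.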
English Language Tasks
| Evaluation Dataset | GRamma-12B | Gemma-3-Base | Meltemi-7B-v1.5 | Krikri-8B-Base |
|---|---|---|---|---|
| Belebele (5-shot) | 93.22% | 93.11% | 77.70% | 79.80% |
| Winogrande (5-shot) | 74.27% | 74.74% | 73.40% | 72.60% |
| HellaSwag (10-shot) | 82.77% | 83.67% | 79.60% | 80.70% |
| ARC-Challenge (25-shot) | 71.42% | 71.33% | 54.10% | 57.80% |
| TruthfulQA MC1 (0-shot) | 37.33% | 40.27% | N/A | N/A |
| TruthfulQA MC2 (0-shot) | 55.11% | 57.95% | 40.50% | 44.80% |
| MMLU (5-shot) | 74.29% | 73.26% | 56.90% | 65.10% |
| Average | 69.77% | 70.62% | 63.70%* | 66.80%* |
* Averages for Meltemi and Krikri are computed over the 6 available tasks (excluding TruthfulQA MC1).
GRamma-12B improves on the Gemma 3 baseline by 6.6 percentage points on Medical MCQA EL and 4.3 points on MMLU EL, raising the Greek average slightly (62.01% vs. 61.80%) while keeping the English average within one point of the baseline (69.77% vs. 70.62%). The largest regressions, roughly 3 points each, are on TruthfulQA in both languages and on ARC-Challenge EL. Overall, this indicates successful medical and Greek specialization with minimal loss of general capability.
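The gains quoted above are absolute differences between table entries, that is, percentage points rather than relative improvements. The short sketch below recomputes them from the table values and shows the relative change alongside for comparison; the helper names are illustrative.

```python
# (GRamma-12B, Gemma 3 baseline) scores copied from the tables above.
pairs = {
    "Medical MCQA EL": (66.92, 60.30),
    "MMLU EL": (64.19, 59.88),
    "Greek average": (62.01, 61.80),
    "English average": (69.77, 70.62),
}


def delta_pp(model: float, baseline: float) -> float:
    """Absolute difference in percentage points."""
    return round(model - baseline, 2)


def relative_gain(model: float, baseline: float) -> float:
    """Relative change, expressed as a percentage of the baseline score."""
    return round(100 * (model - baseline) / baseline, 2)


deltas = {name: delta_pp(model, base) for name, (model, base) in pairs.items()}
mcqa_relative = relative_gain(*pairs["Medical MCQA EL"])
```

The absolute gain on Medical MCQA EL is 6.62 points, which corresponds to a relative improvement of about 10.98% over the baseline score.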
Intended Use & Limitations
GRamma-12B is intended for research and educational purposes in Greek medical question answering and clinical reasoning. It is not approved for clinical use and must not be relied upon for actual diagnosis, treatment decisions, or any form of patient care, nor should it be used as a substitute for advice from qualified healthcare professionals. As with any language model, its outputs may contain factual errors or hallucinations and should always be verified by a domain expert before use. The model inherits the biases and limitations of its base model (Gemma 3 12B) and of the synthetic and translated training data.
Acknowledgements
Developed at the Centre for Research and Technology Hellas (CERTH), Information Technologies Institute (ITI), Thessaloniki, Greece. Built on top of Google's Gemma 3. Evaluation was carried out using a Greek-extended fork of Lighteval.
License
Model:
GRamma-12B is a derivative model based on Google's Gemma 3 12B Instruct. The model weights, usage, and distribution are subject to the official Gemma Terms of Use.
Users must comply with all applicable restrictions and requirements defined by the Gemma license agreement.
Training Data and Derived Resources:
This model was developed using a training pipeline that includes synthetic data generated with OpenAI models. The synthetic data is subject to the applicable OpenAI Terms of Use, and the pipeline is also subject to the licenses and terms of the original datasets it uses, including MedQA, MedMCQA, OASST1, and OpenOrca.