MedGemma-27B-Text-IT-FP8-Dynamic

An FP8 Dynamic quantized version of google/medgemma-27b-text-it, produced by Turn.io with llm-compressor for efficient deployment on vLLM.

Model Details

| Attribute | Value |
|---|---|
| Base model | google/medgemma-27b-text-it |
| Architecture | Decoder-only Transformer (Gemma 3) |
| Domain | Medical / Biomedical NLP |
| Modality | Text-only |
| Quantization | FP8 Dynamic (FP8_DYNAMIC scheme) |
| Quantization tool | llmcompressor 0.9.0 |
| Tensor types | BF16, F8_E4M3 |
| Context length | 128K tokens |
| Parameters | 27B |

Quantization Details

  • Scheme: FP8_DYNAMIC — static per-channel weight quantization (RTN) with dynamic per-token activation quantization at inference time
  • Quantized layers: All Linear layers
  • Excluded layers: lm_head (preserved in BF16 for output stability)
  • Calibration data: None required (RTN-based)
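
The two halves of the scheme can be illustrated with a small sketch. This is not the llm-compressor implementation, just a NumPy toy that mimics the scale computation: weights get one static scale per output channel, activations get one scale per token computed at inference time. Rounding to the nearest integer here is a crude stand-in for FP8's limited mantissa precision, and 448 is the largest value representable in E4M3.

```python
import numpy as np

F8_E4M3_MAX = 448.0  # max representable magnitude in FP8 E4M3


def quantize_weights_per_channel(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Static RTN weight quantization: one scale per output channel (row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / F8_E4M3_MAX
    # Round-to-nearest (RTN), clipped to the representable range.
    q = np.clip(np.round(w / scale), -F8_E4M3_MAX, F8_E4M3_MAX)
    return q, scale


def quantize_activations_per_token(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Dynamic activation quantization: one scale per token (row),
    computed on the fly at inference time -- no calibration data needed."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / F8_E4M3_MAX
    q = np.clip(np.round(x / scale), -F8_E4M3_MAX, F8_E4M3_MAX)
    return q, scale


# Dequantizing (q * scale) approximately recovers the original tensor;
# the per-element error is on the order of scale / 2.
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_weights_per_channel(w)
print(np.max(np.abs(q * s - w)))  # small reconstruction error
```

Because the activation scales are computed per token at runtime, no calibration set is required, which is why the scheme is RTN-based.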

Weights are already quantized. Do not apply runtime quantization (i.e., do not use --quantization fp8). Use --dtype auto to let vLLM auto-detect the FP8 format from the checkpoint config.

Deployment with vLLM

vllm serve turnio/medgemma-27b-text-it-FP8-Dynamic \
  --served-model-name medgemma \
  --dtype auto
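
Once the server is running, it exposes vLLM's OpenAI-compatible API. A minimal sketch of a chat-completion request follows; the localhost URL and port 8000 are vLLM defaults, "medgemma" matches the `--served-model-name` flag above, and the prompt content is illustrative only.

```python
import json

# Chat-completion payload for the vLLM server started above.
payload = {
    "model": "medgemma",  # must match --served-model-name
    "messages": [
        {"role": "user", "content": "List common drug interactions of warfarin."}
    ],
    "temperature": 0.2,
    "max_tokens": 512,
}

body = json.dumps(payload)

# POST to the OpenAI-compatible endpoint, e.g. with requests:
#   import requests
#   r = requests.post("http://localhost:8000/v1/chat/completions",
#                     data=body, headers={"Content-Type": "application/json"})
#   print(r.json()["choices"][0]["message"]["content"])
```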

Intended Use

This model is a quantized version of Google's MedGemma 27B, intended as a starting point for developers in healthcare and life sciences to build downstream applications. It is suitable for:

  • Medical text comprehension and reasoning
  • Question-answering on medical topics
  • Clinical text summarization
  • Medical RAG pipelines
  • Fine-tuning for domain-specific tasks

Limitations

This model inherits all limitations of the base model. Critically:

  1. Not for direct clinical use. Outputs are not intended to directly inform clinical diagnosis, patient management, or treatment recommendations.
  2. Requires validation. All outputs require independent verification and clinical correlation before any clinical use.
  3. Accuracy not guaranteed. Inaccurate output is possible even for domains with substantial training data.
  4. Quantization effects. FP8 quantization may introduce minor numerical differences compared to the BF16 original. Evaluate on your specific use case.

Benchmarks

Benchmarks below are from the base model (google/medgemma-27b-text-it). FP8 Dynamic quantization is expected to produce near-identical results.

| Benchmark | MedGemma 27B | Gemma 3 27B |
|---|---|---|
| MedQA (4-op, 0-shot) | 87.7 | 74.9 |
| MedMCQA | 74.2 | 62.6 |
| PubMedQA | 76.8 | 73.4 |
| MMLU Med | 87.0 | 83.3 |
| MedXpertQA | 26.7 | 15.7 |
| AfriMed-QA | 84.0 | 72.0 |

Citation

@article{sellergren2025medgemma,
    title   = {MedGemma Technical Report},
    author  = {Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
    journal = {arXiv preprint arXiv:2507.05201},
    year    = {2025},
}

License

This model is a derivative of google/medgemma-27b-text-it and is subject to the Health AI Developer Foundations Terms of Use.

Notice: HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use. Use of this model is subject to the restrictions in Section 3.2 of the agreement, including the Prohibited Use Policy. By using this model, you agree to these terms.

Modification notice: This model has been modified from the original. The weights have been quantized from BF16 to FP8 (F8_E4M3) using the FP8_DYNAMIC scheme via llm-compressor. The lm_head layer is preserved in BF16.
