MedGemma-27B-Text-IT-FP8-Dynamic

An FP8 Dynamic quantized version of google/medgemma-27b-text-it, produced by Turn.io with llm-compressor for efficient deployment on vLLM.

Model Details

| Attribute | Value |
|---|---|
| Base model | google/medgemma-27b-text-it |
| Architecture | Decoder-only Transformer (Gemma 3) |
| Domain | Medical / Biomedical NLP |
| Modality | Text-only |
| Quantization | FP8 Dynamic (FP8_DYNAMIC scheme) |
| Quantization tool | llmcompressor 0.9.0 |
| Tensor types | BF16, F8_E4M3 |
| Context length | 128K tokens |
| Parameters | 27B |

Quantization Details

  • Scheme: FP8_DYNAMIC — static per-channel weight quantization (RTN) with dynamic per-token activation quantization at inference time
  • Quantized layers: All Linear layers
  • Excluded layers: lm_head (preserved in BF16 for output stability)
  • Calibration data: None required (RTN-based)
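
The two halves of the scheme can be illustrated with a small sketch. This is not the llm-compressor implementation, just a NumPy toy that mimics the scale computation: weights get one static scale per output channel, activations get one scale per token computed at inference time. Rounding to the nearest integer here is a crude stand-in for FP8's limited mantissa precision, and 448 is the largest value representable in E4M3.

```python
import numpy as np

F8_E4M3_MAX = 448.0  # max representable magnitude in FP8 E4M3


def quantize_weights_per_channel(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Static RTN weight quantization: one scale per output channel (row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / F8_E4M3_MAX
    # Round-to-nearest (RTN), clipped to the representable range.
    q = np.clip(np.round(w / scale), -F8_E4M3_MAX, F8_E4M3_MAX)
    return q, scale


def quantize_activations_per_token(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Dynamic activation quantization: one scale per token (row),
    computed on the fly at inference time -- no calibration data needed."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / F8_E4M3_MAX
    q = np.clip(np.round(x / scale), -F8_E4M3_MAX, F8_E4M3_MAX)
    return q, scale


# Dequantizing (q * scale) approximately recovers the original tensor;
# the per-element error is on the order of scale / 2.
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_weights_per_channel(w)
print(np.max(np.abs(q * s - w)))  # small reconstruction error
```

Because the activation scales are computed per token at runtime, no calibration set is required, which is why the scheme is RTN-based.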

Weights are already quantized. Do not apply runtime quantization (i.e., do not use --quantization fp8). Use --dtype auto to let vLLM auto-detect the FP8 format from the checkpoint config.

Deployment with vLLM

vllm serve turnio/medgemma-27b-text-it-FP8-Dynamic \
  --served-model-name medgemma \
  --dtype auto
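
Once the server is running, it exposes vLLM's OpenAI-compatible API. A minimal sketch of a chat-completion request follows; the localhost URL and port 8000 are vLLM defaults, "medgemma" matches the `--served-model-name` flag above, and the prompt content is illustrative only.

```python
import json

# Chat-completion payload for the vLLM server started above.
payload = {
    "model": "medgemma",  # must match --served-model-name
    "messages": [
        {"role": "user", "content": "List common drug interactions of warfarin."}
    ],
    "temperature": 0.2,
    "max_tokens": 512,
}

body = json.dumps(payload)

# POST to the OpenAI-compatible endpoint, e.g. with requests:
#   import requests
#   r = requests.post("http://localhost:8000/v1/chat/completions",
#                     data=body, headers={"Content-Type": "application/json"})
#   print(r.json()["choices"][0]["message"]["content"])
```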

Intended Use

This model is a quantized version of Google's MedGemma 27B, intended as a starting point for developers in healthcare and life sciences to build downstream applications. It is suitable for:

  • Medical text comprehension and reasoning
  • Question-answering on medical topics
  • Clinical text summarization
  • Medical RAG pipelines
  • Fine-tuning for domain-specific tasks

Limitations

This model inherits all limitations of the base model. Critically:

  1. Not for direct clinical use. Outputs are not intended to directly inform clinical diagnosis, patient management, or treatment recommendations.
  2. Requires validation. All outputs require independent verification and clinical correlation before any clinical use.
  3. Accuracy not guaranteed. Inaccurate output is possible even for domains with substantial training data.
  4. Quantization effects. FP8 quantization may introduce minor numerical differences compared to the BF16 original. Evaluate on your specific use case.

Benchmarks

Benchmarks below are from the base model (google/medgemma-27b-text-it). FP8 Dynamic quantization is expected to produce near-identical results.

| Benchmark | MedGemma 27B | Gemma 3 27B |
|---|---|---|
| MedQA (4-op, 0-shot) | 87.7 | 74.9 |
| MedMCQA | 74.2 | 62.6 |
| PubMedQA | 76.8 | 73.4 |
| MMLU Med | 87.0 | 83.3 |
| MedXpertQA | 26.7 | 15.7 |
| AfriMed-QA | 84.0 | 72.0 |

Citation

@article{sellergren2025medgemma,
    title   = {MedGemma Technical Report},
    author  = {Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
    journal = {arXiv preprint arXiv:2507.05201},
    year    = {2025},
}

License

This model is a derivative of google/medgemma-27b-text-it and is subject to the Health AI Developer Foundations Terms of Use.

Notice: HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use. Use of this model is subject to the restrictions in Section 3.2 of the agreement, including the Prohibited Use Policy. By using this model, you agree to these terms.

Modification notice: This model has been modified from the original. The weights have been quantized from BF16 to FP8 (F8_E4M3) using the FP8_DYNAMIC scheme via llm-compressor. The lm_head layer is preserved in BF16.
