# MedGemma-27B-Text-IT-FP8-Dynamic
FP8 Dynamic quantized version of google/medgemma-27b-text-it, quantized by Turn.io using llm-compressor for efficient deployment with vLLM.
## Model Details
| Attribute | Value |
|---|---|
| Base model | google/medgemma-27b-text-it |
| Architecture | Decoder-only Transformer (Gemma 3) |
| Domain | Medical / Biomedical NLP |
| Modality | Text-only |
| Quantization | FP8 Dynamic (FP8_DYNAMIC scheme) |
| Quantization tool | llmcompressor 0.9.0 |
| Tensor types | BF16, F8_E4M3 |
| Context length | 128K tokens |
| Parameters | 27B |
## Quantization Details

- **Scheme:** `FP8_DYNAMIC` (static per-channel weight quantization via round-to-nearest, with dynamic per-token activation quantization at inference time)
- **Quantized layers:** all `Linear` layers
- **Excluded layers:** `lm_head` (preserved in BF16 for output stability)
- **Calibration data:** none required (RTN-based)
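To illustrate the dynamic half of the scheme, the sketch below simulates per-token activation scaling in plain Python. It models only the range/scale logic (round-to-nearest within the E4M3 representable range of ±448); real FP8 kernels quantize to the E4M3 bit layout rather than integers, so treat this as a conceptual sketch, not the actual implementation.

```python
E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_token(row):
    """Compute a per-token scale from the row's max magnitude, then round
    the scaled values into the E4M3 range. The scale is derived at
    inference time from the activations themselves ("dynamic")."""
    scale = max(abs(v) for v in row) / E4M3_MAX
    scale = scale if scale > 0 else 1e-12  # all-zero row: avoid division by zero
    q = [max(-E4M3_MAX, min(E4M3_MAX, round(v / scale))) for v in row]
    return q, scale

def dequantize_token(q, scale):
    """Recover approximate activations; error is bounded by scale / 2."""
    return [v * scale for v in q]
```

Because each token gets its own scale, quantization error stays proportional to that token's activation range, which is why the scheme needs no calibration data for activations.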
Weights are already quantized. Do not apply runtime quantization (i.e., do not pass `--quantization fp8`). Use `--dtype auto` so vLLM auto-detects the FP8 format from the checkpoint config.
## Deployment with vLLM

```shell
vllm serve turnio/medgemma-27b-text-it-FP8-Dynamic \
  --served-model-name medgemma \
  --dtype auto
```
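Once running, the server exposes an OpenAI-compatible API. A minimal stdlib-only client sketch follows; the endpoint URL, system prompt, and sampling parameters are illustrative assumptions, and the model name matches the `--served-model-name` flag above.

```python
import json
import urllib.request

def build_request(question: str) -> dict:
    """Assemble a chat-completions payload for the vLLM server.
    The system prompt is an illustrative choice, not part of the model."""
    return {
        "model": "medgemma",  # matches --served-model-name above
        "messages": [
            {"role": "system", "content": "You are a helpful medical assistant."},
            {"role": "user", "content": question},
        ],
        "max_tokens": 256,
        "temperature": 0.2,
    }

def ask(question: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the OpenAI-compatible endpoint and return the reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_request(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires the server above to be running):
# print(ask("What are common symptoms of iron-deficiency anemia?"))
```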
## Intended Use
This model is a quantized version of Google's MedGemma 27B, intended as a starting point for developers in healthcare and life sciences to build downstream applications. It is suitable for:
- Medical text comprehension and reasoning
- Question-answering on medical topics
- Clinical text summarization
- Medical RAG pipelines
- Fine-tuning for domain-specific tasks
## Limitations
This model inherits all limitations of the base model. Critically:
- **Not for direct clinical use.** Outputs are not intended to directly inform clinical diagnosis, patient management, or treatment recommendations.
- **Requires validation.** All outputs require independent verification and clinical correlation before any clinical use.
- **Accuracy not guaranteed.** Inaccurate output is possible even for domains with substantial training data.
- **Quantization effects.** FP8 quantization may introduce minor numerical differences compared to the BF16 original. Evaluate on your specific use case.
## Benchmarks
Benchmarks below are from the base model (google/medgemma-27b-text-it). FP8 Dynamic quantization is expected to produce near-identical results.
| Benchmark | MedGemma 27B | Gemma 3 27B |
|---|---|---|
| MedQA (4-option, 0-shot) | 87.7 | 74.9 |
| MedMCQA | 74.2 | 62.6 |
| PubMedQA | 76.8 | 73.4 |
| MMLU Med | 87.0 | 83.3 |
| MedXpertQA | 26.7 | 15.7 |
| AfriMed-QA | 84.0 | 72.0 |
## Citation

```bibtex
@article{sellergren2025medgemma,
  title   = {MedGemma Technical Report},
  author  = {Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
  journal = {arXiv preprint arXiv:2507.05201},
  year    = {2025},
}
```
## License
This model is a derivative of google/medgemma-27b-text-it and is subject to the Health AI Developer Foundations Terms of Use.
**Notice:** HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use. Use of this model is subject to the restrictions in Section 3.2 of the agreement, including the Prohibited Use Policy. By using this model, you agree to these terms.

**Modification notice:** This model has been modified from the original. The weights have been quantized from BF16 to FP8 (F8_E4M3) using the `FP8_DYNAMIC` scheme via llm-compressor. The `lm_head` layer is preserved in BF16.