---
license: other
tags:
- medical
- biomedical
- text-generation
- fp8
- quantization
- vllm
- medgemma
library_name: transformers
base_model:
- google/medgemma-27b-text-it
---

# MedGemma-27B-Text-IT-FP8-Dynamic

## Overview
**MedGemma-27B-Text-IT-FP8-Dynamic** is an **FP8 Dynamic–quantized** derivative of **Google’s MedGemma-27B-Text-IT** model, optimized for high-throughput inference while preserving strong performance on medical and biomedical instruction-tuned, text-only tasks.

This version is intended for **vLLM deployment** on modern NVIDIA GPUs and follows a conservative FP8 Dynamic quantization strategy designed for maximum stability.

---

## Base Model
- **Base model**: `google/medgemma-27b-text-it`
- **Architecture**: Decoder-only Transformer (instruction-tuned)
- **Domain**: Medical / Biomedical NLP
- **Modality**: **Text-only**

---

## Quantization Details
- **Method**: FP8 Dynamic
- **Tooling**: `llmcompressor`
- **Quantized layers**: Linear layers
- **Excluded components**:
  - `lm_head`

### Rationale
- FP8 Dynamic reduces VRAM usage and improves inference throughput.
- Excluding `lm_head` preserves output stability.
- The resulting model is fully compatible with **vLLM**.

**Weights are already quantized; do not apply runtime quantization.**

---
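The details above can be expressed as an `llmcompressor` recipe. The following is a minimal sketch of how such a quantization run is typically set up, not the exact script used to produce these weights; import paths and argument names may differ slightly across `llmcompressor` versions, and running it requires a GPU large enough to hold the base model.

```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# FP8 Dynamic: weights are quantized ahead of time, activation scales are
# computed at runtime, so no calibration dataset is needed.
recipe = QuantizationModifier(
    targets="Linear",      # quantize all Linear layers...
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],    # ...except the output head, for stability
)

oneshot(
    model="google/medgemma-27b-text-it",
    recipe=recipe,
    output_dir="medgemma-27b-text-it-FP8-Dynamic",  # hypothetical output path
)
```

The `ignore=["lm_head"]` entry corresponds to the excluded component listed above.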

## Intended Use
- Medical and biomedical instruction-following
- Clinical text summarization
- Medical RAG pipelines
- Decision-support and research assistance

---

## Deployment (vLLM)

### Recommended
```bash
vllm serve ig1/medgemma-27b-text-it-FP8-Dynamic \
  --served-model-name medgemma-27b-text-it-fp8 \
  --dtype auto
```
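
Once the server is running, it exposes vLLM's OpenAI-compatible API. A minimal client sketch using only the Python standard library, assuming the served model name from the command above and vLLM's default port 8000:

```python
import json
import urllib.request

# Payload for the OpenAI-compatible /v1/chat/completions endpoint.
# "model" must match the --served-model-name passed to vllm serve.
payload = {
    "model": "medgemma-27b-text-it-fp8",
    "messages": [
        {"role": "user", "content": "List three common causes of acute chest pain."}
    ],
    "temperature": 0.2,
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # default vllm serve port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Requires the server above to be running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
#     print(reply)
```

Any OpenAI-compatible client (for example the official `openai` Python package pointed at `http://localhost:8000/v1`) works the same way.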