yarkcy committed · verified · Commit 8e3b795 · 1 parent: a0f4216

Update README.md

Files changed (1): README.md (+63 −3)
---
license: apache-2.0
base_model:
- MedAIBase/AntAngelMed
---

## Model Overview

**MedAIBase/AntAngelMed-INT4** is a quantized version of [MedAIBase/AntAngelMed](https://huggingface.co/MedAIBase/AntAngelMed) built for high-efficiency clinical applications. It uses **GPTQ INT4 quantization** to accelerate inference and reduce memory consumption while maintaining high numerical accuracy, and it is optimized for large-scale medical AI deployment.
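
GPTQ stores each weight as a 4-bit integer plus a per-group scale. The plain-Python sketch below is illustrative only (it shows simple round-to-nearest group quantization; GPTQ itself additionally compensates quantization error using second-order statistics), but it conveys why INT4 cuts weight memory by roughly 4x versus BF16.

```python
def quantize_int4_group(weights, group_size=4):
    """Round-to-nearest INT4 quantization with one scale per group.
    Illustrative only: real GPTQ also applies Hessian-based error
    compensation, which this sketch omits."""
    assert len(weights) % group_size == 0
    codes, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7.0 or 1.0  # map max |w| onto 7
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize_int4_group(codes, scales, group_size=4):
    """Recover approximate float weights from 4-bit codes and scales."""
    return [codes[i] * scales[i // group_size] for i in range(len(codes))]

weights = [0.12, -0.40, 0.03, 0.25, -0.07, 0.31, -0.22, 0.05]
codes, scales = quantize_int4_group(weights)
approx = dequantize_int4_group(codes, scales)
assert all(-8 <= c <= 7 for c in codes)  # every code fits in 4 signed bits
```

Each 16-bit weight becomes a 4-bit code, with the per-group scales adding a small overhead, which is consistent with the size reduction shown in the table below.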

## Model Size

| Precision | Weight Size | Notes |
|-----------|-------------|-------|
| BF16 (Brain Float 16) | 192 GB | 16-bit floating point |
| FP8 (Float 8) | 92 GB | 8-bit floating point |
| INT4 (Integer 4) | 52 GB | 4-bit quantization |
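
The table is consistent with a rough bits-per-parameter calculation. The sketch below works backwards from the BF16 row to an approximate parameter count (an assumption; the card does not state one) and checks the other rows against it.

```python
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size: parameters x bits, converted to GB."""
    return n_params * bits_per_weight / 8 / 1e9

# 192 GB at 16 bits implies roughly 96e9 parameters (an assumption).
n_params = 192e9 * 8 / 16

print(approx_size_gb(n_params, 16))  # 192.0 -> matches the BF16 row
print(approx_size_gb(n_params, 8))   # 96.0  -> close to the 92 GB FP8 row
print(approx_size_gb(n_params, 4))   # 48.0  -> the 52 GB INT4 row adds
                                     #          per-group scales and metadata
```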

## Performance & Acceleration

The INT4-quantized model is purpose-built for high-concurrency production environments. It addresses the "memory wall" often encountered with medical LLMs, enabling larger models to be deployed on cost-effective hardware.

The metrics below demonstrate robust acceleration across diverse and complex workloads.

![image](https://hackmd.io/_uploads/rJgdJaprWl.png)

## Model Accuracy

Despite the aggressive quantization, the model maintains high-fidelity outputs. As shown below, the accuracy trade-off is negligible, so clinical reliability is preserved.

![image](https://hackmd.io/_uploads/rkAOy6aHWg.png)

## Quick Start

### Requirements

- H200-class GPU compute
- CUDA 12.0+
- PyTorch 2.0+

### Installation

```bash
pip install sglang==0.5.6
```

### Inference with SGLang

```bash
python3 -m sglang.launch_server \
  --model-path MedAIBase/AntAngelMed-INT4 \
  --host 0.0.0.0 --port 30012 \
  --trust-remote-code \
  --attention-backend fa3 \
  --mem-fraction-static 0.9 \
  --tp-size 1
```
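
Once the server is up, SGLang exposes an OpenAI-compatible HTTP API. The sketch below builds a `/v1/chat/completions` payload for it; the port follows the launch command above, and the example prompt is purely hypothetical.

```python
import json

def build_chat_request(prompt, model="MedAIBase/AntAngelMed-INT4",
                       max_tokens=256, temperature=0.2):
    """Build an OpenAI-compatible chat-completion payload for the
    SGLang server started above (assumed at http://localhost:30012)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("List common contraindications for aspirin.")
print(json.dumps(payload, indent=2))
# POST the payload to http://localhost:30012/v1/chat/completions with any
# HTTP client (curl, requests, or the official openai Python client
# pointed at that base URL).
```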