MedAIBase
/

AntAngelMed-INT4

compressed-tensors

Model card Files Files and versions

yarkcy commited on Jan 21

Commit

8e3b795

·

verified ·

1 Parent(s): a0f4216

Update README.md

Files changed (1) hide show

README.md +63 -3

README.md CHANGED Viewed

@@ -1,3 +1,63 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+base_model:
+- MedAIBase/AntAngelMed
+---
+## Model Overview
+**MedAIBase/AntAngelMed-INT4** is a high-performance quantized version of [MedAIBase/AntAngelMed](https://huggingface.co/MedAIBase/AntAngelMed) designed for high-efficiency clinical applications. This model utilizes **GPTQ INT4 quantization** to significantly accelerate inference speeds and reduce memory consumption while maintaining high numerical accuracy. It is specifically optimized for large-scale medical AI deployment.
+## Model Size
+| 数据精度 (Precision) | 权重大小 (Size) | 备注 |
+|----------------------|----------------|------|
+| BF16 (Brain Float 16) | 192 GB | 16 位浮点数 |
+| FP8 (Float 8)         | 92 GB  | 8 位浮点数 |
+| INT4 (Integer 4)      | 52 GB  | 4 位量化 |
+## Performance & Acceleration
+The FP8-quantized architecture is purpose-built for high-concurrency production environments. It addresses the "memory wall" often encountered in medical LLMs, enabling the deployment of larger models on cost-effective hardware.
+These metrics demonstrate robust acceleration performance across diverse and complex domains.
+![image](https://hackmd.io/_uploads/rJgdJaprWl.png)
+## Model Accuracy
+Despite the aggressive quantization, the model maintains high-fidelity outputs. As shown below, the accuracy trade-off is negligible, ensuring clinical reliability is preserved.
+![image](https://hackmd.io/_uploads/rkAOy6aHWg.png)
+## Quick Start
+### Requirements
+- H200-class Computational Performance
+- CUDA 12.0+
+- PyTorch 2.0+
+### Installation
+```bash
+pip install sglang==0.5.6
+```
+### Inference with SGLang
+```python
+python3 -m sglang.launch_server  \
+    --model-path MedAIBase/AntAngelMed-INT4 \
+    --host 0.0.0.0 --port 30012  \
+    --trust-remote-code  \
+    --attention-backend fa3  \
+    --mem-fraction-static 0.9 \
+    --tp-size 1
+```