Model Overview
MedAIBase/AntAngelMed-INT4 is a GPTQ INT4-quantized version of MedAIBase/AntAngelMed built for efficient clinical applications. Quantization substantially accelerates inference and reduces memory consumption while preserving numerical accuracy, making the model well suited to large-scale medical AI deployment.
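To illustrate what INT4 weight quantization does, here is a minimal round-to-nearest per-group sketch in NumPy. This is an illustration only, not the actual GPTQ algorithm used for this model: real GPTQ additionally applies Hessian-based error compensation when rounding each column. All function names here are hypothetical.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Simplified asymmetric INT4 quantization, per group of weights.

    NOTE: round-to-nearest illustration only; real GPTQ also uses
    second-order (Hessian-based) error compensation.
    """
    g = w.reshape(-1, group_size)
    wmin = g.min(axis=1, keepdims=True)
    wmax = g.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0            # 4 bits -> 16 levels (0..15)
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(g / scale + zero), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_int4(q, scale, zero):
    """Recover approximate float weights from INT4 codes."""
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 256)).astype(np.float32)
q, s, z = quantize_int4(w)
w_hat = dequantize_int4(q, s, z).reshape(w.shape)
max_err = float(np.abs(w - w_hat).max())    # bounded by ~scale / 2 per group
```

Each weight is stored as a 4-bit code plus a per-group scale and zero-point, which is why INT4 checkpoints are slightly larger than a naive 4x reduction would suggest.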
Model Size
| Precision | Weight Size | Notes |
|---|---|---|
| BF16 (Brain Float 16) | 192 GB | 16-bit floating point |
| FP8 (Float 8) | 92 GB | 8-bit floating point |
| INT4 (Integer 4) | 52 GB | 4-bit quantized |
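A quick sanity check on the table above, computing each format's reduction relative to the BF16 baseline (the INT4 checkpoint lands below the naive 4x reduction from 192 GB, likely because quantization scales, zero-points, and any unquantized layers add overhead):

```python
# Sizes taken directly from the table above (GB).
sizes_gb = {"BF16": 192, "FP8": 92, "INT4": 52}

ratios = {fmt: round(sizes_gb["BF16"] / gb, 2) for fmt, gb in sizes_gb.items()}
for fmt, r in ratios.items():
    print(f"{fmt}: {sizes_gb[fmt]} GB ({r}x smaller than BF16)")
```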
Performance & Acceleration
The INT4-quantized architecture is purpose-built for high-concurrency production environments. It addresses the "memory wall" often encountered with medical LLMs, enabling larger models to be deployed on cost-effective hardware while delivering robust acceleration across diverse and complex clinical workloads.
Model Accuracy
Despite the aggressive quantization, the model maintains high-fidelity outputs: the accuracy trade-off is negligible, so clinical reliability is preserved.
Quick Start
Requirements
- NVIDIA H200-class GPU (or comparable compute and memory capacity)
- CUDA 12.0+
- PyTorch 2.0+
Installation
```shell
pip install sglang==0.5.6
```
Inference with SGLang
```shell
python3 -m sglang.launch_server \
    --model-path MedAIBase/AntAngelMed-INT4 \
    --host 0.0.0.0 --port 30012 \
    --trust-remote-code \
    --attention-backend fa3 \
    --mem-fraction-static 0.9 \
    --tp-size 1
```
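Once the server is up, it can be queried through SGLang's OpenAI-compatible endpoint. A minimal stdlib-only client sketch follows; the host, port (30012, matching the launch command above), and prompt are assumptions for illustration.

```python
import json
import urllib.request

def build_chat_request(prompt, model="MedAIBase/AntAngelMed-INT4", max_tokens=256):
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }

if __name__ == "__main__":
    # Assumes the SGLang server from the command above is running locally.
    payload = build_chat_request("List common symptoms of iron-deficiency anemia.")
    req = urllib.request.Request(
        "http://localhost:30012/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
```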
Model Tree
Base model: inclusionAI/Ling-flash-base-2.0