Model Overview

MedAIBase/AntAngelMed-INT4 is a GPTQ INT4-quantized version of MedAIBase/AntAngelMed, built for high-efficiency clinical applications. Quantization substantially accelerates inference and reduces memory consumption while preserving numerical accuracy, making the model suitable for large-scale medical AI deployment.

Model Size

| Precision | Weight Size | Notes |
| --- | --- | --- |
| BF16 (Brain Float 16) | 192 GB | 16-bit floating point |
| FP8 (Float 8) | 92 GB | 8-bit floating point |
| INT4 (Integer 4) | 52 GB | 4-bit quantized |
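The sizes above scale roughly with bits per weight. A minimal sketch of that back-of-the-envelope arithmetic (the 100-billion parameter count below is purely illustrative, not this model's actual size, and the estimate ignores quantization scales, zero-points, and activation memory):

```python
def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate in GiB: params * bits / 8 bytes.

    Ignores quantization metadata (scales, zero-points) and runtime
    activation/KV-cache memory, so real footprints are somewhat larger.
    """
    return n_params * bits_per_weight / 8 / 1024**3

# Illustrative 100e9-parameter model at the three precisions above:
for bits, name in [(16, "BF16"), (8, "FP8"), (4, "INT4")]:
    print(f"{name}: ~{approx_weight_gib(100e9, bits):.0f} GiB")
```

Note that halving the bit width halves the weight footprint, which is why INT4 fits on a single accelerator where BF16 would require several.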

Performance & Acceleration

The INT4-quantized architecture is purpose-built for high-concurrency production environments. It addresses the "memory wall" often encountered in medical LLMs, enabling the deployment of larger models on cost-effective hardware.

These metrics demonstrate robust acceleration performance across diverse and complex domains.

[Figure: inference acceleration benchmarks]

Model Accuracy

Despite the aggressive quantization, the model maintains high-fidelity outputs. As shown below, the accuracy trade-off is negligible, ensuring clinical reliability is preserved.

[Figure: model accuracy comparison]

Quick Start

Requirements

  • NVIDIA H200-class GPU compute
  • CUDA 12.0+
  • PyTorch 2.0+

Installation

```shell
pip install sglang==0.5.6
```

Inference with SGLang

```shell
python3 -m sglang.launch_server \
    --model-path MedAIBase/AntAngelMed-INT4 \
    --host 0.0.0.0 --port 30012 \
    --trust-remote-code \
    --attention-backend fa3 \
    --mem-fraction-static 0.9 \
    --tp-size 1
```
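Once launched, the SGLang server exposes an OpenAI-compatible REST API. A minimal sketch of building a chat-completion request for it (the prompt text and `temperature` setting are illustrative assumptions, not part of this model card):

```python
import json

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for the SGLang server."""
    return {
        "model": "MedAIBase/AntAngelMed-INT4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic decoding; adjust as needed
    }

payload = build_request("List common drug interactions of warfarin.")
print(json.dumps(payload, indent=2))
```

POST the payload to `http://localhost:30012/v1/chat/completions` (matching the `--port 30012` flag above), for example with `requests.post(url, json=payload)`.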