DSAiLab/llama2-70b-gptq-3bit-32g


LLaMA2-70B-GPTQ-3bit-32g

This model is a 3-bit GPTQ-quantized version of Meta's LLaMA2-70B.

Quantization (GPTQ)

Base Model: LLaMA2-70B
Quantization Type: GPTQ 3bit
Group Size: 32
Bits: 3bit (int3)
Activation Ordering: Enabled
Supported frameworks: vLLM, SGLang
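
The settings listed above correspond to a GPTQ quantization config. The following is a minimal sketch, assuming AutoGPTQ-style field names (`bits`, `group_size`, `desc_act`); the config file actually shipped in this repository is not shown here and may use different or additional keys.

```python
import json

# Assumed AutoGPTQ-style keys; the values come from the list above.
quantize_config = {
    "bits": 3,          # Bits: 3bit (int3)
    "group_size": 32,   # Group Size: 32
    "desc_act": True,   # Activation Ordering: Enabled
}

# Serialize the way such a config is typically stored next to the weights.
config_json = json.dumps(quantize_config, indent=2)
print(config_json)
```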

Features

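Memory savings: 3-bit compression lets the 70B model run on a single A100/H100 GPU
Accuracy-size trade-off: some accuracy loss relative to 4-bit quantization is possible, offset by faster inference and lower cost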
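
The memory claim can be sanity-checked with rough arithmetic. This is a back-of-the-envelope sketch, not a measurement: it assumes a nominal 70e9 parameters, one FP16 scale and one 3-bit zero point per group of 32 weights, and it ignores unquantized layers (embeddings, output head), activation-order indices, the KV cache, and runtime overhead. The function name `gptq_weight_bytes` is hypothetical.

```python
def gptq_weight_bytes(n_params, bits=3, group_size=32, scale_bits=16, zero_bits=3):
    """Approximate bytes for GPTQ-packed weights, including per-group metadata."""
    # Each group of `group_size` weights shares one scale and one zero point.
    overhead_bits_per_weight = (scale_bits + zero_bits) / group_size
    return n_params * (bits + overhead_bits_per_weight) / 8


N_PARAMS = 70e9  # nominal parameter count (assumption)

int3_gb = gptq_weight_bytes(N_PARAMS) / 1e9
int4_gb = gptq_weight_bytes(N_PARAMS, bits=4, zero_bits=4) / 1e9
fp16_gb = N_PARAMS * 16 / 8 / 1e9

print(f"3-bit, g32: ~{int3_gb:.1f} GB")
print(f"4-bit, g32: ~{int4_gb:.1f} GB")
print(f"FP16:       ~{fp16_gb:.1f} GB")
```

Under these assumptions the small group size adds roughly 0.6 bits per weight of metadata, so the effective footprint is closer to 3.6 bits per weight than 3. Actual GPU memory use depends on the serving framework, context length, and batch size.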


Safetensors
Model size: 10B params
Tensor type: I32, F16