
EETQ [[eetq]]

The EETQ library supports int8 per-channel weight-only quantization for NVIDIA GPUs. Its high-performance GEMM and GEMV kernels come from FasterTransformer and TensorRT-LLM. It requires no calibration dataset, and the model does not need to be quantized in advance. Accuracy degradation is also negligible thanks to per-channel quantization.
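To make "per-channel weight-only quantization" concrete, here is a minimal pure-Python sketch of symmetric int8 quantization with one scale per output channel (row). It illustrates the general idea only and is not EETQ's actual kernel code; the function names `quantize_per_channel` and `dequantize` are hypothetical.

```python
def quantize_per_channel(weight):
    """Quantize each row to int8 values using that row's own scale."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(x) for x in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover approximate float weights from int8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Two channels with very different value ranges.
weight = [
    [0.02, -0.01, 0.03, 0.005],
    [5.0, -12.7, 3.3, 8.1],
]
q, scales = quantize_per_channel(weight)
restored = dequantize(q, scales)
```

Because each row has its own scale, the small-valued first channel keeps its resolution even though the second channel has a much larger range. A single scale shared by the whole tensor would round most of the first row to zero.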

릴리슀 νŽ˜μ΄μ§€μ—μ„œ eetqλ₯Ό μ„€μΉ˜ν–ˆλŠ”μ§€ ν™•μΈν•˜μ„Έμš”.

pip install --no-cache-dir https://github.com/NetEase-FuXi/EETQ/releases/download/v1.0.0/EETQ-1.0.0+cu121+torch2.1.2-cp310-cp310-linux_x86_64.whl

Alternatively, you can install it from the source code at https://github.com/NetEase-FuXi/EETQ. EETQ requires a CUDA compute capability of at least 7.0 and at most 8.9.

git clone https://github.com/NetEase-FuXi/EETQ.git
cd EETQ/
git submodule update --init --recursive
pip install .
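Before installing, it can help to confirm that the GPU falls within the compute capability range stated above. The following is a minimal sketch, assuming PyTorch is installed; the helper `is_supported_capability` is a hypothetical name introduced for illustration, not part of EETQ or Transformers.

```python
def is_supported_capability(major, minor):
    """Return True if (major, minor) lies within the 7.0 to 8.9 range."""
    return (7, 0) <= (major, minor) <= (8, 9)


try:
    import torch
except ImportError:
    torch = None  # PyTorch is not installed; the check below is skipped.

if torch is not None and torch.cuda.is_available():
    # get_device_capability() returns a (major, minor) tuple for the current device.
    major, minor = torch.cuda.get_device_capability()
    status = "supported" if is_supported_capability(major, minor) else "outside the supported range"
    print(f"Compute capability {major}.{minor}: {status}")
```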

An unquantized model can be quantized via "from_pretrained".

from transformers import AutoModelForCausalLM, EetqConfig

# Replace with a local directory or a Hub model ID.
path = "/path/to/model"
# Request int8 weight-only quantization at load time.
quantization_config = EetqConfig("int8")
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", quantization_config=quantization_config)

A quantized model can be saved via "save_pretrained" and loaded again via "from_pretrained".

quant_path = "/path/to/save/quantized/model"
model.save_pretrained(quant_path)
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")