	# EETQ [[eetq]]

	The [EETQ](https://github.com/NetEase-FuXi/EETQ) library supports int8 per-channel weight-only quantization for NVIDIA GPUs. Its high-performance GEMM and GEMV kernels come from FasterTransformer and TensorRT-LLM. It requires no calibration dataset, and the model does not need to be quantized ahead of time. Because quantization is applied per channel, the accuracy degradation is negligible.
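
To make "per-channel weight-only quantization" concrete, here is a minimal pure-Python sketch of a generic symmetric int8 scheme with one scale per output channel. It is an illustration only, not EETQ's kernel code: the function names, the choice of rows as output channels, and the absmax scaling rule are all assumptions made for this example.

```python
from typing import List, Tuple


def quantize_per_channel(weight: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row (assumed here to be one output channel) to int8.

    Every row gets its own scale, so a row with small values is not
    crushed by a row with large values.
    """
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_per_channel(q_rows: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover approximate float weights from int8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Two channels whose magnitudes differ by a factor of 200.
weight = [[0.02, -0.01, 0.005], [4.0, -2.0, 1.0]]
q, scales = quantize_per_channel(weight)
restored = dequantize_per_channel(q, scales)

for row, approx, scale in zip(weight, restored, scales):
    worst = max(abs(a - b) for a, b in zip(row, approx))
    print(f"scale={scale:.6f} worst_error={worst:.6f}")
```

With a single scale for the whole matrix, the small-valued first row would round almost entirely to zero. Per-row scales keep the rounding error of each channel within half of that channel's own scale, which is the intuition behind the small accuracy loss. Only the weights are quantized in this scheme; activations stay in floating point.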

	Make sure eetq is installed. You can install a prebuilt wheel from the [release page](https://github.com/NetEase-FuXi/EETQ/releases):
	```
	pip install --no-cache-dir https://github.com/NetEase-FuXi/EETQ/releases/download/v1.0.0/EETQ-1.0.0+cu121+torch2.1.2-cp310-cp310-linux_x86_64.whl
	```
	Alternatively, install it from source at https://github.com/NetEase-FuXi/EETQ. EETQ requires a GPU with CUDA compute capability of at least 7.0 and at most 8.9.
	```
	git clone https://github.com/NetEase-FuXi/EETQ.git
	cd EETQ/
	git submodule update --init --recursive
	pip install .
	```
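
Before building from source, it can help to confirm that the local GPU falls inside the compute capability range stated above. The following is a small sketch, assuming PyTorch is installed; the helper names `is_supported_capability` and `current_gpu_capability` are hypothetical and not part of EETQ or Transformers.

```python
from typing import Optional, Tuple

# Range taken from the requirement stated above (7.0 <= capability <= 8.9).
MIN_CAPABILITY = (7, 0)
MAX_CAPABILITY = (8, 9)


def is_supported_capability(capability: Tuple[int, int]) -> bool:
    """Return True if (major, minor) lies within the documented range."""
    return MIN_CAPABILITY <= capability <= MAX_CAPABILITY


def current_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the compute capability of the default CUDA device, if any."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


capability = current_gpu_capability()
if capability is None:
    print("No CUDA GPU detected (or PyTorch is not installed).")
else:
    status = "supported" if is_supported_capability(capability) else "outside the documented range"
    print(f"Compute capability {capability[0]}.{capability[1]}: {status}")
```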

	An unquantized model can be quantized through `from_pretrained`.
	```py
	from transformers import AutoModelForCausalLM, EetqConfig

	path = "/path/to/model"  # path to the unquantized model
	quantization_config = EetqConfig("int8")  # int8 weight-only quantization
	model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", quantization_config=quantization_config)
	```

	A quantized model can be saved with `save_pretrained` and loaded again with `from_pretrained`.

	```py
	quant_path = "/path/to/save/quantized/model"
	model.save_pretrained(quant_path)
	model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
	```
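
The reload call above passes no `quantization_config`. A plausible reading is that `save_pretrained` records the quantization settings in the saved `config.json`, so `from_pretrained` can pick them up. The sketch below inspects that file with the standard library only. The helper name `saved_quant_method` is hypothetical, and the expectation that the value is `"eetq"` is an assumption to verify against your installed Transformers version.

```python
import json
from pathlib import Path
from typing import Optional


def saved_quant_method(model_dir: str) -> Optional[str]:
    """Return the quant_method stored in a saved model's config.json, if any."""
    config_file = Path(model_dir) / "config.json"
    if not config_file.is_file():
        return None
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Checkpoints saved without quantization have no "quantization_config" entry.
    return (config.get("quantization_config") or {}).get("quant_method")


# Same placeholder path as in the snippet above.
print(saved_quant_method("/path/to/save/quantized/model"))
```

If this prints `None` for a directory you expected to be quantized, the checkpoint was probably saved from an unquantized model.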
