์–‘์žํ™”[[quantization]]

์–‘์žํ™” ๊ธฐ๋ฒ•์€ ๊ฐ€์ค‘์น˜์™€ ํ™œ์„ฑํ™”๋ฅผ 8๋น„ํŠธ ์ •์ˆ˜(int8)์™€ ๊ฐ™์€ ๋” ๋‚ฎ์€ ์ •๋ฐ€๋„์˜ ๋ฐ์ดํ„ฐ ํƒ€์ž…์œผ๋กœ ํ‘œํ˜„ํ•จ์œผ๋กœ์จ ๋ฉ”๋ชจ๋ฆฌ์™€ ๊ณ„์‚ฐ ๋น„์šฉ์„ ์ค„์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ผ๋ฐ˜์ ์œผ๋กœ๋Š” ๋ฉ”๋ชจ๋ฆฌ์— ์˜ฌ๋ฆด ์ˆ˜ ์—†๋Š” ๋” ํฐ ๋ชจ๋ธ์„ ๋กœ๋“œํ•  ์ˆ˜ ์žˆ๊ณ , ์ถ”๋ก  ์†๋„๋ฅผ ๋†’์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Transformers๋Š” AWQ์™€ GPTQ ์–‘์žํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ง€์›ํ•˜๋ฉฐ, bitsandbytes๋ฅผ ํ†ตํ•ด 8๋น„ํŠธ์™€ 4๋น„ํŠธ ์–‘์žํ™”๋ฅผ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. Transformers์—์„œ ์ง€์›๋˜์ง€ ์•Š๋Š” ์–‘์žํ™” ๊ธฐ๋ฒ•๋“ค์€ [HfQuantizer] ํด๋ž˜์Šค๋ฅผ ํ†ตํ•ด ์ถ”๊ฐ€๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ์„ ์–‘์žํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ด ์–‘์žํ™” ๊ฐ€์ด๋“œ๋ฅผ ํ†ตํ•ด ๋ฐฐ์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

QuantoConfig[[transformers.QuantoConfig]]

[[autodoc]] QuantoConfig

AqlmConfig[[transformers.AqlmConfig]]

[[autodoc]] AqlmConfig

VptqConfig[[transformers.VptqConfig]]

[[autodoc]] VptqConfig

AwqConfig[[transformers.AwqConfig]]

[[autodoc]] AwqConfig

EetqConfig[[transformers.EetqConfig]]

[[autodoc]] EetqConfig

GPTQConfig[[transformers.GPTQConfig]]

[[autodoc]] GPTQConfig

BitsAndBytesConfig[[transformers.BitsAndBytesConfig]]

[[autodoc]] BitsAndBytesConfig

HfQuantizer[[transformers.quantizers.HfQuantizer]]

[[autodoc]] quantizers.base.HfQuantizer

HqqConfig[[transformers.HqqConfig]]

[[autodoc]] HqqConfig

FbgemmFp8Config[[transformers.FbgemmFp8Config]]

[[autodoc]] FbgemmFp8Config

CompressedTensorsConfig[[transformers.CompressedTensorsConfig]]

[[autodoc]] CompressedTensorsConfig

TorchAoConfig[[transformers.TorchAoConfig]]

[[autodoc]] TorchAoConfig