
Efficient Inference on CPU [[efficient-inference-on-cpu]]

This guide focuses on running inference with large models efficiently on CPU.

IPEX Graph Optimization with JIT Mode [[ipex-graph-optimization-with-jitmode]]

Intel® Extension for PyTorch (IPEX) provides additional optimizations in jit mode for Transformers-family models, and using IPEX together with jit mode is strongly recommended. Several operator patterns that appear frequently in Transformers models are already supported in IPEX as jit-mode operator fusions. Fusion patterns such as Multi-head-attention, Concat Linear, Linear+Add, Linear+Gelu, and Add+LayerNorm are enabled and perform well. The fusion is applied transparently, so users get its benefit without changing their models. According to the analysis, about 70% of the most popular NLP tasks, such as question answering, text classification, and token classification, can gain performance from these fusion patterns in both Float32 precision and BFloat16 mixed precision.

For more details, see the IPEX Graph Optimization documentation.
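
The jit-mode flow described above can be sketched roughly as follows. This is a minimal illustration and not the method used by Transformers itself: `TinyBlock` and `build_traced` are hypothetical names invented for this example, the layer sizes are arbitrary, and IPEX is treated as optional so the script still runs with plain PyTorch. Whether a given pattern is actually fused depends on the installed IPEX version.

```python
import torch
import torch.nn as nn

try:
    # Optional dependency; the sketch falls back to plain TorchScript without it.
    import intel_extension_for_pytorch as ipex
except ImportError:
    ipex = None


class TinyBlock(nn.Module):
    """Hypothetical toy block containing Linear+Gelu and Add+LayerNorm patterns."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.linear = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x + nn.functional.gelu(self.linear(x)))


def build_traced(model: nn.Module, example: torch.Tensor):
    """Apply IPEX (if installed), then trace and freeze the model for inference."""
    model.eval()
    if ipex is not None:
        # Float32 path; returns an optimized copy of the model.
        model = ipex.optimize(model)
    with torch.no_grad():
        traced = torch.jit.trace(model, example, check_trace=False, strict=False)
        traced = torch.jit.freeze(traced)
    return traced


example = torch.randn(2, 8, 32)  # (batch, sequence, hidden), arbitrary sizes
model = TinyBlock()
traced = build_traced(model, example)

with torch.no_grad():
    out = traced(example)
print(out.shape)
```

For BFloat16 mixed precision, one would typically pass `dtype=torch.bfloat16` to `ipex.optimize` and run tracing and inference under `torch.autocast("cpu", dtype=torch.bfloat16)`; that variant is left out here to keep the sketch short.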

IPEX installation: [[ipex-installation]]

IPEX releases follow the PyTorch release schedule. For details, see the IPEX installation instructions.