<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# GGUF and interaction with Transformers [[gguf-and-interaction-with-transformers]]
The GGUF file format is used to store models for inference with [GGML](https://github.com/ggerganov/ggml) and other libraries that depend on it, such as the very popular [llama.cpp](https://github.com/ggerganov/llama.cpp) or [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
This file format is supported on the [Hugging Face Hub](https://huggingface.co/docs/hub/en/gguf), which provides features for quickly inspecting the tensors and metadata within a file.
It is designed as a "single-file format", where a single file contains the configuration attributes, the tokenizer vocabulary, and other attributes, as well as all the tensors to be loaded into the model. These files come in different formats according to the quantization type of the file. A brief description of the different quantization types can be found [here](https://huggingface.co/docs/hub/en/gguf#quantization-types).
## Support within Transformers [[support-within-transformers]]
We have added the ability to load `gguf` files within `transformers`, in order to offer further training and fine-tuning capabilities for GGUF models, before converting them back to `gguf` files for use within the `ggml` ecosystem. When loading a model, we first dequantize it to FP32, and then load the weights for use in PyTorch.
> [!NOTE]
> The support is still in its early stages, and we welcome contributions to solidify it across quantization types and model architectures.
For now, here are the supported model architectures and quantization types:
### μ§€μ›λ˜λŠ” μ–‘μžν™” μœ ν˜• [[supported-quantization-types]]
μ΄ˆκΈ°μ— μ§€μ›λ˜λŠ” μ–‘μžν™” μœ ν˜•μ€ Hubμ—μ„œ 곡유된 인기 μžˆλŠ” μ–‘μžν™” νŒŒμΌμ— 따라 κ²°μ •λ˜μ—ˆμŠ΅λ‹ˆλ‹€.
- F32
- F16
- BF16
- Q4_0
- Q4_1
- Q5_0
- Q5_1
- Q8_0
- Q2_K
- Q3_K
- Q4_K
- Q5_K
- Q6_K
- IQ1_S
- IQ1_M
- IQ2_XXS
- IQ2_XS
- IQ2_S
- IQ3_XXS
- IQ3_S
- IQ4_XS
- IQ4_NL
> [!NOTE]
> To support GGUF dequantization, the `gguf>=0.10.0` package must be installed.
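If it is missing, it can be installed from PyPI, for example in a notebook:

```python
!pip install "gguf>=0.10.0"
```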
### μ§€μ›λ˜λŠ” λͺ¨λΈ μ•„ν‚€ν…μ²˜ [[supported-model-architectures]]
ν˜„μž¬ μ§€μ›λ˜λŠ” λͺ¨λΈ μ•„ν‚€ν…μ²˜λŠ” Hubμ—μ„œ 맀우 인기가 λ§Žμ€ μ•„ν‚€ν…μ²˜λ“€λ‘œ μ œν•œλ˜μ–΄ μžˆμŠ΅λ‹ˆλ‹€:
- LLaMa
- Mistral
- Qwen2
- Qwen2Moe
- Phi3
- Bloom
## Example usage [[example-usage]]
To load a `gguf` file in `transformers`, you should specify the `gguf_file` argument in the `from_pretrained` method. Here is how to load a tokenizer and a model from the same file:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```
You now have access to the full, unquantized version of the model in the PyTorch ecosystem, where you can combine it with a plethora of other tools.
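For instance, a short generation sketch to check that the dequantized weights behave as expected (the prompt is arbitrary):

```python
# Quick sanity check: generate a few tokens with the dequantized model
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```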
To convert back to a `gguf` file, we recommend using llama.cpp's [`convert-hf-to-gguf.py`](https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py).
Here is how you would complete the script above to save the model and export it back to `gguf`:
```python
tokenizer.save_pretrained('directory')
model.save_pretrained('directory')
!python ${path_to_llama_cpp}/convert-hf-to-gguf.py ${directory}
```
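At the time of writing, the conversion script also accepts options such as `--outfile` and `--outtype` to control the name and the quantization of the exported file; these flag names are an assumption based on current llama.cpp and may change, so check the script's `--help` output. A hypothetical invocation:

```python
# Flag names assumed from current llama.cpp; verify with --help before relying on them
!python ${path_to_llama_cpp}/convert-hf-to-gguf.py ${directory} --outfile model.gguf --outtype q8_0
```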