<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# GGUF and interaction with Transformers [[gguf-and-interaction-with-transformers]]
The GGUF file format is used to store models for inference with [GGML](https://github.com/ggerganov/ggml) and other libraries that depend on it, such as the very popular [llama.cpp](https://github.com/ggerganov/llama.cpp) or [whisper.cpp](https://github.com/ggerganov/whisper.cpp).

This file format is [supported by the Hugging Face Hub](https://huggingface.co/docs/hub/en/gguf), with features for quickly inspecting the tensors and metadata within a file.

It is designed as a "single-file-format", where a single file contains the configuration attributes, the tokenizer vocabulary, and other attributes, as well as all of the tensors to be loaded into the model. These files come in different formats according to the quantization type of the file. A brief overview of the different quantization types can be found [here](https://huggingface.co/docs/hub/en/gguf#quantization-types).
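To check locally which metadata a given file carries and how each tensor is quantized, the `gguf` Python package ships a reader. The sketch below assumes the `GGUFReader` API from `gguf-py` and uses a placeholder file path:

```python
from gguf import GGUFReader

# Placeholder path to a locally downloaded GGUF file.
reader = GGUFReader("tinyllama-1.1b-chat-v1.0.Q6_K.gguf")

# Key/value metadata from the header (architecture, tokenizer vocabulary, ...).
for field_name in reader.fields:
    print(field_name)

# Every tensor records its own quantization type.
for tensor in reader.tensors[:5]:
    print(tensor.name, tensor.tensor_type.name, tensor.shape)
```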
## Support within Transformers [[support-within-transformers]]
We have added the ability to load `gguf` files within `transformers`, in order to offer further training and fine-tuning capabilities for GGUF models before converting them back to `gguf` for use within the `ggml` ecosystem. When loading a model, we first dequantize it to FP32 and then load the weights for use in PyTorch.
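Since the weights are dequantized to full precision at load time, the model takes the memory footprint of an FP32 checkpoint. As a minimal sketch (assuming a `transformers` version where `from_pretrained` accepts both the `gguf_file` and `torch_dtype` arguments), you can cast the dequantized weights down again while loading:

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

# The GGUF tensors are dequantized to FP32 first; torch_dtype then casts
# the resulting weights to half precision to reduce memory usage.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    gguf_file=filename,
    torch_dtype=torch.float16,
)
print(next(model.parameters()).dtype)  # torch.float16
```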
> [!NOTE]
> The support is still at an early stage, and we welcome contributions to solidify it across quantization types and model architectures.
For now, here are the supported model architectures and quantization types:
### Supported quantization types [[supported-quantization-types]]
The initially supported quantization types were decided according to the popular quantized files that have been shared on the Hub.
- F32
- F16
- BF16
- Q4_0
- Q4_1
- Q5_0
- Q5_1
- Q8_0
- Q2_K
- Q3_K
- Q4_K
- Q5_K
- Q6_K
- IQ1_S
- IQ1_M
- IQ2_XXS
- IQ2_XS
- IQ2_S
- IQ3_XXS
- IQ3_S
- IQ4_XS
- IQ4_NL
> [!NOTE]
> To support GGUF dequantization, `gguf>=0.10.0` is required.
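Repos on the Hub typically ship one `.gguf` file per quantization type, so it is worth listing what is available before picking one. A short sketch using `huggingface_hub`'s standard `list_repo_files` helper (the repo id is just an example):

```python
from huggingface_hub import list_repo_files

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"

# Each .gguf file in the repo corresponds to one quantization type.
gguf_files = [f for f in list_repo_files(repo_id) if f.endswith(".gguf")]
for filename in gguf_files:
    print(filename)
```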
### Supported model architectures [[supported-model-architectures]]
For now, the supported model architectures are limited to some of the most popular architectures on the Hub:
- LLaMa
- Mistral
- Qwen2
- Qwen2Moe
- Phi3
- Bloom
## Example usage [[example-usage]]
In order to load `gguf` files in `transformers`, you should specify the `gguf_file` argument in the `from_pretrained` method. Here is how you would load a tokenizer and a model from the exact same file:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```
Now you have access to the full, unquantized version of the model in the PyTorch ecosystem, where you can combine it with a plethora of other tools.
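For example, the loaded pair plugs straight into the usual generation API. A minimal sketch using standard `transformers` generation calls, continuing from the snippet above:

```python
inputs = tokenizer("Hello, my name is", return_tensors="pt")

# Standard PyTorch-side generation on the dequantized model.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```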
In order to convert back to a `gguf` file, we recommend using the [`convert-hf-to-gguf.py` script](https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py) from llama.cpp.

Here's how you would complete the script above to save the model and export it back to `gguf`:
```python
# Export the fine-tuned tokenizer and model as a standard
# Hugging Face checkpoint on disk.
tokenizer.save_pretrained('directory')
model.save_pretrained('directory')

# Run llama.cpp's conversion script on the exported checkpoint
# (notebook syntax; ${path_to_llama_cpp} points at your llama.cpp checkout).
!python ${path_to_llama_cpp}/convert-hf-to-gguf.py ${directory}
```
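By default the conversion script writes the resulting `.gguf` file into the checkpoint directory; recent llama.cpp versions also expose an `--outfile` flag to choose the output path (check the script's `--help` for the exact options of your checkout). The converted file can then be used again with llama.cpp and the rest of the `ggml` ecosystem.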