# Guide: Using a Custom Fine-Tuned Model with bitnet.cpp

This document outlines the process of downloading a custom fine-tuned model, converting it to the GGUF format, compiling the necessary C++ code, and running inference.
## Prerequisites

Before you begin, ensure you have the following prerequisites installed and configured:

- Python 3.9 or later
- CMake 3.22 or later
- A C++ compiler (e.g., clang, g++)
- The Hugging Face Hub CLI (`huggingface-cli`)
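These checks can be scripted. A minimal sketch using only the Python standard library (the tool names probed here are common defaults and may differ on your system):

```python
import shutil
import sys

def check_prerequisites():
    """Report which of the guide's prerequisites are visible on this machine."""
    return {
        # Python 3.9+ is required by the conversion scripts.
        "python>=3.9": sys.version_info >= (3, 9),
        # CMake and a C++ compiler are needed to build bitnet.cpp.
        "cmake": shutil.which("cmake") is not None,
        "c++ compiler": any(shutil.which(c) for c in ("clang++", "g++")) ,
        # The Hugging Face CLI is used to download the model.
        "huggingface-cli": shutil.which("huggingface-cli") is not None,
    }

for name, ok in check_prerequisites().items():
    print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

Note that this only checks for the tools' presence on `PATH`, not their versions.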
## Step 1: Download the Custom Model

In this guide, we will use the `tuandunghcmut/BitNET-Summarization` model as an example. This model was fine-tuned by `tuandunghcmut` for summarization tasks. We will download it into a directory that the `setup_env.py` script can recognize.

```bash
huggingface-cli download tuandunghcmut/BitNET-Summarization --local-dir models/BitNet-b1.58-2B-4T
```

This command downloads the model into the `models/BitNet-b1.58-2B-4T` directory. Reusing this directory name is a workaround so that the existing scripts recognize the custom model.
## Step 2: Convert the Model to GGUF Format

The downloaded model is in the `.safetensors` format and must be converted to GGUF before it can be used with `bitnet.cpp`. We will use the `utils/convert-helper-bitnet.py` script for this; however, the script needs a few modifications to work with this custom model.
### Modifications to the Conversion Scripts

1. **`utils/convert-helper-bitnet.py`**: Add the `--skip-unknown` flag to the `cmd_convert` list to ignore unknown tensor names.

   ```python
   cmd_convert = [
       sys.executable,
       str(convert_script),
       str(model_dir),
       "--vocab-type", "bpe",
       "--outtype", "f32",
       "--concurrency", "1",
       "--outfile", str(gguf_f32_output),
       "--skip-unknown",
   ]
   ```
2. **`utils/convert-hf-to-gguf-bitnet.py`**:
   - Add the `BitNetForCausalLM` architecture to the `@Model.register` decorator for the `BitnetModel` class.
   - Change the `set_vocab` method in the `BitnetModel` class to use `_set_vocab_gpt2()`.

   ```python
   @Model.register("BitNetForCausalLM", "BitnetForCausalLM")
   class BitnetModel(Model):
       model_arch = gguf.MODEL_ARCH.BITNET

       def set_vocab(self):
           self._set_vocab_gpt2()
   ```
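To see why `--skip-unknown` is needed: the converter maps each checkpoint tensor name to a GGUF tensor name and, by default, raises an error on names it cannot map. The flag turns that error into a skip. A rough illustration of the behavior (the name table below is invented for this example, not the converter's real mapping):

```python
# Illustrative only: the real mapping lives in convert-hf-to-gguf-bitnet.py.
KNOWN_TENSORS = {
    "model.embed_tokens.weight": "token_embd.weight",
    "lm_head.weight": "output.weight",
}

def map_tensor_name(name, skip_unknown=False):
    """Return the GGUF name for a checkpoint tensor, or None if skipped."""
    if name in KNOWN_TENSORS:
        return KNOWN_TENSORS[name]
    if skip_unknown:
        # With --skip-unknown, unmapped tensors are dropped instead of fatal.
        print(f"skipping unmapped tensor: {name}")
        return None
    raise ValueError(f"cannot map tensor: {name}")
```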
### Running the Conversion

After making these changes, run the conversion script:

```bash
python utils/convert-helper-bitnet.py models/BitNet-b1.58-2B-4T
```

This creates the `ggml-model-i2s-bitnet.gguf` file in the model directory.
## Step 3: Compile bitnet.cpp

Next, compile the C++ code using the `setup_env.py` script with the `i2_s` quantization type:

```bash
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```

This command compiles the C++ code and creates the necessary binaries.
## Step 4: Run Inference

Finally, run inference with the converted model:

```bash
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello"
```

This loads the model and generates a response to the prompt "Hello".
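If you prefer to drive inference from Python instead of the shell, the same invocation can be assembled programmatically. A minimal sketch (the flags mirror the command above; `build_inference_cmd` is a helper name invented here):

```python
import subprocess
import sys

def build_inference_cmd(model_path, prompt):
    """Assemble the run_inference.py invocation used in this guide."""
    return [sys.executable, "run_inference.py", "-m", model_path, "-p", prompt]

# Uncomment to actually run inference from Python:
# subprocess.run(build_inference_cmd(
#     "models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf", "Hello"), check=True)
```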
## Build Environment

This project was built and compiled on a CPU-only machine with the following specifications:

- **CPU:** AMD EPYC 9754 128-Core Processor
- **Memory:** 251 GiB
## Fine-Tuning

The `tuandunghcmut/BitNET-Summarization` model was fine-tuned with Quantization-Aware Training (QAT), building on the BitNet layer support in the Hugging Face library.
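For context, the core of BitNet-style QAT is constraining weights to the ternary set {-1, 0, +1} during training using absmean scaling, as described in the BitNet b1.58 work. A pure-Python sketch of the forward-pass quantizer (the real implementation operates on whole tensors and uses a straight-through estimator for gradients):

```python
def absmean_quantize(weights, eps=1e-5):
    """Quantize a flat list of weights to scale * {-1, 0, +1} (BitNet b1.58 style)."""
    # The scale is the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round each scaled weight and clip it to the ternary set.
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return scale, ternary

scale, q = absmean_quantize([0.9, -0.1, 0.4, -1.2])
print(q)  # → [1, 0, 1, -1]
```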