| --- |
| license: gpl-3.0 |
| language: |
| - code |
| datasets: |
| - KAKA22/CodeRM-UnitTest |
| base_model: |
| - Qwen/Qwen2.5-Coder-7B-Instruct |
| tags: |
| - code |
| - unit-testing |
| - qwen |
| --- |
| |
| # Qwen 2.5 Coder Instruct - Python Unit Test Fine-tune |
|
|
| This model is a fine-tuned version of **Qwen 2.5 Coder Instruct**, specifically trained to automate the generation of Python unit tests. |
|
|
| > **Note:** If your specific version of Qwen 2.5 Coder Instruct is a different parameter size (e.g., 1.5B or 32B), make sure to update `Qwen/Qwen2.5-Coder-7B-Instruct` in the YAML header above with the exact Hugging Face path of the base model you used. |
|
|
| ## Model Details |
|
|
| | Property | Value | |
| |------------|-----------------------------------------------------------------------| |
| | Base Model | Qwen 2.5 Coder Instruct | |
| | Dataset | [KAKA22/CodeRM-UnitTest](https://huggingface.co/datasets/KAKA22/CodeRM-UnitTest) | |
| | Language | Python | |
| | Format | 16-bit (Safetensors/PyTorch) | |
|
|
| --- |
|
|
| ## Running Locally as a 4-bit Quantized GGUF |
|
|
| Since the default weights are in 16-bit format, you can significantly reduce the memory footprint by converting and quantizing the model to a **4-bit GGUF** format using `llama.cpp`. This makes it much easier to run locally on consumer hardware. |
|
|
| ### 1. Clone and Compile llama.cpp |
|
|
| Clone the repository and build the tools. You will need a C++ compiler and `make` installed on your system. |
|
|
| ```bash |
| git clone https://github.com/ggerganov/llama.cpp |
| cd llama.cpp |
| make |
| ``` |
|
|
| > **Note:** If you are using a GPU, you may want to compile with specific flags (e.g., `make GGML_CUDA=1` for NVIDIA GPUs). |
| |
| Next, install the Python dependencies required for the conversion script: |
| |
| ```bash |
| pip install -r requirements.txt |
| ``` |
| |
| ### 2. Download the 16-bit Model |
| |
| Download the files from this Hugging Face repository to a local folder using `huggingface-cli`. Replace `<YOUR_USERNAME>/<YOUR_MODEL_NAME>` with your actual Hugging Face repository ID: |
| |
| ```bash |
| huggingface-cli download <YOUR_USERNAME>/<YOUR_MODEL_NAME> --local-dir ../my-16bit-model |
| ``` |
| |
| ### 3. Convert to GGUF (FP16) |
| |
| Before quantizing to 4-bit, convert the Hugging Face model format into an unquantized (FP16) GGUF format. Run this from inside the `llama.cpp` directory: |
| |
| ```bash |
| python convert_hf_to_gguf.py ../my-16bit-model --outfile ../my-16bit-model/model-fp16.gguf |
| ``` |
| |
| ### 4. Quantize to 4-bit (Q4_K_M) |
| |
| Use the compiled `llama-quantize` executable to compress the model to a 4-bit format. The `Q4_K_M` method provides a great balance between size and quality. |
| |
| ```bash |
| ./llama-quantize ../my-16bit-model/model-fp16.gguf ../my-16bit-model/model-q4_k_m.gguf Q4_K_M |
| ``` |
| |
| You can now use `model-q4_k_m.gguf` with any standard GGUF runner like **Ollama**, **LM Studio**, or the **llama.cpp server**! |