sachithra1234 commited on
Commit
846ce2b
·
verified ·
1 Parent(s): a80ffaa

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +78 -3
README.md CHANGED
@@ -1,3 +1,78 @@
1
- ---
2
- license: gpl-3.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: gpl-3.0
3
+ language:
4
+ - code
5
+ datasets:
6
+ - KAKA22/CodeRM-UnitTest
7
+ base_model:
8
+ - Qwen/Qwen2.5-Coder-7B-Instruct
9
+ tags:
10
+ - code
11
+ - unit-testing
12
+ - qwen
13
+ ---
14
+
15
+ # Qwen 2.5 Coder Instruct - Python Unit Test Fine-tune
16
+
17
+ This model is a fine-tuned version of **Qwen 2.5 Coder Instruct**, specifically trained to automate the generation of Python unit tests.
18
+
19
+ > **Note:** If your specific version of Qwen 2.5 Coder Instruct is a different parameter size (e.g., 1.5B or 32B), make sure to update `Qwen/Qwen2.5-Coder-7B-Instruct` in the YAML header above with the exact Hugging Face path of the base model you used.
20
+
21
+ ## Model Details
22
+
23
+ | Property | Value |
24
+ |------------|-----------------------------------------------------------------------|
25
+ | Base Model | Qwen 2.5 Coder Instruct |
26
+ | Dataset | [KAKA22/CodeRM-UnitTest](https://huggingface.co/datasets/KAKA22/CodeRM-UnitTest) |
27
+ | Language | Python |
28
+ | Format | 16-bit (Safetensors/PyTorch) |
29
+
30
+ ---
31
+
32
+ ## Running Locally as a 4-bit Quantized GGUF
33
+
34
+ Since the default weights are in 16-bit format, you can significantly reduce the memory footprint by converting and quantizing the model to a **4-bit GGUF** format using `llama.cpp`. This makes it much easier to run locally on consumer hardware.
35
+
36
+ ### 1. Clone and Compile llama.cpp
37
+
38
+ Clone the repository and build the tools. You will need a C++ compiler and `make` installed on your system.
39
+
40
+ ```bash
41
+ git clone https://github.com/ggerganov/llama.cpp
42
+ cd llama.cpp
43
+ make
44
+ ```
45
+
46
+ > **Note:** If you are using a GPU, you may want to compile with specific flags (e.g., `make GGML_CUDA=1` for NVIDIA GPUs).
47
+
48
+ Next, install the Python dependencies required for the conversion script:
49
+
50
+ ```bash
51
+ pip install -r requirements.txt
52
+ ```
53
+
54
+ ### 2. Download the 16-bit Model
55
+
56
+ Download the files from this Hugging Face repository to a local folder using `huggingface-cli`. Replace `<YOUR_USERNAME>/<YOUR_MODEL_NAME>` with your actual Hugging Face repository ID:
57
+
58
+ ```bash
59
+ huggingface-cli download <YOUR_USERNAME>/<YOUR_MODEL_NAME> --local-dir ../my-16bit-model
60
+ ```
61
+
62
+ ### 3. Convert to GGUF (FP16)
63
+
64
+ Before quantizing to 4-bit, convert the Hugging Face model format into an unquantized (FP16) GGUF format. Run this from inside the `llama.cpp` directory:
65
+
66
+ ```bash
67
+ python convert_hf_to_gguf.py ../my-16bit-model --outfile ../my-16bit-model/model-fp16.gguf
68
+ ```
69
+
70
+ ### 4. Quantize to 4-bit (Q4_K_M)
71
+
72
+ Use the compiled `llama-quantize` executable to compress the model to a 4-bit format. The `Q4_K_M` method provides a great balance between size and quality.
73
+
74
+ ```bash
75
+ ./llama-quantize ../my-16bit-model/model-fp16.gguf ../my-16bit-model/model-q4_k_m.gguf Q4_K_M
76
+ ```
77
+
78
+ You can now use `model-q4_k_m.gguf` with any standard GGUF runner like **Ollama**, **LM Studio**, or the **llama.cpp server**!