sachithra1234/UNIT-LLM


sachithra1234 committed on Apr 3

Commit 846ce2b (verified) · 1 Parent(s): a80ffaa

Update README.md

Files changed (1)

README.md +78 -3


@@ -1,3 +1,78 @@
----
-license: gpl-3.0
----

+---
+license: gpl-3.0
+language:
+  - code
+datasets:
+  - KAKA22/CodeRM-UnitTest
+base_model:
+  - Qwen/Qwen2.5-Coder-7B-Instruct
+tags:
+  - code
+  - unit-testing
+  - qwen
+---
+# Qwen 2.5 Coder Instruct - Python Unit Test Fine-tune
+This model is a fine-tune of **Qwen 2.5 Coder Instruct**, trained on the `KAKA22/CodeRM-UnitTest` dataset to generate Python unit tests.
+> **Note:** The YAML header above assumes the 7B base model. If this fine-tune was built from a different parameter size (e.g., 1.5B or 32B), update `Qwen/Qwen2.5-Coder-7B-Instruct` in the header to the exact Hugging Face path of the base model that was used.
+## Model Details
+| Property   | Value                                                                 |
+|------------|-----------------------------------------------------------------------|
+| Base Model | Qwen 2.5 Coder Instruct                                               |
+| Dataset    | [KAKA22/CodeRM-UnitTest](https://huggingface.co/datasets/KAKA22/CodeRM-UnitTest) |
+| Language   | Python                                                                |
+| Format     | 16-bit (Safetensors/PyTorch)                                          |
+---
+## Running Locally as a 4-bit Quantized GGUF
+The published weights are in 16-bit format. Converting them to GGUF and quantizing to **4-bit** with `llama.cpp` shrinks the weight file to roughly a third of its 16-bit size, which makes the model much easier to run locally on consumer hardware.
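As a rough sanity check on that saving, the weight file size can be estimated as parameter count times bits per weight, divided by eight. The sketch below assumes about 7.6 billion parameters for the 7B base model and about 4.8 bits per weight on average for `Q4_K_M`. Both figures are approximations, not values taken from this model card, and the `estimate_gb` helper is a hypothetical name.

```shell
# Estimate weight size in GB (10^9 bytes): params (in billions) * bits per weight / 8.
# Both inputs are approximations supplied by the caller.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 7.6 16    # FP16 weights
estimate_gb 7.6 4.8   # Q4_K_M, assuming ~4.8 bits per weight on average
```

The estimate covers the weights only; the runtime also needs memory for the context (KV cache), so leave some headroom.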
+### 1. Clone and Build llama.cpp
+Clone the repository and build the tools. You will need a C++ compiler and CMake installed on your system; recent llama.cpp checkouts no longer build with plain `make`.
+```bash
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+cmake -B build
+cmake --build build --config Release
+```
+> **Note:** The compiled tools are placed in `build/bin/`. If you are using a GPU, enable the matching backend at configure time (e.g., `cmake -B build -DGGML_CUDA=ON` for NVIDIA GPUs).
+Next, install the Python dependencies required for the conversion script:
+```bash
+pip install -r requirements.txt
+```
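Before moving on, it can help to confirm that the command-line tools this guide relies on are actually on your `PATH`. The following is a minimal sketch; `check_tools` is a hypothetical helper, and the tool list is an assumption that you should extend with your compiler and build tool.

```shell
# Report any required tool that is not on PATH; return non-zero if one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tool list is an assumption for this guide; add your compiler and build tool.
check_tools git python3 pip || echo "Install the tools listed above before continuing."
```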
+### 2. Download the 16-bit Model
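+Download the files from this Hugging Face repository to a local folder using `huggingface-cli` (installed with the `huggingface_hub` Python package). Replace `<YOUR_USERNAME>/<YOUR_MODEL_NAME>` with the ID of this repository, and run the command from inside the `llama.cpp` directory so that the relative path matches the later steps: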
+```bash
+huggingface-cli download <YOUR_USERNAME>/<YOUR_MODEL_NAME> --local-dir ../my-16bit-model
+```
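A quick check that the download is complete can save a failed conversion later. This sketch assumes a typical Hugging Face checkpoint layout, that is, a `config.json` plus at least one `*.safetensors` or `*.bin` weight file; `check_model_dir` is a hypothetical helper, and it does not verify tokenizer files or checksums.

```shell
# Return 0 if the directory looks like a Hugging Face checkpoint:
# a config.json plus at least one *.safetensors or *.bin weight file.
check_model_dir() {
  dir="$1"
  [ -f "$dir/config.json" ] || { echo "no config.json in $dir" >&2; return 1; }
  for f in "$dir"/*.safetensors "$dir"/*.bin; do
    [ -f "$f" ] && return 0
  done
  echo "no weight files in $dir" >&2
  return 1
}

check_model_dir ../my-16bit-model || echo "Download looks incomplete."
```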
+### 3. Convert to GGUF (FP16)
+Before quantizing to 4-bit, convert the Hugging Face checkpoint into an unquantized FP16 GGUF file. Run this from inside the `llama.cpp` directory:
+```bash
+python convert_hf_to_gguf.py ../my-16bit-model --outtype f16 --outfile ../my-16bit-model/model-fp16.gguf
+```
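To confirm the conversion produced a GGUF file rather than an empty or truncated output, you can look at the first four bytes, which in the GGUF format are the ASCII characters `GGUF`. The `is_gguf` helper below is a hypothetical name, and the check only inspects the header, not the tensor data.

```shell
# GGUF files begin with the four ASCII bytes "GGUF".
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

if is_gguf ../my-16bit-model/model-fp16.gguf; then
  echo "model-fp16.gguf has a GGUF header"
else
  echo "model-fp16.gguf is missing or not a GGUF file"
fi
```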
+### 4. Quantize to 4-bit (Q4_K_M)
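+Use the compiled `llama-quantize` executable to compress the model to a 4-bit format. `Q4_K_M` is a commonly used trade-off between file size and output quality. With a CMake build the executable is at `build/bin/llama-quantize`; older Makefile builds place it in the repository root as `./llama-quantize`.
+```bash
+./build/bin/llama-quantize ../my-16bit-model/model-fp16.gguf ../my-16bit-model/model-q4_k_m.gguf Q4_K_M
+```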
+You can now use `model-q4_k_m.gguf` with any GGUF-compatible runner, such as **Ollama**, **LM Studio**, or the **llama.cpp server**.
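As one example of loading the quantized file, Ollama can import a local GGUF through a Modelfile whose `FROM` line points at it. This is a minimal sketch: `write_modelfile` is a hypothetical helper, the model name `unit-llm-q4` is a placeholder, and the prompt is only an illustration. Treat it as a starting point rather than a tuned configuration.

```shell
# Write a minimal Ollama Modelfile that points at the quantized file.
# Usage: write_modelfile <path-to-gguf> [output-file]
write_modelfile() {
  printf 'FROM %s\n' "$1" > "${2:-Modelfile}"
}

write_modelfile ./model-q4_k_m.gguf Modelfile

# Then, with Ollama installed, from the folder that holds the GGUF file:
#   ollama create unit-llm-q4 -f Modelfile
#   ollama run unit-llm-q4 "Write pytest tests for a function add(a, b)."
```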