sachithra1234/UNIT-LLM

	---
	license: gpl-3.0
	language:
	- code
	datasets:
	- KAKA22/CodeRM-UnitTest
	base_model:
	- Qwen/Qwen2.5-Coder-7B-Instruct
	tags:
	- code
	- unit-testing
	- qwen
	---

	# Qwen 2.5 Coder Instruct - Python Unit Test Fine-tune

	This model is a fine-tuned version of Qwen 2.5 Coder Instruct, trained to generate Python unit tests.

	> Note: If your specific version of Qwen 2.5 Coder Instruct is a different parameter size (e.g., 1.5B or 32B), make sure to update `Qwen/Qwen2.5-Coder-7B-Instruct` in the YAML header above with the exact Hugging Face path of the base model you used.

	## Model Details

	| Property   | Value |
	|------------|-------|
	| Base Model | Qwen 2.5 Coder Instruct |
	| Dataset    | [KAKA22/CodeRM-UnitTest](https://huggingface.co/datasets/KAKA22/CodeRM-UnitTest) |
	| Language   | Python |
	| Format     | 16-bit (Safetensors/PyTorch) |

	---

	## Running Locally as a 4-bit Quantized GGUF

	Since the default weights are in 16-bit format, you can cut the size of the weights to roughly a third by converting and quantizing the model to a 4-bit GGUF file with `llama.cpp`. This makes it much easier to run locally on consumer hardware.
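
To put a rough number on the saving, file size can be approximated as parameter count times bits per weight. The sketch below assumes the 7B base model listed in the YAML header (about 7.61 billion parameters, per the Qwen model card) and an average of about 4.8 bits per weight for `Q4_K_M`. Both figures are assumptions, `estimate_gb` is a hypothetical helper, and real files also carry metadata, so treat the output as a ballpark only.

```shell
# Rough size estimate: parameters (in billions) x bits per weight / 8 = GB.
# ASSUMPTIONS: 7.61B parameters and ~4.8 bits per weight for Q4_K_M;
# the real average varies with the model architecture.
estimate_gb() {
  awk -v params_b="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params_b * bits / 8 }'
}

echo "FP16:   $(estimate_gb 7.61 16) GB"
echo "Q4_K_M: $(estimate_gb 7.61 4.8) GB"
```

If your base model has a different parameter count, pass that value as the first argument instead.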

	### 1. Clone and Compile llama.cpp

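	Clone the repository and build the tools. You will need a C++ compiler and CMake installed on your system. Recent versions of llama.cpp build with CMake (the older `make` build is deprecated), and the compiled binaries are placed in `build/bin/`.

	```bash
	git clone https://github.com/ggerganov/llama.cpp
	cd llama.cpp
	# Configure, then compile in Release mode; binaries end up in build/bin/
	cmake -B build
	cmake --build build --config Release
	```

	> Note: If you are using a GPU, you may want to enable the matching backend when configuring (e.g., `cmake -B build -DGGML_CUDA=ON` for NVIDIA GPUs).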

	Next, install the Python dependencies required for the conversion script:

	```bash
	pip install -r requirements.txt
	```

	### 2. Download the 16-bit Model

	Download the files from this Hugging Face repository (`sachithra1234/UNIT-LLM`) to a local folder using `huggingface-cli`:

	```bash
	huggingface-cli download sachithra1234/UNIT-LLM --local-dir ../my-16bit-model
	```

	### 3. Convert to GGUF (FP16)

	Before quantizing to 4-bit, convert the Hugging Face checkpoint into an unquantized (FP16) GGUF file. Run this from inside the `llama.cpp` directory:

	```bash
	# --outtype f16 makes the FP16 output explicit
	python convert_hf_to_gguf.py ../my-16bit-model --outfile ../my-16bit-model/model-fp16.gguf --outtype f16
	```
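
Before moving on, it can be worth confirming that the conversion actually produced a GGUF file. GGUF files begin with the four ASCII bytes `GGUF`, so a minimal check only needs to read the start of the file. The helper name `is_gguf` is hypothetical, and the path assumes the `--outfile` used in the conversion command.

```shell
# Succeeds if the file starts with the 4-byte GGUF magic "GGUF".
# This only inspects the header; it does not validate the tensors.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Path assumes the --outfile from the conversion step.
if is_gguf ../my-16bit-model/model-fp16.gguf; then
  echo "header looks like GGUF"
else
  echo "not a GGUF file (or file missing)"
fi
```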

	### 4. Quantize to 4-bit (Q4_K_M)

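	Use the compiled `llama-quantize` executable to compress the model to a 4-bit format. `Q4_K_M` is a commonly used preset that trades a small loss in quality for a much smaller file. A CMake build places the executable in `build/bin/`; if your build put it in the repository root instead, run `./llama-quantize`.

	```bash
	./build/bin/llama-quantize ../my-16bit-model/model-fp16.gguf ../my-16bit-model/model-q4_k_m.gguf Q4_K_M
	```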
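
To see how much the quantization step saved, you can compare the two files on disk. This is a small sketch with a hypothetical helper, `size_percent`, and it assumes the file names used in the commands above.

```shell
# Print the second file's size as a whole percentage of the first file's size.
size_percent() {
  base_bytes=$(wc -c < "$1")
  other_bytes=$(wc -c < "$2")
  awk -v base="$base_bytes" -v other="$other_bytes" \
    'BEGIN { printf "%.0f%%\n", 100 * other / base }'
}

fp16=../my-16bit-model/model-fp16.gguf
q4=../my-16bit-model/model-q4_k_m.gguf

# Only report when both files exist.
if [ -f "$fp16" ] && [ -f "$q4" ]; then
  echo "Q4_K_M is $(size_percent "$fp16" "$q4") of the FP16 file"
fi
```

Once the 4-bit file works for you, the FP16 GGUF can be deleted to reclaim disk space.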

	You can now use `model-q4_k_m.gguf` with any standard GGUF runner like Ollama, LM Studio, or the llama.cpp server!
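
As one example, the file can be registered with Ollama through a Modelfile whose `FROM` line points at the GGUF. This is a minimal sketch, not a tuned setup: the helper `write_modelfile`, the local model name `unit-llm`, and the prompt are all illustrative assumptions, and a real Modelfile may also need `TEMPLATE` or `PARAMETER` lines, which are left out here.

```shell
# Write a minimal Ollama Modelfile that points at a local GGUF file.
# $1 = path to the GGUF, $2 = Modelfile path (defaults to ./Modelfile)
write_modelfile() {
  printf 'FROM %s\n' "$1" > "${2:-Modelfile}"
}

model=../my-16bit-model/model-q4_k_m.gguf
write_modelfile "$model" Modelfile

# "unit-llm" is a hypothetical name; the prompt is only an illustration.
# The commands run only if Ollama is installed and the model file exists.
if command -v ollama >/dev/null 2>&1 && [ -f "$model" ]; then
  ollama create unit-llm -f Modelfile
  ollama run unit-llm "Write Python unit tests for: def add(a, b): return a + b"
fi
```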