---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- gguf
- qwen
- qwen3.5
- code
- python
---

# Qwen3.5-4B-Python-Coder-GGUF

This repository contains GGUF format quantized weights for the [Jackrong/Qwen3.5-4B-Python-Coder](https://huggingface.co/Jackrong/Qwen3.5-4B-Python-Coder) model.

These files were generated using `llama.cpp` to make the model accessible for local CPU and GPU inference across various platforms.

## Available Quantizations

The following quantization formats are available in this repository:

* **Q3_K_M:** Smallest size, heavily quantized. Good for very low-RAM environments, but with a significant loss in coding accuracy.
* **Q4_K_M:** Recommended baseline. Excellent balance between file size, memory usage, and coding performance.
* **Q5_K_M:** Higher accuracy than Q4_K_M at a slightly larger file size.
* **Q6_K:** Very close to the original unquantized model's performance. A good choice if you have the RAM for it.
* **Q8_0:** Almost zero quality loss compared to the original 16-bit model, but the largest file size and highest memory requirement.

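As a rough rule of thumb for choosing a quant, GGUF file size scales with bits per weight. The sketch below estimates sizes for a ~4B-parameter model; the bits-per-weight figures are approximate values commonly cited for llama.cpp K-quants, not the exact sizes of the files in this repository:

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8.
# Bits-per-weight values are approximate llama.cpp figures (assumption),
# so treat the results as ballpark numbers, not exact file sizes.
PARAMS = 4e9  # ~4B parameters for Qwen3.5-4B

BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
}

def estimated_size_gb(quant: str) -> float:
    """Approximate on-disk size in GB for a given quantization."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimated_size_gb(quant):.1f} GB")
```

Plan for a bit more RAM than the file size itself, since the KV cache and compute buffers are allocated on top of the weights.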
## How to Run

You can run these models locally using [llama.cpp](https://github.com/ggerganov/llama.cpp) or compatible interfaces like LM Studio, Ollama, or text-generation-webui.

**Example using `llama.cpp` in the terminal:**

```bash
./main -m Qwen3.5-4B-Python-Coder-Q4_K_M.gguf -n 512 --color -i -cml -p "<|im_start|>user\nWrite a Python script to scrape a website.<|im_end|>\n<|im_start|>assistant\n"
```

Note: in recent llama.cpp builds the `main` binary has been renamed to `llama-cli`.
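The `-p` argument above wraps the request in the ChatML template that Qwen models expect. If you build prompts programmatically, a small helper (a sketch, using the same template tokens as the command above) keeps the formatting consistent:

```python
def chatml_prompt(user_message: str) -> str:
    """Wrap a user message in the ChatML template used by Qwen models,
    leaving the assistant turn open for the model to complete."""
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(chatml_prompt("Write a Python script to scrape a website."))
```

The trailing open `<|im_start|>assistant\n` is what cues the model to generate the assistant reply rather than continuing the user turn.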