Update README.md

README.md CHANGED
@@ -29,6 +29,20 @@ This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct using the Unsloth
 - **Base model:** Qwen/Qwen2.5-7B-Instruct
 - **Fine-tuning method:** LoRA with rank 128
 
+# GGUF Quantized Versions
+
+You can find quantized gguf versions of this model here: [theprint/Tom-Qwen-7B-Instruct/tree/main/gguf](https://huggingface.co/theprint/Tom-Qwen-7B-Instruct/tree/main/gguf)
+
+Quantized GGUF versions are in the `gguf/` directory for use with llama.cpp:
+
+- `Tom-Qwen-7B-Instruct-f16.gguf` (14531.9 MB) - 16-bit float (original precision, largest file)
+- `Tom-Qwen-7B-Instruct-q3_k_m.gguf` (3632.0 MB) - 3-bit quantization (medium quality)
+- `Tom-Qwen-7B-Instruct-q4_k_m.gguf` (4466.1 MB) - 4-bit quantization (medium, recommended for most use cases)
+- `Tom-Qwen-7B-Instruct-q5_k_m.gguf` (5192.6 MB) - 5-bit quantization (medium, good quality)
+- `Tom-Qwen-7B-Instruct-q6_k.gguf` (5964.5 MB) - 6-bit quantization (high quality)
+- `Tom-Qwen-7B-Instruct-q8_0.gguf` (7723.4 MB) - 8-bit quantization (very high quality)
+
+
 ## Intended Use
 
 Conversation, brainstorming, and general instruction following
@@ -99,16 +113,6 @@ outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=
 response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
 print(response)
 ```
-## GGUF Quantized Versions
-
-Quantized GGUF versions are available in the `gguf/` directory for use with llama.cpp:
-
-- `Tom-Qwen-7B-Instruct-f16.gguf` (14531.9 MB) - 16-bit float (original precision, largest file)
-- `Tom-Qwen-7B-Instruct-q3_k_m.gguf` (3632.0 MB) - 3-bit quantization (medium quality)
-- `Tom-Qwen-7B-Instruct-q4_k_m.gguf` (4466.1 MB) - 4-bit quantization (medium, recommended for most use cases)
-- `Tom-Qwen-7B-Instruct-q5_k_m.gguf` (5192.6 MB) - 5-bit quantization (medium, good quality)
-- `Tom-Qwen-7B-Instruct-q6_k.gguf` (5964.5 MB) - 6-bit quantization (high quality)
-- `Tom-Qwen-7B-Instruct-q8_0.gguf` (7723.4 MB) - 8-bit quantization (very high quality)
 
 ### Using with llama.cpp
 
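A note for readers skimming this diff: the context lines in the second hunk are the tail of the model card's existing `transformers` usage example. A self-contained version of that kind of snippet might look like the sketch below; the repo ID comes from the link added in this commit, while the prompt, dtype, and device settings are illustrative assumptions rather than the card's exact code:

```python
# Hypothetical, self-contained version of the card's usage snippet.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "theprint/Tom-Qwen-7B-Instruct"  # repo ID from the link in this commit

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: use any dtype your hardware supports
    device_map="auto",
)

# Assumption: an arbitrary prompt matching the card's stated intended use
messages = [{"role": "user", "content": "Help me brainstorm names for a hiking club."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# These three lines mirror the snippet visible in the diff's context
outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```

The `outputs[0][inputs.shape[-1]:]` slice keeps only the newly generated tokens, which is why the decoded `response` contains no echo of the prompt.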
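The second hunk's trailing context also points at a `### Using with llama.cpp` section that the diff does not show. For completeness, here is a minimal, hedged sketch of loading one of the quantized files listed above from Python via the `llama-cpp-python` bindings; the card itself targets the llama.cpp runtime, so the binding choice, file path, and prompt here are assumptions:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (assumption: not named in the card)

# Assumption: the q4_k_m file from the list above, downloaded into gguf/
llm = Llama(
    model_path="gguf/Tom-Qwen-7B-Instruct-q4_k_m.gguf",
    n_ctx=4096,  # context window; adjust to your memory budget
)

# Chat completion applies the chat template embedded in the GGUF metadata
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three podcast episode ideas."}],
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```

The `q4_k_m` file is the one the card recommends for most use cases; the larger quants buy quality at the cost of disk space and memory.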