Update README.md
README.md CHANGED
@@ -29,9 +29,20 @@ This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct using the
 - **Base model:** meta-llama/Llama-3.2-3B-Instruct
 - **Fine-tuning method:** LoRA with rank 128
 
+## GGUF Quantized Versions
+
+Quantized GGUF versions are available [right here](https://huggingface.co/theprint/Empathetic-Llama-3.2-3B-Instruct-GGUF):
+
+- `Empathetic-Llama-3.2-3B-Instruct-f16.gguf` (6135.6 MB) - 16-bit float (original precision, largest file)
+- `Empathetic-Llama-3.2-3B-Instruct-q3_k_m.gguf` (1609.0 MB) - 3-bit quantization (medium quality)
+- `Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf` (1925.8 MB) - 4-bit quantization (medium, recommended for most use cases)
+- `Empathetic-Llama-3.2-3B-Instruct-q5_k_m.gguf` (2214.6 MB) - 5-bit quantization (medium, good quality)
+- `Empathetic-Llama-3.2-3B-Instruct-q6_k.gguf` (2521.4 MB) - 6-bit quantization (high quality)
+- `Empathetic-Llama-3.2-3B-Instruct-q8_0.gguf` (3263.4 MB) - 8-bit quantization (very high quality)
+
 ## Intended Use
 
-
+Casual conversation.
 
 ## Training Details
 
@@ -99,16 +110,6 @@ outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=
 response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
 print(response)
 ```
-## GGUF Quantized Versions
-
-Quantized GGUF versions are available in the `gguf/` directory for use with llama.cpp:
-
-- `Empathetic-Llama-3.2-3B-Instruct-f16.gguf` (6135.6 MB) - 16-bit float (original precision, largest file)
-- `Empathetic-Llama-3.2-3B-Instruct-q3_k_m.gguf` (1609.0 MB) - 3-bit quantization (medium quality)
-- `Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf` (1925.8 MB) - 4-bit quantization (medium, recommended for most use cases)
-- `Empathetic-Llama-3.2-3B-Instruct-q5_k_m.gguf` (2214.6 MB) - 5-bit quantization (medium, good quality)
-- `Empathetic-Llama-3.2-3B-Instruct-q6_k.gguf` (2521.4 MB) - 6-bit quantization (high quality)
-- `Empathetic-Llama-3.2-3B-Instruct-q8_0.gguf` (3263.4 MB) - 8-bit quantization (very high quality)
 
 ### Using with llama.cpp
 
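For context on the `LoRA with rank 128` bullet in the first hunk: below is a minimal sketch of what a rank-128 LoRA setup looks like with the `peft` library. Only `r=128` and the base model name come from the card; the alpha, dropout, and target modules are illustrative assumptions, not values taken from this repo.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model named in the card.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# LoRA configuration; r=128 matches the card, everything else is assumed.
lora_config = LoraConfig(
    r=128,                      # rank stated in the model card
    lora_alpha=256,             # assumption: alpha = 2 * r is a common choice
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,          # assumed
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows the small trainable fraction LoRA adds
```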
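The `### Using with llama.cpp` heading at the end of the second hunk is unchanged context, so its body isn't shown in this diff. As a rough sketch, assuming the `llama-cpp-python` bindings and the q4_k_m file from the linked GGUF repo, inference could look like this (the model path and generation settings are assumptions):

```python
from llama_cpp import Llama

# Load the 4-bit GGUF; path assumed, adjust to wherever you downloaded the file.
llm = Llama(
    model_path="Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf",
    n_ctx=4096,        # assumed context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

# Chat-style generation, matching the casual-conversation intended use.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "I had a rough day at work."}],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```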