theprint committed
Commit f048ae2 · verified · 1 Parent(s): 5b219da

Update README.md

Files changed (1):
  1. README.md +12 -11
README.md CHANGED
@@ -29,9 +29,20 @@ This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct using the
  - **Base model:** meta-llama/Llama-3.2-3B-Instruct
  - **Fine-tuning method:** LoRA with rank 128
 
+ ## GGUF Quantized Versions
+
+ Quantized GGUF versions are available [right here](https://huggingface.co/theprint/Empathetic-Llama-3.2-3B-Instruct-GGUF):
+
+ - `Empathetic-Llama-3.2-3B-Instruct-f16.gguf` (6135.6 MB) - 16-bit float (original precision, largest file)
+ - `Empathetic-Llama-3.2-3B-Instruct-q3_k_m.gguf` (1609.0 MB) - 3-bit quantization (medium quality)
+ - `Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf` (1925.8 MB) - 4-bit quantization (medium, recommended for most use cases)
+ - `Empathetic-Llama-3.2-3B-Instruct-q5_k_m.gguf` (2214.6 MB) - 5-bit quantization (medium, good quality)
+ - `Empathetic-Llama-3.2-3B-Instruct-q6_k.gguf` (2521.4 MB) - 6-bit quantization (high quality)
+ - `Empathetic-Llama-3.2-3B-Instruct-q8_0.gguf` (3263.4 MB) - 8-bit quantization (very high quality)
+
  ## Intended Use
 
- Python code assistance.
+ Casual conversation.
 
  ## Training Details
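The GGUF section added above is essentially a download menu for llama.cpp-compatible runtimes. As a minimal sketch of how one of these files could be used (assuming the `llama-cpp-python` bindings and a locally downloaded q4_k_m file, which the list flags as recommended; the README's own "Using with llama.cpp" section is not touched by this commit):

```python
# Minimal sketch: load one of the GGUF quantizations listed above with
# llama-cpp-python. The local path is an assumption (download the file from
# the linked GGUF repo first); n_ctx and the prompt are illustrative only.
from llama_cpp import Llama

llm = Llama(
    model_path="Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf",  # assumed local path
    n_ctx=2048,
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "I had a rough day and just need to vent."}]
)
print(result["choices"][0]["message"]["content"])
```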
 
@@ -99,16 +110,6 @@ outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=
  response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
  print(response)
  ```
- ## GGUF Quantized Versions
-
- Quantized GGUF versions are available in the `gguf/` directory for use with llama.cpp:
-
- - `Empathetic-Llama-3.2-3B-Instruct-f16.gguf` (6135.6 MB) - 16-bit float (original precision, largest file)
- - `Empathetic-Llama-3.2-3B-Instruct-q3_k_m.gguf` (1609.0 MB) - 3-bit quantization (medium quality)
- - `Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf` (1925.8 MB) - 4-bit quantization (medium, recommended for most use cases)
- - `Empathetic-Llama-3.2-3B-Instruct-q5_k_m.gguf` (2214.6 MB) - 5-bit quantization (medium, good quality)
- - `Empathetic-Llama-3.2-3B-Instruct-q6_k.gguf` (2521.4 MB) - 6-bit quantization (high quality)
- - `Empathetic-Llama-3.2-3B-Instruct-q8_0.gguf` (3263.4 MB) - 8-bit quantization (very high quality)
 
  ### Using with llama.cpp
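For context, the `response = tokenizer.decode(...)` lines anchoring this hunk are the tail of the README's Transformers usage snippet. A self-contained version of that flow looks roughly like the sketch below; the repo id is inferred from the GGUF link in this commit (dropping the `-GGUF` suffix), and the chat-template handling plus `do_sample=True` are assumptions rather than the README's verbatim code:

```python
# Rough, self-contained reconstruction of the usage flow whose last lines
# appear as diff context above. The repo id, dtype, and do_sample=True are
# assumptions; only the generate()/decode() arguments come from the diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "theprint/Empathetic-Llama-3.2-3B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "I had a rough day and just need to vent."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```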
 
 