Update README.md
README.md CHANGED
@@ -29,9 +29,20 @@ This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct using the
 - **Base model:** meta-llama/Llama-3.2-3B-Instruct
 - **Fine-tuning method:** LoRA with rank 128
 
+## GGUF Quantized Versions
+
+Quantized GGUF versions are available [right here](https://huggingface.co/theprint/Empathetic-Llama-3.2-3B-Instruct-GGUF):
+
+- `Empathetic-Llama-3.2-3B-Instruct-f16.gguf` (6135.6 MB) - 16-bit float (original precision, largest file)
+- `Empathetic-Llama-3.2-3B-Instruct-q3_k_m.gguf` (1609.0 MB) - 3-bit quantization (medium quality)
+- `Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf` (1925.8 MB) - 4-bit quantization (medium, recommended for most use cases)
+- `Empathetic-Llama-3.2-3B-Instruct-q5_k_m.gguf` (2214.6 MB) - 5-bit quantization (medium, good quality)
+- `Empathetic-Llama-3.2-3B-Instruct-q6_k.gguf` (2521.4 MB) - 6-bit quantization (high quality)
+- `Empathetic-Llama-3.2-3B-Instruct-q8_0.gguf` (3263.4 MB) - 8-bit quantization (very high quality)
+
 ## Intended Use
 
-
+Casual conversation.
 
 ## Training Details
 
@@ -99,16 +110,6 @@ outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=
 response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
 print(response)
 ```
-## GGUF Quantized Versions
-
-Quantized GGUF versions are available in the `gguf/` directory for use with llama.cpp:
-
-- `Empathetic-Llama-3.2-3B-Instruct-f16.gguf` (6135.6 MB) - 16-bit float (original precision, largest file)
-- `Empathetic-Llama-3.2-3B-Instruct-q3_k_m.gguf` (1609.0 MB) - 3-bit quantization (medium quality)
-- `Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf` (1925.8 MB) - 4-bit quantization (medium, recommended for most use cases)
-- `Empathetic-Llama-3.2-3B-Instruct-q5_k_m.gguf` (2214.6 MB) - 5-bit quantization (medium, good quality)
-- `Empathetic-Llama-3.2-3B-Instruct-q6_k.gguf` (2521.4 MB) - 6-bit quantization (high quality)
-- `Empathetic-Llama-3.2-3B-Instruct-q8_0.gguf` (3263.4 MB) - 8-bit quantization (very high quality)
 
 ### Using with llama.cpp
 
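For context on the `LoRA with rank 128` bullet in the first hunk: below is a minimal sketch of what a rank-128 LoRA setup looks like with the `peft` library. Only `r=128` and the base model name come from the card; the alpha, dropout, and target modules are illustrative assumptions, not values taken from this repo.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model named in the card.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# LoRA configuration; r=128 matches the card, everything else is assumed.
lora_config = LoraConfig(
    r=128,                      # rank stated in the model card
    lora_alpha=256,             # assumption: alpha = 2 * r is a common choice
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,          # assumed
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows the small trainable fraction LoRA adds
```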
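The `### Using with llama.cpp` heading at the end of the second hunk is unchanged context, so its body isn't shown in this diff. As a rough sketch, assuming the `llama-cpp-python` bindings and the q4_k_m file from the linked GGUF repo, inference could look like this (the model path and generation settings are assumptions):

```python
from llama_cpp import Llama

# Load the 4-bit GGUF; path assumed, adjust to wherever you downloaded the file.
llm = Llama(
    model_path="Empathetic-Llama-3.2-3B-Instruct-q4_k_m.gguf",
    n_ctx=4096,        # assumed context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

# Chat-style generation, matching the casual-conversation intended use.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "I had a rough day at work."}],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```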