Update README.md
README.md CHANGED

@@ -101,6 +101,7 @@ I do believe it could be much better, by doing the pruning in stages (say, 4 lay
 *Figure 2: Model size vs average benchmark performance. Llama3-5.4b-instruct may not be fully healed, but its performance scales linearly with its size.*

 ## Why 5.4B?
+
 This size should allow for:
 - bf16 inference on 24GB VRAM
 - Q8 or Q6 inference on 6GB VRAM
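The VRAM figures in the new section follow from simple weight-memory arithmetic. Below is a minimal back-of-envelope sketch (not part of the commit): the bytes-per-parameter values are assumptions, and the estimate covers weights only, ignoring KV cache, activations, and quantization-format overhead.

```python
# Rough weight-memory estimate for a 5.4B-parameter model.
# Bytes-per-parameter values are assumed; real quantized formats
# add overhead (scales, zero-points), so treat these as lower bounds.
PARAMS = 5.4e9

formats = {
    "bf16": 2.0,   # 16 bits per parameter
    "Q8":   1.0,   # ~8 bits per parameter
    "Q6":   0.75,  # ~6 bits per parameter
}

for name, bytes_per_param in formats.items():
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB of weights")

# bf16: ~10.1 GiB -> leaves headroom for KV cache on 24GB VRAM
# Q8:   ~5.0 GiB  -> tight but plausible on 6GB VRAM
# Q6:   ~3.8 GiB  -> comfortable on 6GB VRAM
```

These numbers are consistent with the README's claims: bf16 weights use about half of a 24GB card, and Q8/Q6 weights leave at least some margin on a 6GB card.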