asmo batch

#1 by Naphula (DarkArtsForge org), opened; edited Feb 2

Running both of these in parallel (the Q6_K and Q8_K_XL quants were converted manually):

Static quants (no imatrix needed):
llama-quantize input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q2_K.gguf Q2_K
llama-quantize input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q3_K_M.gguf Q3_K_M
llama-quantize input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q4_K_M.gguf Q4_K_M
llama-quantize input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q5_K_M.gguf Q5_K_M
llama-quantize input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q5_1.gguf Q5_1
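The five static-quant calls differ only in the quant type, so they can be driven from a loop. A minimal POSIX-shell sketch (paths shortened; the real batch uses the B:\ paths above) that prints each command as a dry run:

```shell
# Dry-run: print one llama-quantize command per static quant type.
# Remove the leading "echo" (and restore the full paths) to actually run them.
for q in Q2_K Q3_K_M Q4_K_M Q5_K_M Q5_1; do
  echo llama-quantize input.gguf "Asmodeus-24B-v1-${q}.gguf" "$q"
done
```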
Generate the importance matrix, then the imatrix quants:
llama-imatrix -m input.gguf -f illuminati_imatrix_v1.txt -o asmodeus-24b-v1_illuminati_imatrix_v1.dat
llama-quantize --imatrix asmodeus-24b-v1_illuminati_imatrix_v1.dat input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ4_NL.gguf IQ4_NL
llama-quantize --imatrix asmodeus-24b-v1_illuminati_imatrix_v1.dat input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ4_XS.gguf IQ4_XS
llama-quantize --imatrix asmodeus-24b-v1_illuminati_imatrix_v1.dat input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ3_M.gguf IQ3_M
llama-quantize --imatrix asmodeus-24b-v1_illuminati_imatrix_v1.dat input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ2_M.gguf IQ2_M
llama-quantize --imatrix asmodeus-24b-v1_illuminati_imatrix_v1.dat input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ1_M.gguf IQ1_M
llama-quantize --imatrix asmodeus-24b-v1_illuminati_imatrix_v1.dat input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ1_S.gguf IQ1_S
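The six imatrix-based quantize calls likewise differ only in the type name, so the same dry-run pattern works (POSIX-shell sketch, paths shortened):

```shell
# Dry-run: print one imatrix quantize command per IQ type.
# Remove the leading "echo" to actually run them.
IMX=asmodeus-24b-v1_illuminati_imatrix_v1.dat
for q in IQ4_NL IQ4_XS IQ3_M IQ2_M IQ1_M IQ1_S; do
  echo llama-quantize --imatrix "$IMX" input.gguf "Asmodeus-24B-v1-${q}.gguf" "$q"
done
```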
Uploads (GGUF quants to Naphula/Asmodeus-24B-v1-GGUF, safetensors shards to DarkArtsForge/Asmodeus-24B-v1):
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q6_K.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q8_K_XL.gguf
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00001-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00002-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00003-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00004-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00005-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00006-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00007-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00008-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00009-of-00010.safetensors
hf upload DarkArtsForge/Asmodeus-24B-v1 B:\Asmodeus-24B-v1\model-00010-of-00010.safetensors
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q2_K.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q3_K_M.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q4_K_M.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q5_K_M.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q5_1.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ4_NL.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ4_XS.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ3_M.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ2_M.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ1_M.gguf
hf upload Naphula/Asmodeus-24B-v1-GGUF B:\Asmodeus-24B-v1\Asmodeus-24B-v1-IQ1_S.gguf
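The upload commands also follow two fixed patterns (numbered shards, quant-type filenames), so they can be generated instead of typed. A POSIX-shell dry-run sketch (paths shortened; the real commands above use the B:\ paths):

```shell
# Dry-run: print the upload commands instead of running them.
# Safetensors shards go to the full-precision repo:
for i in $(seq 1 10); do
  printf 'hf upload DarkArtsForge/Asmodeus-24B-v1 model-%05d-of-00010.safetensors\n' "$i"
done
# GGUF quants go to the GGUF repo:
for q in Q2_K Q3_K_M Q4_K_M Q5_K_M Q5_1 IQ4_NL IQ4_XS IQ3_M IQ2_M IQ1_M IQ1_S; do
  echo hf upload Naphula/Asmodeus-24B-v1-GGUF "Asmodeus-24B-v1-${q}.gguf"
done
```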

This is the Q8_K_XL command (Unsloth method), which keeps selected tensors at F16 and the rest at Q8_0:
llama-quantize ^
  --tensor-type output.weight=F16 ^
  --tensor-type token_embd.weight=F16 ^
  --tensor-type "blk\.(0|1|2|38|39)\.attn_k.weight=F16" ^
  --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39)\.attn_output.weight=Q8_0" ^
  --tensor-type "blk\.(0|1|2|38|39)\.attn_q.weight=F16" ^
  --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39)\.attn_v.weight=F16" ^
  --tensor-type "blk\.(0|1|2|38|39)\.ffn_down.weight=F16" ^
  --tensor-type "blk\.(0|1|2|38|39)\.ffn_gate.weight=F16" ^
  --tensor-type "blk\.(0|1|2|38|39)\.ffn_up.weight=F16" ^
  --tensor-type "blk\.(3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37)\.attn_k.weight=Q8_0" ^
  --tensor-type "blk\.(3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37)\.attn_q.weight=Q8_0" ^
  --tensor-type "blk\.(3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37)\.ffn_down.weight=Q8_0" ^
  --tensor-type "blk\.(3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37)\.ffn_gate.weight=Q8_0" ^
  --tensor-type "blk\.(3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37)\.ffn_up.weight=Q8_0" ^
  C:\Quanter\llama.cpp\input.gguf B:\Asmodeus-24B-v1\Asmodeus-24B-v1-Q8_K_XL.gguf Q8_0
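The long (3|4|...|37) alternations in those regexes can be generated rather than typed by hand. A POSIX-shell sketch using `seq` (the tensor names `blk`, `ffn_up.weight` come from the command above; the edge-layer list is taken from it verbatim):

```shell
# Build the pipe-separated layer lists used in the Q8_K_XL overrides.
mid=$(seq -s '|' 3 37)   # middle layers 3..37 -> Q8_0
edge='0|1|2|38|39'       # first/last layers kept at F16
# printf needs "--" so the leading --tensor-type is not parsed as an option.
printf -- '--tensor-type "blk\\.(%s)\\.ffn_up.weight=Q8_0"\n' "$mid"
printf -- '--tensor-type "blk\\.(%s)\\.ffn_up.weight=F16"\n' "$edge"
```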

Naphula changed discussion status to closed
