jedisct1 committed on
Commit 1d3db99 · verified · 1 Parent(s): f3e6118

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -75,7 +75,7 @@ The first plain `Q2_K` family candidate was small enough, but it was not reliabl
 - embeddings and output tensors stay higher precision because they are important for token identity and exact syntax
 - attention tensors are protected because tool-call and code prompts are structure-heavy
 - the dense first FFN is protected because early-layer representation quality matters disproportionately after heavy quantization
-- MoE down-expert tensors use `Q3_K`, which was a better quality/memory tradeoff than pushing all expert down-projections lower
+- MoE down-expert tensors use `Q3_K`; this was kept from the known-good imatrix recipe rather than isolated as the only required choice
 
 That is why this is still a Q2-class build, but not the smallest possible Q2 build.
 
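
The protection rules listed in the changed README section can be read as a per-tensor type lookup. The following is only a rough sketch of that idea, not the recipe actually used for this build. The function name `pick_quant_type`, the rule table, and the specific "higher precision" types (`Q6_K`, `Q4_K`) are hypothetical; the README only states `Q3_K` for MoE down-expert tensors and a Q2-class default. The tensor names follow the usual GGUF naming style, and treating block 0 as the dense first FFN is an assumption.

```python
import re

# Hypothetical rule table: (tensor-name pattern, quantization type).
# Only Q3_K for MoE down-experts and the Q2-class default come from the README;
# Q6_K and Q4_K are placeholder stand-ins for "higher precision" / "protected".
RULES = [
    # Embeddings and output tensors stay higher precision (type assumed).
    (re.compile(r"^(token_embd|output)\.weight$"), "Q6_K"),
    # Attention projections are protected (type assumed).
    (re.compile(r"^blk\.\d+\.attn_(q|k|v|output)\.weight$"), "Q4_K"),
    # Dense first FFN is protected (assumes block 0 is the dense one).
    (re.compile(r"^blk\.0\.ffn_(gate|up|down)\.weight$"), "Q4_K"),
    # MoE down-expert tensors use Q3_K (stated in the README).
    (re.compile(r"^blk\.\d+\.ffn_down_exps\.weight$"), "Q3_K"),
]

DEFAULT_TYPE = "Q2_K"  # everything else falls back to the Q2-class base type


def pick_quant_type(tensor_name: str) -> str:
    """Return the type of the first matching rule, else the default."""
    for pattern, quant_type in RULES:
        if pattern.match(tensor_name):
            return quant_type
    return DEFAULT_TYPE


if __name__ == "__main__":
    for name in (
        "token_embd.weight",
        "blk.3.attn_q.weight",
        "blk.0.ffn_down.weight",
        "blk.7.ffn_down_exps.weight",
        "blk.7.ffn_up_exps.weight",
    ):
        print(f"{name:32s} -> {pick_quant_type(name)}")
```

Because rules are checked in order and the first match wins, a narrower override can be placed above a broader one without changing the fallback.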