| { | |
| "_comment": "Ideogram-4 per-layer precision map (mirrors klein-style configs/flux2-klein-*.json). Heavy block GEMMs (attention + FFN) are 4-bit (AUTO_4 -> INT4 on SM89, FP4 on SM120); the sensitive non-block PROJECTION GEMMs are 8-bit AUTO_8 ('8' -> FP8 on SM89+, INT8 on older; W8A8). MEASURED on the dual-transformer 24GB configuration to fit (cuda_overhead 399 MB) and to render a coherent prompt-matching image, with sharper detail than FP16-non-block. Both 8-bit variants verified coherent on SM89 (i8 dashboard-run + f8 CLI-run, each cuda_overhead 399 MB); AUTO_8 picks FP8 on SM89 (native FP8 tensor cores, better dynamic range than INT8 for these sensitive projections). Only the adaln_modulation GEMVs (block + final) stay FP16 (16): they are M=1 GEMVs whose per-token activation quantization is too lossy (conditioning collapse; mirrors ZImage's ZImageBlockPrecision.modulation=FP16 default), and the engine's quantizeWeights deliberately skips them. Keys match the layer-path prefixes used by Ideogram4TransformerLighting::quantizeBlockWeights + Ideogram4Block::quantizeWeights (the numeric block index is skipped by PrecisionMap::matchesPrefix, so 'layers.attention.qkv' matches layers.0..33). Per entry: (1) 'layers.attention.qkv/o': self-attention projections, large K/N, quant-robust -> 4; (2) 'layers.feed_forward.w1/w2/w3': SwiGLU MLP, largest matrices, primary memory target -> 4; (3) 'layers.adaln_modulation': adaLN modulation GEMV (M=1), NOT called by quantizeWeights -> 16; (4) 'input_proj', (5) 'llm_cond_proj', (6) 't_embedding.*', (7) 'adaln_proj', (8) 'final_layer.linear': non-block projection GEMMs -> AUTO_8 ('8': FP8 on SM89+, INT8 older); (9) 'final_layer.adaln_modulation': final conditioning GEMV -> 16. Net: 170 block GEMMs at 4-bit, 5 non-block projection GEMMs at AUTO_8 (FP8 on SM89), 2 adaln-modulation GEMVs at FP16.", | |
| "layers.attention.qkv": 4, | |
| "layers.attention.o": 4, | |
| "layers.feed_forward.w1": 4, | |
| "layers.feed_forward.w2": 4, | |
| "layers.feed_forward.w3": 4, | |
| "layers.adaln_modulation": 16, | |
| "input_proj": 8, | |
| "llm_cond_proj": 8, | |
| "t_embedding.mlp_in": 8, | |
| "t_embedding.mlp_out": 8, | |
| "adaln_proj": 8, | |
| "final_layer.linear": 8, | |
| "final_layer.adaln_modulation": 16 | |
| } | |
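
The comment above says that the numeric block index is skipped when a layer path is matched against a key, so `layers.attention.qkv` covers `layers.0` through `layers.33`. The following is a minimal Python sketch of that lookup rule, not the engine's actual `PrecisionMap::matchesPrefix`. The helper names (`strip_block_index`, `matches_prefix`, `resolve_precision`), the longest-key-wins tie-break, the component-boundary check, and the FP16 fallback for unmatched paths are all assumptions made for illustration.

```python
# Illustrative sketch only: helper names and fallback behaviour are assumptions,
# not the engine's PrecisionMap implementation.
PRECISION_MAP = {
    "_comment": "(elided)",
    "layers.attention.qkv": 4,
    "layers.attention.o": 4,
    "layers.feed_forward.w1": 4,
    "layers.feed_forward.w2": 4,
    "layers.feed_forward.w3": 4,
    "layers.adaln_modulation": 16,
    "input_proj": 8,
    "llm_cond_proj": 8,
    "t_embedding.mlp_in": 8,
    "t_embedding.mlp_out": 8,
    "adaln_proj": 8,
    "final_layer.linear": 8,
    "final_layer.adaln_modulation": 16,
}


def strip_block_index(layer_path: str) -> str:
    """Drop purely numeric path components, e.g. 'layers.7.attention.o' -> 'layers.attention.o'."""
    return ".".join(part for part in layer_path.split(".") if not part.isdigit())


def matches_prefix(layer_path: str, key: str) -> bool:
    """True if key is a prefix of the index-stripped path, ending on a component boundary."""
    path = strip_block_index(layer_path)
    return path == key or path.startswith(key + ".")


def resolve_precision(layer_path: str, precision_map: dict, default: int = 16) -> int:
    """Return the bit width of the longest matching key; fall back to `default` (assumed FP16)."""
    best_key = None
    for key in precision_map:
        if key.startswith("_"):  # skip metadata such as "_comment"
            continue
        if matches_prefix(layer_path, key) and (best_key is None or len(key) > len(best_key)):
            best_key = key
    return precision_map[best_key] if best_key is not None else default


# Blocks 0..33 with five GEMMs each (qkv, o, w1, w2, w3), as described in the comment.
BLOCK_GEMMS = ["attention.qkv", "attention.o",
               "feed_forward.w1", "feed_forward.w2", "feed_forward.w3"]
four_bit_count = sum(
    1
    for block in range(34)
    for name in BLOCK_GEMMS
    if resolve_precision(f"layers.{block}.{name}", PRECISION_MAP) == 4
)
```

Under these assumptions, `resolve_precision("layers.12.adaln_modulation", PRECISION_MAP)` resolves to 16 while `resolve_precision("layers.12.attention.qkv", PRECISION_MAP)` resolves to 4, and `four_bit_count` reproduces the 170 block GEMMs quoted in the comment (34 blocks times 5 GEMMs).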