feat: add root config.json (HF download-counter query file) + README note

	{
	"_comment": "Ideogram-4 per-layer precision map (mirrors klein-style configs/flux2-klein-.json). Heavy block GEMMs (attention + FFN) are 4-bit (AUTO_4 → INT4 on SM89, FP4 on SM120). The sensitive non-block projection GEMMs are 8-bit AUTO_8 ('8' → FP8 on SM89+, INT8 on older GPUs; W8A8). This setting was measured on the dual-transformer 24GB setup: it fits (cuda_overhead 399 MB) and renders a coherent, prompt-matching image with sharper detail than FP16 non-block projections. Both 8-bit variants were verified coherent on SM89 (i8 dashboard run + f8 CLI run, each with cuda_overhead 399 MB); AUTO_8 picks FP8 on SM89 (native FP8 tensor cores, better dynamic range than INT8 for these sensitive projections). Only the adaln_modulation GEMVs (block + final) stay FP16 (16): they are M=1 GEMVs whose per-token activation quantization is too lossy (conditioning collapse; mirrors ZImage's ZImageBlockPrecision.modulation=FP16 default), and the engine's quantizeWeights deliberately skips them. Keys match the layer-path prefixes used by Ideogram4TransformerLighting::quantizeBlockWeights + Ideogram4Block::quantizeWeights (the numeric block index is skipped by PrecisionMap::matchesPrefix, so 'layers.attention.qkv' matches layers.0..33). Per entry: (1) 'layers.attention.qkv/o' → self-attention projections, large K/N, quant-robust → 4; (2) 'layers.feed_forward.w1/w2/w3' → SwiGLU MLP, largest matrices, primary memory target → 4; (3) 'layers.adaln_modulation' → adaLN modulation GEMV (M=1), skipped by quantizeWeights → 16; (4) 'input_proj', (5) 'llm_cond_proj', (6) 't_embedding.', (7) 'adaln_proj', (8) 'final_layer.linear' → non-block projection GEMMs → AUTO_8 ('8': FP8 on SM89+, INT8 on older GPUs); (9) 'final_layer.adaln_modulation' → final conditioning GEMV → 16. Net: 170 block GEMMs at 4-bit (34 blocks × 5), non-block projection GEMMs under 5 key prefixes (6 entries) at AUTO_8 (FP8 on SM89), 2 adaln-modulation entries (block + final) at FP16.",
	"layers.attention.qkv": 4,
	"layers.attention.o": 4,
	"layers.feed_forward.w1": 4,
	"layers.feed_forward.w2": 4,
	"layers.feed_forward.w3": 4,
	"layers.adaln_modulation": 16,
	"input_proj": 8,
	"llm_cond_proj": 8,
	"t_embedding.mlp_in": 8,
	"t_embedding.mlp_out": 8,
	"adaln_proj": 8,
	"final_layer.linear": 8,
	"final_layer.adaln_modulation": 16
	}
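
The lookup rule described in `_comment` (numeric block index ignored, keys treated as layer-path prefixes, `4`/`8` resolved per GPU architecture) can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names (`strip_block_index`, `lookup_bits`, `resolve_format`, `count_block_gemms`) are hypothetical and are not the engine's `PrecisionMap` API; longest-prefix-wins, the FP16 default for unlisted layers, and the `sm >= 120` threshold for FP4 are assumptions, since the comment only names SM89 and SM120.

```python
# Illustrative sketch only: these helpers are hypothetical and do not
# reproduce the engine's PrecisionMap implementation.

# Entries copied from the config above ("_comment" omitted).
PRECISION_MAP = {
    "layers.attention.qkv": 4,
    "layers.attention.o": 4,
    "layers.feed_forward.w1": 4,
    "layers.feed_forward.w2": 4,
    "layers.feed_forward.w3": 4,
    "layers.adaln_modulation": 16,
    "input_proj": 8,
    "llm_cond_proj": 8,
    "t_embedding.mlp_in": 8,
    "t_embedding.mlp_out": 8,
    "adaln_proj": 8,
    "final_layer.linear": 8,
    "final_layer.adaln_modulation": 16,
}

NUM_BLOCKS = 34  # layers.0..33, as stated in the config comment


def strip_block_index(layer_path):
    """Drop purely numeric segments: 'layers.12.attention.o' -> 'layers.attention.o'."""
    return ".".join(part for part in layer_path.split(".") if not part.isdigit())


def lookup_bits(layer_path, precision_map=PRECISION_MAP, default=16):
    """Return the bit width of the longest key that prefixes the index-free path."""
    path = strip_block_index(layer_path)
    matches = [key for key in precision_map if path.startswith(key)]
    if not matches:
        return default  # assumption: unlisted layers stay FP16
    return precision_map[max(matches, key=len)]


def resolve_format(bits, sm):
    """Map a config value to a concrete format for compute capability `sm` (e.g. 89)."""
    if bits == 4:
        # Assumption: FP4 from SM120 upward, INT4 below; the comment names only SM89 and SM120.
        return "FP4" if sm >= 120 else "INT4"
    if bits == 8:
        return "FP8" if sm >= 89 else "INT8"
    if bits == 16:
        return "FP16"
    raise ValueError(f"unsupported precision value: {bits}")


def count_block_gemms(bits, num_blocks=NUM_BLOCKS):
    """Count per-block layers across all blocks that resolve to `bits`."""
    suffixes = [key[len("layers."):] for key in PRECISION_MAP if key.startswith("layers.")]
    return sum(
        1
        for index in range(num_blocks)
        for suffix in suffixes
        if lookup_bits(f"layers.{index}.{suffix}") == bits
    )
```

With these assumptions, `count_block_gemms(4)` gives the 170 four-bit block GEMMs quoted in the comment (34 blocks × 5 keys), and `resolve_format(lookup_bits("input_proj"), 89)` resolves to FP8.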