# Known limitations: tritllm-codec

Items previously raised in code review have been addressed in the current release. This document only lists deliberate design tradeoffs that the codec review surfaced, not bugs.

## Design tradeoffs

### Scale codebook upper bound = `max(group_abs_maxes)`

**Where:** [`quantize_model_v2.py`, `trit_quantize_scales`, `log_max = np.max(...)`](quantize_model_v2.py#L107)

The 27-entry log-spaced scale codebook spans `[log_min, log_max]`, where `log_max` is taken to be the maximum group magnitude in the matrix. This is intentional: an earlier 99.9th-percentile bound (commit prior to `0c16d24`) clipped large-scale outlier groups and lost their resolution.

The downside: a single extreme-scale outlier group can stretch the log-spaced range and reduce scale resolution for the bulk of normal-magnitude groups in the same matrix. We do not see this cause measurable quality regressions on Qwen2.5, Llama-3.1, or Mistral-7B. If you observe unexpectedly high PPL on a new model family with heavy-tailed scale distributions, this is the first place to look.

We did not change this in the current release because changing it would alter the bit-exact output of the codec and invalidate published paper numbers. A future v3 may replace `np.max` with a soft-cap (e.g. `min(max, 4 * p99)`) that is robust to single extreme outliers without giving up large-scale fidelity.

### Scale candidate set is fixed at 4 percentiles

**Where:** [`quantize_model_v2.py`, `compute_best_scale_4cand`](quantize_model_v2.py#L75)

The MSE-best scale is selected from four fixed order statistics: indices `[gs-6, gs-4, gs-2, gs-1]` of sorted `|w|`. This is a deliberate compute / quality tradeoff (≈50× speedup over an exhaustive sweep, <1% PPL gap measured on Qwen2.5-7B), not a bug. The function name and docstring now reflect this.
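
To make the four-candidate selection concrete, here is a minimal sketch of the idea. It is not the code in `compute_best_scale_4cand`. The function name `best_scale_4cand`, the `qmax` parameter, and the symmetric uniform grid are assumptions made for illustration. Only the candidate indices `[gs-6, gs-4, gs-2, gs-1]` and the MSE criterion come from the description above.

```python
import numpy as np

# Offsets from the end of sorted |w|: indices gs-6, gs-4, gs-2, gs-1.
CANDIDATE_OFFSETS = (6, 4, 2, 1)


def best_scale_4cand(w, qmax=1):
    """Pick the MSE-best scale for one group from four order statistics.

    ASSUMPTION: weights are reconstructed on the symmetric grid
    {-qmax, ..., qmax} * scale / qmax (qmax=1 is a ternary grid).
    The real codec's grid may differ.
    """
    w = np.asarray(w, dtype=np.float64)
    gs = w.size
    if gs < max(CANDIDATE_OFFSETS):
        raise ValueError("group is too small for the four candidates")

    sorted_abs = np.sort(np.abs(w))
    best_scale, best_mse = 0.0, np.inf
    for off in CANDIDATE_OFFSETS:
        scale = sorted_abs[gs - off]
        if scale == 0.0:
            recon = np.zeros_like(w)  # degenerate candidate: reconstruct zeros
        else:
            q = np.clip(np.rint(w / scale * qmax), -qmax, qmax)
            recon = q * scale / qmax
        mse = float(np.mean((w - recon) ** 2))
        if mse < best_mse:
            best_scale, best_mse = float(scale), mse
    return best_scale, best_mse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group = rng.normal(size=32)
    scale, mse = best_scale_4cand(group)
    print(f"chosen scale={scale:.4f}, mse={mse:.6f}")
```

Each group costs four quantize-and-compare passes regardless of group size, which is where the saving over an exhaustive sweep would come from in a scheme like this. The tradeoff is that the optimum is only searched among the largest magnitudes in the group.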
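
For the scale codebook upper bound tradeoff described under "Scale codebook upper bound", the following sketch shows how one outlier group stretches a 27-entry log-spaced codebook, and how a `min(max, 4 * p99)` soft-cap would tighten it. It is not the code in `trit_quantize_scales`. The names `log_codebook` and `quantize_scales`, the natural-log spacing, the choice of the smallest positive group maximum as `log_min`, and nearest-in-log-space assignment are all assumptions made for illustration.

```python
import numpy as np


def log_codebook(group_abs_maxes, n_entries=27, soft_cap=False):
    """Build a log-spaced scale codebook over [lo, hi].

    ASSUMPTIONS: lo is the smallest positive group maximum, and the
    soft-cap variant uses min(max, 4 * p99) as suggested for a future v3.
    """
    g = np.asarray(group_abs_maxes, dtype=np.float64)
    pos = g[g > 0]
    if pos.size == 0:
        raise ValueError("need at least one positive group maximum")
    hi = pos.max()
    if soft_cap:
        hi = min(hi, 4.0 * np.percentile(pos, 99))
    lo = pos.min()
    return np.exp(np.linspace(np.log(lo), np.log(hi), n_entries))


def quantize_scales(group_abs_maxes, codebook):
    """Map each group scale to the nearest codebook entry in log space."""
    g = np.clip(np.asarray(group_abs_maxes, dtype=np.float64),
                codebook[0], codebook[-1])
    dist = np.abs(np.log(g)[:, None] - np.log(codebook)[None, :])
    return dist.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1000 ordinary groups plus one extreme-scale outlier group.
    scales = np.concatenate([rng.uniform(0.01, 0.02, 1000), [100.0]])
    for capped in (False, True):
        cb = log_codebook(scales, soft_cap=capped)
        print(f"soft_cap={capped}: step ratio={cb[1] / cb[0]:.3f}, "
              f"top entry={cb[-1]:.4f}")
```

In this sketch the uncapped codebook spends most of its entries on the empty range between the bulk and the outlier, so adjacent entries are far apart for ordinary groups. The capped variant gives the bulk finer steps, but an outlier above the cap saturates to the top entry, so whether a soft-cap preserves enough large-scale fidelity would need to be measured on real models.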