Add renderers vs TITO comparison section

#1
by kashif HF Staff - opened

Adds a new "Two ways to keep tokens identical" section between §6 (Prefix preservation)
and §7 (History rewriting), framing the renderer approach and the TITO approach as two
coherent design choices with their own trade-offs.

New figures (in the same hand-coded tito_fig* style as existing ones):

  • tito_fig6_dataflow.html — runtime side-by-side of one turn-to-turn step under each design.
  • tito_fig5_two_approaches.html — per-model engineering surface each approach asks you
    to write when a new family ships.
  • tito_fig7_throughput.html — fragmentation cost curves against turn count, one curve per
    per-boundary break rate, with two empirical measurements overlaid (Qwen2.5 on the p=0
    floor, stock Qwen3 on the p=100% ceiling).

Empirical verification (scripts/verify_renderers.py):

A reproducible script, built on the public renderers library and transformers, confirms that:

  • bridge_to_next_turn returns ids that byte-for-byte extend the prior sampled prefix on
    every probe (Qwen3 renderer, Qwen/Qwen3-0.6B).
  • On stock Qwen3-0.6B, MITO and renderer-TITO diverge on 100/100 multi-turn rollouts and
    the §5 dummy-diff trick fails on the first turn, which is exactly what §6's audit would flag.
  • On Qwen/Qwen2.5-0.5B-Instruct (where the §6 property test passes), dummy-diff TITO
    matches MITO on 100/100 rollouts.
  • Multiplying the 100% break rate on stock Qwen3 by the worst-case fragmentation factor
    for a 5-turn rollout ((N+1)/2 = 3.0× at N = 5) lands right at the library's
    ">3× throughput" framing.
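
The two quantities the checks above rely on can be sketched in a few lines. This is an illustration only, not code from scripts/verify_renderers.py. The function names `extends_prefix` and `worst_case_fragmentation_factor` are hypothetical. The cost model is also an assumption: when every turn boundary breaks, an N-turn rollout is taken to split into N separate samples of 1, 2, ..., N turns.

```python
from typing import Sequence


def extends_prefix(prev_ids: Sequence[int], next_ids: Sequence[int]) -> bool:
    """Return True if next_ids starts with prev_ids, token for token."""
    n = len(prev_ids)
    return len(next_ids) >= n and list(next_ids[:n]) == list(prev_ids)


def worst_case_fragmentation_factor(num_turns: int) -> float:
    """Cost ratio when every turn boundary breaks (assumed model).

    One unbroken sequence costs num_turns turn-units. If every boundary
    breaks, the rollout becomes samples of 1, 2, ..., num_turns turns,
    i.e. N(N+1)/2 turn-units in total, so the ratio is (N+1)/2.
    """
    if num_turns < 1:
        raise ValueError("num_turns must be at least 1")
    fragmented_units = sum(range(1, num_turns + 1))
    return fragmented_units / num_turns


if __name__ == "__main__":
    # Toy token ids, not real tokenizer output.
    print(extends_prefix([1, 2, 3], [1, 2, 3, 4, 5]))  # prefix preserved
    print(extends_prefix([1, 2, 3], [1, 9, 3, 4, 5]))  # prefix broken
    print(worst_case_fragmentation_factor(5))          # 5-turn rollout
```

For a 5-turn rollout the second function gives 15 / 5 = 3.0, matching the (N+1)/2 figure quoted in the last bullet.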

The section deliberately does not name people or organisations and grounds claims about
the renderer approach in what the library's API and code actually do.

qgallouedec changed pull request status to merged
