Add renderers vs TITO comparison section
#1
by kashif HF Staff - opened
Adds a new "Two ways to keep tokens identical" section between §6 (Prefix preservation)
and §7 (History rewriting), framing the renderer approach and the TITO approach as two
coherent design choices with their own trade-offs.
New figures (in the same hand-coded tito_fig* style as existing ones):
- tito_fig6_dataflow.html: runtime side-by-side of one turn-to-turn step under each design.
- tito_fig5_two_approaches.html: per-model engineering surface each approach asks you to write when a new family ships.
- tito_fig7_throughput.html: fragmentation cost curves vs turn count by per-boundary break rate, with two empirical measurements overlaid (Qwen2.5 on the p=0 floor, Qwen3 stock on the p=100% ceiling).
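For readers who want to sanity-check the p=100% ceiling, here is a minimal sketch of the worst-case fragmentation factor (N+1)/2 quoted under Empirical verification. It assumes one plausible reading of that factor, which the PR text does not spell out: when every turn boundary breaks, an N-turn rollout is processed as N separate sequences, the k-th one covering turns 1..k, and all turns are the same length. The function name is hypothetical and is not part of scripts/verify_renderers.py.

```python
from fractions import Fraction


def worst_case_fragmentation_factor(n_turns: int) -> Fraction:
    """Cost ratio of a fully fragmented rollout to an unfragmented one.

    Assumption (not stated in the PR): every boundary breaks, so the
    rollout becomes n_turns sequences where sequence k re-processes
    turns 1..k, and every turn costs one unit.
    """
    if n_turns < 1:
        raise ValueError("n_turns must be at least 1")
    # Fragmented: 1 + 2 + ... + N turn-units are processed in total.
    fragmented_cost = sum(range(1, n_turns + 1))
    # Unfragmented: each of the N turns is processed exactly once.
    unfragmented_cost = n_turns
    return Fraction(fragmented_cost, unfragmented_cost)


if __name__ == "__main__":
    # For a 5-turn rollout this reduces to (5 + 1) / 2.
    print(float(worst_case_fragmentation_factor(5)))
```

Under these assumptions the ratio simplifies to (N+1)/2 for any N, so the p=0 floor corresponds to a factor of 1 and the ceiling grows linearly with turn count.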
Empirical verification (scripts/verify_renderers.py):
Reproducible script (uses the public renderers library + transformers) that confirms:
- bridge_to_next_turn returns ids that byte-for-byte extend the prior sampled prefix on every probe (Qwen3 renderer, Qwen/Qwen3-0.6B).
- On stock Qwen3-0.6B, MITO and renderer-TITO diverge on 100/100 multi-turn rollouts and the §5 dummy-diff trick fails on the first turn, exactly what §6's audit would flag.
- On Qwen/Qwen2.5-0.5B-Instruct (where the §6 property test passes), dummy-diff TITO matches MITO on 100/100 rollouts.
- Multiplying the 100% break rate on stock Qwen3 by the worst-case fragmentation factor for a 5-turn rollout ((N+1)/2 = 3.0×) lands on the library's ">3× throughput" framing exactly.
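The prefix-extension property in the first check can be illustrated with a short sketch on plain lists of token ids. It does not call the renderers library or reproduce scripts/verify_renderers.py; both helper names are hypothetical, and the ids in the usage example are made up for illustration.

```python
from typing import Optional, Sequence


def extends_prefix(prior_ids: Sequence[int], next_ids: Sequence[int]) -> bool:
    """True if next_ids starts with exactly the ids in prior_ids."""
    n = len(prior_ids)
    return len(next_ids) >= n and list(next_ids[:n]) == list(prior_ids)


def first_divergence(prior_ids: Sequence[int], next_ids: Sequence[int]) -> Optional[int]:
    """Index of the first position where the prefix breaks, or None if it holds."""
    for i, (a, b) in enumerate(zip(prior_ids, next_ids)):
        if a != b:
            return i
    # No mismatch in the overlap; the prefix still breaks if next_ids is shorter.
    if len(next_ids) < len(prior_ids):
        return len(next_ids)
    return None


if __name__ == "__main__":
    sampled = [101, 7, 8, 9]           # made-up ids sampled up to the end of a turn
    bridged = [101, 7, 8, 9, 42, 43]   # made-up ids for the start of the next turn
    retokenized = [101, 7, 5, 9, 42]   # made-up ids where re-tokenization changed a token
    print(extends_prefix(sampled, bridged), first_divergence(sampled, bridged))
    print(extends_prefix(sampled, retokenized), first_divergence(sampled, retokenized))
```

Reporting the first divergent index rather than a bare pass/fail makes it easier to see which turn boundary broke.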
The section deliberately does not name people or organisations and grounds claims about
the renderer approach in what the library's API and code actually do.
qgallouedec changed pull request status to merged