Papers
arxiv:2506.06446

Tokenization Multiplicity Leads to Arbitrary Price Variation in LLM-as-a-service

Published on Jun 6, 2025
Authors:
,
,
,

Abstract

LLM-as-a-service pricing varies unexpectedly due to multiple tokenizations of the same output string, which can be addressed through canonical generation that restricts tokenization to training-time unique sequences.

AI-generated summary

Providers of LLM-as-a-service have predominantly adopted a simple pricing model: users pay a fixed price per token. Consequently, one may think that the price two different users would pay for the same output string under the same input prompt is the same. In our work, we show that, surprisingly, this is not (always) true. We find empirical evidence that, particularly for non-english outputs, both proprietary and open-weights LLMs often generate the same (output) string with multiple different tokenizations, even under the same input prompt, and this in turn leads to arbitrary price variation. To address the problem of tokenization multiplicity, we introduce canonical generation, a type of constrained generation that restricts LLMs to only generate canonical tokenizations -- the unique tokenization in which each string is tokenized during the training process of an LLM. Further, we introduce an efficient sampling algorithm for canonical generation based on the Gumbel-Max trick. Experiments on a variety of natural language tasks demonstrate that our sampling algorithm for canonical generation is comparable to standard sampling in terms of performance and runtime, and it solves the problem of tokenization multiplicity.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2506.06446 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2506.06446 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.