A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets
Abstract
We introduce a scaling law for fine-tuning large language models (LLMs) under fixed compute budgets that explicitly accounts for data composition. Conventional approaches measure training data solely by total tokens, yet the number of examples and their average token length (what we term dataset volume) play a decisive role in model performance. We fit the parameters of our formulation following established scaling-law estimation procedures. Experiments on the BRICC dataset (Salavati et al., 2024) and subsets of the MMLU dataset (Hendrycks et al., 2021), evaluated under multiple subsampling strategies, reveal that data composition significantly affects token efficiency. These results motivate refined scaling laws for practical LLM fine-tuning in resource-constrained settings.
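To make the idea concrete, the following is a minimal sketch of fitting a composition-aware scaling law from fine-tuning runs. The functional form (loss = E + A * N^(-alpha) * L^(-beta), with N the number of examples and L the average token length), the function name, and the sample measurements are illustrative assumptions, not the formulation or data from the paper.

```python
# Illustrative sketch only: the power-law form below is an assumption for
# demonstration; the paper's actual formulation may differ.
import numpy as np
from scipy.optimize import curve_fit

def fit_composition_scaling_law(num_examples, avg_token_len, eval_loss):
    """Fit an assumed law  loss = E + A * N^(-alpha) * L^(-beta),
    where N is the number of fine-tuning examples and L is the
    average token length per example."""
    def model(X, E, A, alpha, beta):
        N, L = X
        return E + A * N ** (-alpha) * L ** (-beta)

    X = (np.asarray(num_examples, dtype=float),
         np.asarray(avg_token_len, dtype=float))
    y = np.asarray(eval_loss, dtype=float)
    params, _ = curve_fit(model, X, y, p0=[1.0, 1.0, 0.5, 0.5], maxfev=20000)
    return dict(zip(["E", "A", "alpha", "beta"], params))

# Hypothetical measurements from runs at a fixed compute budget.
N = [500, 1000, 2000, 4000, 8000]          # number of examples
L = [256, 256, 128, 128, 64]               # average tokens per example
loss = [2.10, 1.95, 1.88, 1.80, 1.78]      # evaluation loss per run
print(fit_composition_scaling_law(N, L, loss))
```

Separating N and L as predictors, rather than collapsing them into a single token count, is what lets such a fit expose how data composition (many short examples versus fewer long ones) affects token efficiency.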