Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Paper: arXiv:2406.19146
This repository contains the model checkpoints from the paper "Resolving Discrepancies in Compute-Optimal Scaling of Language Models" by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.
Each checkpoint directory follows the path template
dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}
where dataset, hparams, warmup, decay, params, and maxstep are defined in the GitHub repository, which contains the code and data for reproducing the figures in the paper.
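Filling in the template can be sketched as follows; the concrete values below are hypothetical examples chosen for illustration, not settings guaranteed to exist in the repository:

```python
# Hypothetical example values -- substitute the settings defined in the
# GitHub repository for the checkpoint you want.
dataset = "openwebtext2"   # hypothetical dataset name
hparams = "tuned"          # hypothetical hyperparameter setting
warmup = 5000              # hypothetical warmup steps
decay = "cosine"           # hypothetical decay schedule
params = 124_000_000       # model size in parameters
maxstep = 10000            # hypothetical final training step

# Build the checkpoint directory path exactly as in the template above.
path = (
    f"dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/"
    f"params={int(params / 1e6)}M_maxstep={maxstep}"
)
print(path)
# -> dataset=openwebtext2/hparams=tuned_warmup=5000_decay=cosine/params=124M_maxstep=10000
```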
The script evaluating_checkpoint.py evaluates checkpoints on validation shards and generates text.
Copy it into your local open_lm checkout and run one of the following commands:
python evaluating_checkpoint.py --checkpoint "path/to/checkpoint" --input-text "The quick brown fox jumps over the lazy dog."
or
python evaluating_checkpoint.py --checkpoint "path/to/checkpoint" --val-data "path/to/validation/shards"
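To go the other way and recover a run's configuration from a checkpoint directory name, the template can be parsed back. The helper below is a hypothetical sketch, not part of the repository, and assumes field values contain no underscores:

```python
def parse_checkpoint_path(path: str) -> dict:
    """Recover template fields (dataset, hparams, warmup, decay, params,
    maxstep) from a checkpoint directory path.

    Hypothetical helper -- not part of the repository. Assumes individual
    field values contain no underscores or slashes.
    """
    fields = {}
    # Directory separators and underscores both delimit key=value pairs
    # in the template, so normalize them before splitting.
    for part in path.replace("/", "_").split("_"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value
    return fields
```

For example, parse_checkpoint_path("dataset=c4/hparams=tuned_warmup=5000_decay=cosine/params=124M_maxstep=10000") returns a dict with fields["warmup"] == "5000" and fields["params"] == "124M".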
Citation:

@article{porian2024resolving,
title={Resolving Discrepancies in Compute-Optimal Scaling of Language Models},
author={Porian, Tomer and Wortsman, Mitchell and Jitsev, Jenia and Schmidt, Ludwig and Carmon, Yair},
journal={arXiv:2406.19146},
year={2024}
}