Method: Parameter Ratio Rule
Parameters: {params.toFixed(1)}B ({paramsAbs.toLocaleString()})
Ratio Range: {ratioMin}x - {ratioMax}x tokens per parameter
{isMoe && (
MoE (Mixture-of-Experts) models typically train on fewer tokens per parameter, because sparse activation means only a subset of the parameters is used for each token.
)}
{!isMoe && (
Standard transformer models typically train on 5-30 tokens per parameter, based on published data.
)}
Step-by-Step Calculation:
Min estimate: {paramsAbs.toLocaleString()} params × {ratioMin} = {tokensEstMin ? (tokensEstMin / 1e9).toFixed(1) : '—'}B tokens
Max estimate: {paramsAbs.toLocaleString()} params × {ratioMax} = {tokensEstMax ? (tokensEstMax / 1e9).toFixed(1) : '—'}B tokens
Midpoint: ({tokensEstMin ? (tokensEstMin / 1e9).toFixed(1) : '—'} + {tokensEstMax ? (tokensEstMax / 1e9).toFixed(1) : '—'}) ÷ 2 = {tokensEstMid ? (tokensEstMid / 1e9).toFixed(1) : '—'}B tokens
Note: These estimates are based on token-to-parameter ratios observed in published models (e.g., GPT-3) and in the Chinchilla scaling laws.
Actual training token counts may differ significantly depending on data quality, curriculum learning, and other factors.
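
The calculation above can be sketched as a small TypeScript helper. This is a minimal illustration, not the component's actual implementation: `estimateTokens` and `toBillions` are hypothetical names, the field names simply reuse the identifiers shown in the text, and the 7B-parameter input is an arbitrary example paired with the 5-30 range quoted for standard models.

```typescript
// Hypothetical result shape; field names mirror the identifiers used in the text.
interface TokenEstimate {
  tokensEstMin: number;
  tokensEstMax: number;
  tokensEstMid: number;
}

// Estimate training tokens from an absolute parameter count and a
// tokens-per-parameter ratio range (assumed helper, not from the source).
function estimateTokens(
  paramsAbs: number,
  ratioMin: number,
  ratioMax: number
): TokenEstimate {
  // Reject NaN, non-positive values, and an inverted range.
  if (!(paramsAbs > 0) || !(ratioMin > 0) || !(ratioMax >= ratioMin)) {
    throw new RangeError("expected paramsAbs > 0 and 0 < ratioMin <= ratioMax");
  }
  const tokensEstMin = paramsAbs * ratioMin;
  const tokensEstMax = paramsAbs * ratioMax;
  // Midpoint is the arithmetic mean of the two bounds.
  const tokensEstMid = (tokensEstMin + tokensEstMax) / 2;
  return { tokensEstMin, tokensEstMax, tokensEstMid };
}

// Format a raw token count in billions with one decimal place.
function toBillions(tokens: number): string {
  return (tokens / 1e9).toFixed(1);
}

// Example: a 7B-parameter model with a 5-30 tokens-per-parameter range.
const est = estimateTokens(7e9, 5, 30);
console.log(`Min: ${toBillions(est.tokensEstMin)}B tokens`); // Min: 35.0B tokens
console.log(`Max: ${toBillions(est.tokensEstMax)}B tokens`); // Max: 210.0B tokens
console.log(`Mid: ${toBillions(est.tokensEstMid)}B tokens`); // Mid: 122.5B tokens
```

Computing the midpoint from the unrounded bounds, and rounding only when formatting, keeps the displayed midpoint from drifting when the bounds are rounded to one decimal place.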