Token Estimate Calculation

Method: Parameter Ratio Rule

Parameters: {params.toFixed(1)}B ({paramsAbs.toLocaleString()})

Ratio Range: {ratioMin}x - {ratioMax}x tokens per parameter {isMoe && (

Mixture of Experts (MoE) models are typically trained on fewer tokens per total parameter, because sparse activation routes each token through only a subset of the experts.

)} {!isMoe && (

Standard (dense) transformer models are typically trained on roughly 5-30x as many tokens as they have parameters, based on published data.

)}

Step-by-Step Calculation:

Min estimate: {paramsAbs.toLocaleString()} params × {ratioMin} = {tokensEstMin ? (tokensEstMin / 1e9).toFixed(1) : '—'}B tokens

Max estimate: {paramsAbs.toLocaleString()} params × {ratioMax} = {tokensEstMax ? (tokensEstMax / 1e9).toFixed(1) : '—'}B tokens

Midpoint: ({tokensEstMin ? (tokensEstMin / 1e9).toFixed(1) : '—'} + {tokensEstMax ? (tokensEstMax / 1e9).toFixed(1) : '—'}) ÷ 2 = {tokensEstMid ? (tokensEstMid / 1e9).toFixed(1) : '—'}B tokens
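The three steps above can be sketched as a small helper. This is a minimal illustration, not the component's actual implementation: `estimateTokens` and `formatBillions` are hypothetical names, the ratio is assumed to mean tokens per parameter, and the 7B / 5x / 30x inputs are example values only.

```typescript
// Hypothetical helper; field names mirror the template variables used above.
interface TokenEstimate {
  tokensEstMin: number;
  tokensEstMax: number;
  tokensEstMid: number;
}

// paramsAbs: absolute parameter count (e.g. 7e9 for a 7B model).
// ratioMin / ratioMax: assumed to be tokens per parameter.
function estimateTokens(
  paramsAbs: number,
  ratioMin: number,
  ratioMax: number
): TokenEstimate {
  if (!(paramsAbs > 0) || !(ratioMin > 0) || !(ratioMax >= ratioMin)) {
    throw new RangeError("expected paramsAbs > 0 and 0 < ratioMin <= ratioMax");
  }
  const tokensEstMin = paramsAbs * ratioMin;
  const tokensEstMax = paramsAbs * ratioMax;
  // Midpoint of the two raw estimates, computed before any rounding for display.
  const tokensEstMid = (tokensEstMin + tokensEstMax) / 2;
  return { tokensEstMin, tokensEstMax, tokensEstMid };
}

// Format a raw token count in billions with one decimal place.
function formatBillions(tokens: number): string {
  return `${(tokens / 1e9).toFixed(1)}B`;
}

// Example: a 7B-parameter model with an assumed 5x - 30x range.
const est = estimateTokens(7e9, 5, 30);
console.log(formatBillions(est.tokensEstMin)); // 35.0B
console.log(formatBillions(est.tokensEstMax)); // 210.0B
console.log(formatBillions(est.tokensEstMid)); // 122.5B
```

Computing the midpoint from the raw values rather than from the rounded display strings keeps the displayed midpoint from drifting by a rounding step.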

Note: These estimates are based on tokens-per-parameter ratios observed in published models (e.g., GPT-3, Chinchilla scaling laws). The actual number of training tokens may differ significantly depending on data quality, curriculum learning, and other factors.