GPT-4o Output Token Predictor

Predicts the number of output tokens GPT-4o will generate for a given prompt, enabling accurate cost estimation before API calls.

Model Details

  • Architecture: DistilBERT encoder + 3-layer MLP prediction head
  • Training Data: 30,000 ShareGPT-X conversations
  • Performance: mean absolute error (MAE) 268 tokens | mean absolute percentage error (MAPE) 15.2%
  • Inference: ~5ms on CPU
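
The prediction head described above (a 3-layer MLP on top of a DistilBERT encoder) can be sketched as follows. The layer widths and the use of a pooled 768-dimensional embedding are assumptions for illustration; the actual architecture lives in the linked repository.

```python
import torch
import torch.nn as nn

class TokenCountHead(nn.Module):
    """Hypothetical 3-layer MLP regression head; hidden sizes are assumptions."""

    def __init__(self, hidden_size: int = 768):  # 768 = DistilBERT hidden size
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # single scalar: predicted output-token count
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden_size) sentence embeddings from the encoder
        return self.mlp(pooled).squeeze(-1)

head = TokenCountHead()
dummy = torch.zeros(2, 768)  # stand-in for DistilBERT embeddings of 2 prompts
print(head(dummy).shape)  # torch.Size([2]) — one prediction per prompt
```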

Usage

from huggingface_hub import hf_hub_download
import torch

# Download model
model_path = hf_hub_download(
    repo_id="gurpreets64/gpt4o-output-token-predictor",
    filename="best_model.pt"
)

# Load the checkpoint (reconstructing the model from it is shown in the repo)
checkpoint = torch.load(model_path, map_location="cpu")
# See full code at: github.com/gurpreeet-singh/llm-output-token-prediction
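
Once the model returns a predicted output-token count, converting it into a dollar estimate (the cost-estimation use case this card describes) is simple arithmetic. The helper below is a sketch; the price is a placeholder, not current OpenAI pricing.

```python
def estimate_output_cost(predicted_tokens: int,
                         usd_per_million_output_tokens: float) -> float:
    """Turn a predicted output-token count into a dollar estimate.

    The price argument is illustrative; check your provider's current rates.
    """
    return predicted_tokens / 1_000_000 * usd_per_million_output_tokens

# e.g. a prediction of 268 tokens at a placeholder rate of $10 / 1M output tokens
print(estimate_output_cost(268, 10.0))  # 0.00268
```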

Citation

@software{gpt4o_token_predictor,
  author = {Gurpreet Singh},
  title = {GPT-4o Output Token Predictor},
  year = {2025},
  url = {https://github.com/gurpreeet-singh/llm-output-token-prediction}
}