# GPT-4o Output Token Predictor
Predicts the number of output tokens GPT-4o will generate for a given prompt, enabling accurate cost estimation before API calls.
## Model Details
- Architecture: DistilBERT encoder + 3-layer MLP prediction head
- Training Data: 30,000 ShareGPT-X conversations
- Performance: mean absolute error (MAE) 268 tokens | mean absolute percentage error (MAPE) 15.2%
- Inference: ~5ms on CPU
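The architecture described above pairs a DistilBERT encoder with a 3-layer MLP regression head. A minimal sketch of such a head is below; the hidden sizes (256, 64) are illustrative assumptions, not the checkpoint's actual dimensions, and only the encoder's output width (768, DistilBERT's hidden size) is taken from the model description:

```python
import torch
import torch.nn as nn

class TokenCountHead(nn.Module):
    """3-layer MLP that regresses an output-token count from a prompt embedding.

    Hidden sizes are illustrative guesses; see the GitHub repo for the
    actual architecture.
    """
    def __init__(self, encoder_dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(encoder_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # single scalar: predicted token count
        )

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # cls_embedding: (batch, encoder_dim) pooled DistilBERT output
        return self.mlp(cls_embedding).squeeze(-1)

head = TokenCountHead()
fake_cls = torch.randn(2, 768)  # stand-in for DistilBERT [CLS] embeddings
pred = head(fake_cls)
print(pred.shape)  # one predicted count per prompt in the batch
```

In the full model, `cls_embedding` would come from the encoder's pooled output rather than random tensors.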
## Usage
```python
from huggingface_hub import hf_hub_download
import torch

# Download the model weights from the Hub
model_path = hf_hub_download(
    repo_id="gurpreets64/gpt4o-output-token-predictor",
    filename="best_model.pt"
)

# Load the checkpoint onto CPU
checkpoint = torch.load(model_path, map_location="cpu")

# See full code at: https://github.com/gurpreeet-singh/llm-output-token-prediction
```
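Once you have a predicted output-token count, turning it into a cost estimate is simple arithmetic. The sketch below assumes illustrative per-million-token prices; check OpenAI's current GPT-4o pricing before relying on the defaults:

```python
def estimate_cost(prompt_tokens: int, predicted_output_tokens: int,
                  input_price_per_1m: float = 2.50,
                  output_price_per_1m: float = 10.00) -> float:
    """Estimate a GPT-4o call's cost in USD.

    The per-million-token prices are placeholder assumptions, not
    authoritative pricing.
    """
    return (prompt_tokens * input_price_per_1m
            + predicted_output_tokens * output_price_per_1m) / 1_000_000

# e.g. a 1,200-token prompt predicted to produce ~800 output tokens
print(f"${estimate_cost(1200, 800):.4f}")  # → $0.0110
```

Because output tokens are typically billed at a higher rate than input tokens, the predicted output count usually dominates the estimate.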
## Links
- GitHub: https://github.com/gurpreeet-singh/llm-output-token-prediction
- Documentation: See GitHub repo for full training and inference code
## Citation

```bibtex
@software{gpt4o_token_predictor,
  author = {Gurpreet Singh},
  title  = {GPT-4o Output Token Predictor},
  year   = {2025},
  url    = {https://github.com/gurpreeet-singh/llm-output-token-prediction}
}
```