HN Success Predictor

A fine-tuned RoBERTa model that predicts the probability of an article title achieving ≥100 points on Hacker News.

Model Description

This model was developed as part of an RSS reader project to prioritize content that would resonate with the Hacker News community. It analyzes only the title of an article and outputs a calibrated probability.

Key Features

  • Architecture: RoBERTa-base with increased regularization
  • Calibration: Isotonic regression for reliable probability estimates
  • Temporal Split: Trained on historical data, tested on future data (no data leakage)
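
The calibration step can be illustrated with a minimal, dependency-free sketch of isotonic regression using the pool-adjacent-violators algorithm. This is not the project's training code: the function names `pav` and `fit_isotonic` are hypothetical, the toy data is invented for illustration, and tied scores are not given special treatment as a production implementation would.

```python
def pav(values):
    """Return the non-decreasing sequence closest (least squares) to `values`."""
    blocks = []  # each block is [sum_of_values, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def fit_isotonic(raw_scores, labels):
    """Sort samples by raw score, then fit a monotone map from score to hit rate."""
    order = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i])
    xs = [raw_scores[i] for i in order]
    ys = pav([labels[i] for i in order])
    return xs, ys


# Toy data (invented): raw model scores and whether the post was a hit.
xs, ys = fit_isotonic([0.9, 0.1, 0.4, 0.35, 0.8], [1, 0, 0, 1, 1])
```

The fitted `ys` never decrease as the raw score increases, so calibration can change the probability values but not the ranking of titles.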

Performance

Metric                    Value
Test ROC AUC              0.685
95% CI                    [0.675, 0.695]
ECE (Calibration Error)   0.043
Optimal Threshold         0.302
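
For readers unfamiliar with ECE, the sketch below shows one common way to compute it: bucket predictions into equal-width bins and take the weighted average of the gap between mean predicted probability and observed hit rate in each bin. The bin count of 10 and the equal-width binning are assumptions; the model card does not state how its 0.043 figure was computed.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean of |observed hit rate - mean predicted probability| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        confidence = sum(p for p, _ in bucket) / len(bucket)
        hit_rate = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / total * abs(hit_rate - confidence)
    return ece


# Toy data (invented): the 0.25 bin is perfectly calibrated, the 0.75 bin is not.
probs = [0.25] * 4 + [0.75] * 4
labels = [1, 0, 0, 0, 1, 1, 1, 1]
ece = expected_calibration_error(probs, labels)
```

A lower value means the predicted probabilities track observed hit rates more closely.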

Lift Analysis

Top K%   Lift   Precision
1%       2.4x   79%
5%       2.1x   69%
10%      1.9x   62%
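
Lift is usually defined as the precision among the top K% of ranked predictions divided by the overall hit rate. Dividing each precision above by its lift gives an implied base rate of roughly 33%. The sketch below shows that calculation under this usual definition; the function name `lift_at_top_k` and the toy data are illustrative and not taken from the project.

```python
def lift_at_top_k(probs, labels, k_percent):
    """Return (lift, precision) for the top k_percent of predictions by score."""
    n = len(probs)
    top_n = max(1, round(n * k_percent / 100))
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    precision = sum(y for _, y in ranked[:top_n]) / top_n
    base_rate = sum(labels) / n
    return precision / base_rate, precision


# Toy data (invented): 3 hits among 10 posts, two of them ranked at the top.
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift, precision = lift_at_top_k(probs, labels, k_percent=20)
```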

Training Data

  • Source: HN Algolia API
  • Time Range: 36 months (2022-2025)
  • Total Posts: ~90,000
  • Classes: Hit (≥100 points), Miss (<100 points)
  • Split: 70% train / 15% val / 15% test (temporal)
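
A temporal split like the one listed above can be sketched as follows: sort posts by submission time, then cut at the 70% and 85% marks so that validation and test posts are strictly newer than the training posts. The field name `created_at_i` (a Unix timestamp) and the function name `temporal_split` are assumptions for illustration; the project's actual data schema is not shown in this card.

```python
def temporal_split(posts, train_frac=0.70, val_frac=0.15):
    """Split posts chronologically into train / val / test lists."""
    ordered = sorted(posts, key=lambda post: post["created_at_i"])
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Toy data (invented): 100 posts supplied in reverse chronological order.
posts = [{"created_at_i": t} for t in reversed(range(100))]
train, val, test = temporal_split(posts)
```

Unlike a random split, no post in `train` is newer than any post in `val` or `test`, which is what keeps future information out of training.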

Usage

With Transformers

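from transformers import AutoTokenizer, AutoModelForSequenceClassification
from huggingface_hub import hf_hub_download
import torch
import joblib

# Load model and tokenizer from the "roberta" subfolder of the repo
model = AutoModelForSequenceClassification.from_pretrained("philippdubach/hn-success-predictor", subfolder="roberta")
tokenizer = AutoTokenizer.from_pretrained("philippdubach/hn-success-predictor", subfolder="roberta")

# The calibrator is a separate file in the same repo; fetch it, then load it
calibrator_path = hf_hub_download("philippdubach/hn-success-predictor", "isotonic_calibrator.joblib")
calibrator = joblib.load(calibrator_path)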

# Predict
title = "Show HN: I built a neural network in Rust"
inputs = tokenizer(title, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    raw_prob = torch.softmax(outputs.logits, dim=-1)[0, 1].item()

calibrated_prob = calibrator.predict([raw_prob])[0]
print(f"Probability: {calibrated_prob:.1%}")
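
Once titles have calibrated probabilities, they can be ranked and flagged against the optimal threshold of 0.302 reported under Performance. The sketch below is one possible way to do that. The helper name `prioritize` is hypothetical, the probabilities are made-up placeholders rather than real model outputs, and the card does not state which criterion the threshold optimizes.

```python
OPTIMAL_THRESHOLD = 0.302  # value reported in the Performance section


def prioritize(scored_titles, threshold=OPTIMAL_THRESHOLD):
    """Sort (title, calibrated_prob) pairs by probability and flag likely hits."""
    ranked = sorted(scored_titles, key=lambda pair: pair[1], reverse=True)
    return [(title, prob, prob >= threshold) for title, prob in ranked]


# Placeholder scores (invented for illustration, not produced by the model).
scored = [
    ("Example title A", 0.10),
    ("Example title B", 0.45),
    ("Example title C", 0.302),
]
for title, prob, likely_hit in prioritize(scored):
    print(f"{prob:.1%}  {'HIT ' if likely_hit else 'miss'}  {title}")
```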

With the RSS Reader

# Clone the RSS reader repo
git clone https://github.com/philippdubach/rss-reader
cd rss-reader

# Download model
python -c "from huggingface_hub import snapshot_download; snapshot_download('philippdubach/hn-success-predictor', local_dir='rss_reader/models/hn_model_v7')"

# Use
python main.py refresh
python main.py dashboard --open

Model Architecture

Input Title
    ↓
RoBERTa-base (regularized)
  - dropout: 0.2
  - weight_decay: 0.05
  - frozen layers: 0-5
    ↓
Classification Head (2 classes)
    ↓
Softmax → Raw Probability
    ↓
Isotonic Calibration
    ↓
Calibrated Probability (0.0 - 1.0)
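
The last three stages of this pipeline can be sketched without any ML libraries: a softmax over the two class logits yields the raw hit probability, and a fitted isotonic calibrator then maps it through a monotone, piecewise-linear function. The knot values below are hypothetical, not the ones stored in isotonic_calibrator.joblib, and the linear interpolation with clipping mirrors how scikit-learn's IsotonicRegression predicts, which is an assumption about the saved calibrator.

```python
import math
from bisect import bisect_right


def softmax_hit_probability(logits):
    """Probability of class 1 (Hit) from two logits, computed stably."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    return exps[1] / sum(exps)


def calibrate(raw_prob, xs, ys):
    """Interpolate linearly through knots (xs, ys); clip outside the knot range.

    xs must be strictly increasing and ys non-decreasing.
    """
    if raw_prob <= xs[0]:
        return ys[0]
    if raw_prob >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, raw_prob)  # xs[i - 1] <= raw_prob < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (raw_prob - x0) / (x1 - x0)


# Hypothetical calibration knots, for illustration only.
knots_x = [0.0, 0.5, 1.0]
knots_y = [0.0, 0.2, 1.0]

raw = softmax_hit_probability([0.0, 0.0])  # equal logits give 0.5
calibrated = calibrate(raw, knots_x, knots_y)
```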

Files

  • roberta/ - Fine-tuned RoBERTa model (safetensors format)
  • isotonic_calibrator.joblib - Probability calibrator
  • config.json - Model metadata and training config

Limitations

  • Title-only: Does not consider article content, domain reputation, or timing
  • HN-specific: Trained on HN data; may not generalize to other platforms
  • Temporal drift: HN culture evolves; periodic retraining recommended
  • Noise floor: Many factors beyond title affect success (timing, submitter, luck)

Version History

Version   AUC      Notes
V1        0.654    DistilBERT baseline
V3.3      0.714*   *Inflated due to random split (data leakage)
V6        0.693    Temporal split, ensemble
V7        0.685    Regularized RoBERTa-only, 61% less overfitting

Citation

@misc{hn-success-predictor,
  author = {Philipp Dubach},
  title = {HN Success Predictor: Predicting Hacker News Virality},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/philippdubach/hn-success-predictor}
}

License

MIT
