# HN Success Predictor

A fine-tuned RoBERTa model that predicts the probability of an article title reaching ≥100 points on Hacker News.
## Model Description
This model was developed as part of an RSS reader project to prioritize content that would resonate with the Hacker News community. It analyzes only the title of an article and outputs a calibrated probability.
## Key Features
- Architecture: RoBERTa-base with increased regularization
- Calibration: Isotonic regression for reliable probability estimates
- Temporal Split: Trained on historical data, tested on future data (no data leakage)
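Isotonic calibration fits a monotonic mapping from raw model scores to empirical hit rates on held-out data. A minimal sketch with scikit-learn, using synthetic validation scores and labels as stand-ins for real model outputs:

```python
# Sketch: calibrating raw classifier probabilities with isotonic regression.
# val_raw_probs / val_labels are synthetic stand-ins for a real validation set.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
val_raw_probs = rng.uniform(0, 1, 1000)
# Simulate an overconfident model: true hit rate ~ score squared.
val_labels = (rng.uniform(0, 1, 1000) < val_raw_probs**2).astype(int)

calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(val_raw_probs, val_labels)

# Calibrated outputs are monotonic in the raw score and stay in [0, 1].
calibrated = calibrator.predict([0.2, 0.5, 0.8])
```

The fitted calibrator is what `isotonic_calibrator.joblib` serializes in the real pipeline.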
## Performance
| Metric | Value |
|---|---|
| Test ROC AUC | 0.685 |
| 95% CI | [0.675, 0.695] |
| ECE (Expected Calibration Error) | 0.043 |
| Optimal Threshold | 0.302 |
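ECE measures how far predicted probabilities drift from observed hit rates: predictions are binned by confidence, and per-bin gaps between mean prediction and mean outcome are averaged, weighted by bin size. A minimal sketch (not the exact evaluation script):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Bin predictions by confidence and average the per-bin gap
    |observed hit rate - mean predicted probability|, weighted by bin size."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Make the last bin inclusive on the right so prob == 1.0 is counted.
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece
```

An ECE of 0.043 means the model's stated probabilities are, on average, within about 4 percentage points of the observed hit rates.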
### Lift Analysis
| Top K% | Lift | Precision |
|---|---|---|
| 1% | 2.4x | 79% |
| 5% | 2.1x | 69% |
| 10% | 1.9x | 62% |
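Lift at K% compares precision in the top-K% highest-scoring titles to the overall base hit rate; a lift of 2.4x at 1% means the top-ranked slice contains hits at 2.4 times the background rate. A minimal sketch of the computation:

```python
import numpy as np

def lift_at_k(probs, labels, k_frac):
    """Precision among the top k_frac of scores, divided by the base hit rate."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    n_top = max(1, int(len(probs) * k_frac))
    top = np.argsort(probs)[::-1][:n_top]   # indices of the highest scores
    precision = labels[top].mean()
    base_rate = labels.mean()
    return precision / base_rate, precision
```

For example, with scores `[0.9, 0.8, 0.2, 0.1]` and labels `[1, 1, 0, 0]`, the top 50% has precision 1.0 against a base rate of 0.5, i.e. a lift of 2.0x.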
## Training Data
- Source: HN Algolia API
- Time Range: 36 months (2022-2025)
- Total Posts: ~90,000
- Classes: Hit (≥100 points), Miss (<100 points)
- Split: 70% train / 15% val / 15% test (temporal)
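A temporal split means the data is ordered by submission time and sliced, never shuffled, so validation and test posts are strictly later than anything seen in training. A sketch under assumed field names (`created_at` follows the HN Algolia API; the synthetic posts are illustrative):

```python
# Sketch: 70/15/15 temporal split by submission time (no shuffling),
# so the test set lies strictly in the future relative to training data.
posts = [{"created_at": t, "points": (37 * t) % 250} for t in range(1000)]

posts.sort(key=lambda p: p["created_at"])
n = len(posts)
train = posts[: round(0.70 * n)]
val = posts[round(0.70 * n) : round(0.85 * n)]
test = posts[round(0.85 * n) :]

# Binary labels: Hit = >=100 points.
train_labels = [int(p["points"] >= 100) for p in train]
```

Shuffling instead of slicing lets near-duplicate or trend-correlated titles leak across the boundary, which is what inflated the V3.3 AUC in the version history below.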
## Usage

### With Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import joblib

# Load model
model = AutoModelForSequenceClassification.from_pretrained(
    "philippdubach/hn-success-predictor", subfolder="roberta"
)
tokenizer = AutoTokenizer.from_pretrained(
    "philippdubach/hn-success-predictor", subfolder="roberta"
)
calibrator = joblib.load("isotonic_calibrator.joblib")  # Download separately

# Predict
title = "Show HN: I built a neural network in Rust"
inputs = tokenizer(title, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
raw_prob = torch.softmax(outputs.logits, dim=-1)[0, 1].item()
calibrated_prob = calibrator.predict([raw_prob])[0]
print(f"Probability: {calibrated_prob:.1%}")
```
### With the RSS Reader
```bash
# Clone the RSS reader repo
git clone https://github.com/philippdubach/rss-reader
cd rss-reader

# Download model
python -c "from huggingface_hub import snapshot_download; snapshot_download('philippdubach/hn-success-predictor', local_dir='rss_reader/models/hn_model_v7')"

# Use
python main.py refresh
python main.py dashboard --open
```
## Model Architecture

```text
Input Title
    ↓
RoBERTa-base (regularized)
  - dropout: 0.2
  - weight_decay: 0.05
  - frozen layers: 0-5
    ↓
Classification Head (2 classes)
    ↓
Softmax → Raw Probability
    ↓
Isotonic Calibration
    ↓
Calibrated Probability (0.0 - 1.0)
```
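The regularization settings above can be sketched with the `transformers` API. This is a minimal illustration, not the actual training script: a randomly initialized RoBERTa config stands in for the pretrained checkpoint so it runs offline, and the learning rate is an assumed placeholder (the real values live in `config.json`).

```python
# Sketch: dropout 0.2, encoder layers 0-5 frozen, weight decay 0.05 via AdamW.
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

config = RobertaConfig(
    num_labels=2,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
)
# Random init for illustration; training would use from_pretrained("roberta-base").
model = RobertaForSequenceClassification(config)

# Freeze the embeddings and the first six encoder layers (0-5);
# only layers 6-11 and the classification head stay trainable.
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for layer in model.roberta.encoder.layer[:6]:
    for param in layer.parameters():
        param.requires_grad = False

# Weight decay applied only to trainable parameters; lr is illustrative.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=2e-5,
    weight_decay=0.05,
)
```

Freezing the lower layers keeps the general-purpose token representations intact and shrinks the trainable parameter count, which is one way to cut overfitting on a ~90k-example dataset.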
## Files

- `roberta/` - Fine-tuned RoBERTa model (safetensors format)
- `isotonic_calibrator.joblib` - Probability calibrator
- `config.json` - Model metadata and training config
## Limitations
- Title-only: Does not consider article content, domain reputation, or timing
- HN-specific: Trained on HN data; may not generalize to other platforms
- Temporal drift: HN culture evolves; periodic retraining recommended
- Noise floor: Many factors beyond title affect success (timing, submitter, luck)
## Version History
| Version | AUC | Notes |
|---|---|---|
| V1 | 0.654 | DistilBERT baseline |
| V3.3 | 0.714* | *Inflated due to random split (data leakage) |
| V6 | 0.693 | Temporal split, ensemble |
| V7 | 0.685 | Regularized RoBERTa-only, 61% less overfitting |
## Citation

```bibtex
@misc{hn-success-predictor,
  author = {Philipp Dubach},
  title = {HN Success Predictor: Predicting Hacker News Virality},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/philippdubach/hn-success-predictor}
}
```
## License
MIT