# HN Success Predictor

A fine-tuned RoBERTa model that predicts the probability of an article title reaching ≥100 points on Hacker News.
## Model Description
This model was developed as part of an RSS reader project to prioritize content that would resonate with the Hacker News community. It analyzes only the title of an article and outputs a calibrated probability.
## Key Features
- Architecture: RoBERTa-base with increased regularization
- Calibration: Isotonic regression for reliable probability estimates
- Temporal Split: Trained on historical data, tested on future data (no data leakage)
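Isotonic calibration fits a monotonic mapping from raw model scores to empirical hit rates on held-out data. A minimal sketch with scikit-learn, using synthetic validation scores and labels as stand-ins for real model outputs:

```python
# Sketch: calibrating raw classifier probabilities with isotonic regression.
# val_raw_probs / val_labels are synthetic stand-ins for a real validation set.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
val_raw_probs = rng.uniform(0, 1, 1000)
# Simulate an overconfident model: true hit rate ~ score squared.
val_labels = (rng.uniform(0, 1, 1000) < val_raw_probs**2).astype(int)

calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(val_raw_probs, val_labels)

# Calibrated outputs are monotonic in the raw score and stay in [0, 1].
calibrated = calibrator.predict([0.2, 0.5, 0.8])
```

The fitted calibrator is what `isotonic_calibrator.joblib` serializes in the real pipeline.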
## Performance
| Metric | Value |
|---|---|
| Test ROC AUC | 0.685 |
| 95% CI | [0.675, 0.695] |
| ECE (Expected Calibration Error) | 0.043 |
| Optimal Threshold | 0.302 |
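ECE measures how far predicted probabilities drift from observed hit rates: predictions are binned by confidence, and per-bin gaps between mean prediction and mean outcome are averaged, weighted by bin size. A minimal sketch (not the exact evaluation script):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Bin predictions by confidence and average the per-bin gap
    |observed hit rate - mean predicted probability|, weighted by bin size."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Make the last bin inclusive on the right so prob == 1.0 is counted.
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece
```

An ECE of 0.043 means the model's stated probabilities are, on average, within about 4 percentage points of the observed hit rates.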
### Lift Analysis
| Top K% | Lift | Precision |
|---|---|---|
| 1% | 2.4x | 79% |
| 5% | 2.1x | 69% |
| 10% | 1.9x | 62% |
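Lift at K% compares precision in the top-K% highest-scoring titles to the overall base hit rate; a lift of 2.4x at 1% means the top-ranked slice contains hits at 2.4 times the background rate. A minimal sketch of the computation:

```python
import numpy as np

def lift_at_k(probs, labels, k_frac):
    """Precision among the top k_frac of scores, divided by the base hit rate."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    n_top = max(1, int(len(probs) * k_frac))
    top = np.argsort(probs)[::-1][:n_top]   # indices of the highest scores
    precision = labels[top].mean()
    base_rate = labels.mean()
    return precision / base_rate, precision
```

For example, with scores `[0.9, 0.8, 0.2, 0.1]` and labels `[1, 1, 0, 0]`, the top 50% has precision 1.0 against a base rate of 0.5, i.e. a lift of 2.0x.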
## Training Data
- Source: HN Algolia API
- Time Range: 36 months (2022-2025)
- Total Posts: ~90,000
- Classes: Hit (≥100 points), Miss (<100 points)
- Split: 70% train / 15% val / 15% test (temporal)
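A temporal split means the data is ordered by submission time and sliced, never shuffled, so validation and test posts are strictly later than anything seen in training. A sketch under assumed field names (`created_at` follows the HN Algolia API; the synthetic posts are illustrative):

```python
# Sketch: 70/15/15 temporal split by submission time (no shuffling),
# so the test set lies strictly in the future relative to training data.
posts = [{"created_at": t, "points": (37 * t) % 250} for t in range(1000)]

posts.sort(key=lambda p: p["created_at"])
n = len(posts)
train = posts[: round(0.70 * n)]
val = posts[round(0.70 * n) : round(0.85 * n)]
test = posts[round(0.85 * n) :]

# Binary labels: Hit = >=100 points.
train_labels = [int(p["points"] >= 100) for p in train]
```

Shuffling instead of slicing lets near-duplicate or trend-correlated titles leak across the boundary, which is what inflated the V3.3 AUC in the version history below.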
## Usage

### With Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import joblib

# Load model
model = AutoModelForSequenceClassification.from_pretrained(
    "philippdubach/hn-success-predictor", subfolder="roberta"
)
tokenizer = AutoTokenizer.from_pretrained(
    "philippdubach/hn-success-predictor", subfolder="roberta"
)
calibrator = joblib.load("isotonic_calibrator.joblib")  # Download separately

# Predict
title = "Show HN: I built a neural network in Rust"
inputs = tokenizer(title, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
raw_prob = torch.softmax(outputs.logits, dim=-1)[0, 1].item()
calibrated_prob = calibrator.predict([raw_prob])[0]
print(f"Probability: {calibrated_prob:.1%}")
```
### With the RSS Reader
```bash
# Clone the RSS reader repo
git clone https://github.com/philippdubach/rss-reader
cd rss-reader

# Download model
python -c "from huggingface_hub import snapshot_download; snapshot_download('philippdubach/hn-success-predictor', local_dir='rss_reader/models/hn_model_v7')"

# Use
python main.py refresh
python main.py dashboard --open
```
## Model Architecture

```text
Input Title
    ↓
RoBERTa-base (regularized)
  - dropout: 0.2
  - weight_decay: 0.05
  - frozen layers: 0-5
    ↓
Classification Head (2 classes)
    ↓
Softmax → Raw Probability
    ↓
Isotonic Calibration
    ↓
Calibrated Probability (0.0 - 1.0)
```
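The regularization settings above can be sketched with the `transformers` API. This is a minimal illustration, not the actual training script: a randomly initialized RoBERTa config stands in for the pretrained checkpoint so it runs offline, and the learning rate is an assumed placeholder (the real values live in `config.json`).

```python
# Sketch: dropout 0.2, encoder layers 0-5 frozen, weight decay 0.05 via AdamW.
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

config = RobertaConfig(
    num_labels=2,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
)
# Random init for illustration; training would use from_pretrained("roberta-base").
model = RobertaForSequenceClassification(config)

# Freeze the embeddings and the first six encoder layers (0-5);
# only layers 6-11 and the classification head stay trainable.
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for layer in model.roberta.encoder.layer[:6]:
    for param in layer.parameters():
        param.requires_grad = False

# Weight decay applied only to trainable parameters; lr is illustrative.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=2e-5,
    weight_decay=0.05,
)
```

Freezing the lower layers keeps the general-purpose token representations intact and shrinks the trainable parameter count, which is one way to cut overfitting on a ~90k-example dataset.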
## Files

- `roberta/` - Fine-tuned RoBERTa model (safetensors format)
- `isotonic_calibrator.joblib` - Probability calibrator
- `config.json` - Model metadata and training config
## Limitations
- Title-only: Does not consider article content, domain reputation, or timing
- HN-specific: Trained on HN data; may not generalize to other platforms
- Temporal drift: HN culture evolves; periodic retraining recommended
- Noise floor: Many factors beyond title affect success (timing, submitter, luck)
## Version History
| Version | AUC | Notes |
|---|---|---|
| V1 | 0.654 | DistilBERT baseline |
| V3.3 | 0.714* | *Inflated due to random split (data leakage) |
| V6 | 0.693 | Temporal split, ensemble |
| V7 | 0.685 | Regularized RoBERTa-only, 61% less overfitting |
## Citation

```bibtex
@misc{hn-success-predictor,
  author = {Philipp Dubach},
  title = {HN Success Predictor: Predicting Hacker News Virality},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/philippdubach/hn-success-predictor}
}
```
## License
MIT