News Topic Classifier (BCE loss)
Fine-tuned bert-base-uncased for multi-label news topic classification using Binary Cross-Entropy loss.
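As a rough illustration of what this loss does in the multi-label setting, the sketch below computes binary cross-entropy over independent per-label sigmoid outputs in plain Python. The function name `bce_with_logits` and the three-label toy example are assumptions made for illustration. The training code is not shown in this card, and a library implementation such as `torch.nn.BCEWithLogitsLoss` may well have been used instead.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, each label scored independently.

    logits:  raw model scores, one per label
    targets: multi-hot vector (1.0 if the topic applies, else 0.0)
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Toy example with 3 hypothetical labels; the article carries labels 0 and 2.
loss = bce_with_logits([2.0, -1.0, 0.5], [1.0, 0.0, 1.0])
print(round(loss, 4))
```

Because each label contributes its own term, an article can carry any number of topics at once, which is what distinguishes this setup from softmax-based single-label classification.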
Model details
| Base model | bert-base-uncased |
| Task | Multi-label classification |
| Loss | Binary Cross-Entropy (BCE) |
| Number of labels | 126 topic codes |
| Max input length | 256 tokens (title + body) |
| Best val micro-F1 | ~0.890 |
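Micro-F1 pools true positives, false positives, and false negatives over all labels before computing F1, so frequent topics weigh more than rare ones. A minimal sketch, assuming gold and predicted labels are given as one set of topic codes per article. The function name `micro_f1` and the topic codes in the example are hypothetical, and the evaluation script behind the reported score is not part of this card.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over a list of articles.

    gold, pred: lists of sets of label names, one set per article.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels the model missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical topic codes, for illustration only.
gold = [{"sports", "finland"}, {"politics"}]
pred = [{"sports"}, {"politics", "economy"}]
score = micro_f1(gold, pred)
print(score)
```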
Usage
from transformers import BertForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained("chiunhau/news-classifier-1")
model = BertForSequenceClassification.from_pretrained("chiunhau/news-classifier-1")
model.eval()  # inference mode: disables dropout

title = "Finland wins gold in ice hockey."
text = "The Finnish national team claimed victory in the final."

# Encode title and body as a sentence pair, truncated to 256 tokens
inputs = tokenizer(title, text, return_tensors="pt",
                   truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label decoding: independent sigmoid per label, then threshold
threshold = 0.5
probs = logits.sigmoid().squeeze(0)  # drop the batch dimension
predicted = [model.config.id2label[i]
             for i, p in enumerate(probs) if p > threshold]
print("Topics:", predicted)
Notes
- Input: the title and text fields, concatenated and separated by the tokenizer's [SEP] token.
- Output: raw logits. Apply sigmoid() and threshold at 0.5 for binary predictions.
- The companion ASL model (chiunhau/news-classifier-2) uses Asymmetric Loss and may be better at detecting rare topics.
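Since sigmoid(0) = 0.5 and the sigmoid is monotonic, thresholding probabilities at 0.5 selects the same labels as checking whether each logit is positive. The sketch below illustrates this in plain Python. The `decode` helper, the logits, and the label map are made up for the example and are not part of the model's API.

```python
import math

def sigmoid(z):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def decode(logits, id2label, threshold=0.5):
    """Return the labels whose sigmoid probability exceeds the threshold."""
    return [id2label[i] for i, z in enumerate(logits) if sigmoid(z) > threshold]

# Hypothetical logits and label map for three topics.
id2label = {0: "sports", 1: "politics", 2: "finland"}
logits = [1.7, -2.3, 0.4]

by_prob = decode(logits, id2label)                                # sigmoid, then 0.5 cutoff
by_sign = [id2label[i] for i, z in enumerate(logits) if z > 0]    # sign of the logit
print(by_prob, by_sign)
```

More generally, a probability threshold t corresponds to a logit cutoff of log(t / (1 - t)).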