News Topic Classifier (BCE loss)

bert-base-uncased fine-tuned for multi-label news topic classification with a Binary Cross-Entropy (BCE) loss.
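Multi-label BCE treats each of the 126 topic codes as an independent yes/no decision. Each label's logit goes through a sigmoid, is scored against a 0/1 target, and the per-label losses are averaged. The plain-Python sketch below illustrates that formula. It mirrors the default mean reduction of PyTorch's `torch.nn.BCEWithLogitsLoss`, but it is not the training code used for this model, and the helper name `bce_with_logits` is hypothetical.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Hypothetical helper for illustration only. Uses the numerically stable form
    max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# Three labels: the first topic is present (target 1), the other two are absent.
print(round(bce_with_logits([2.0, -1.0, 0.0], [1, 0, 0]), 4))  # 0.3778
```

Working from logits rather than from probabilities avoids taking `log(0)` when the sigmoid saturates. This is also why the model returns raw logits.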

Model details

Base model: bert-base-uncased
Task: multi-label classification
Loss: Binary Cross-Entropy (BCE)
Number of labels: 126 topic codes
Max input length: 256 tokens (title + body)
Best validation micro-F1: ~0.890
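Micro-F1 pools true positives, false positives and false negatives across every label and every article before computing a single F1 score. As a result, frequent topics weigh more than rare ones. The sketch below is a minimal plain-Python version. It assumes that gold labels and predictions are given as one set of topic codes per article. The helper name `micro_f1` and the label names are hypothetical, and this is not the evaluation script behind the reported ~0.890.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multi-label predictions.

    gold, pred: lists of sets of label codes, one set per example
    (hypothetical input format chosen for illustration).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels the model missed
    # 2TP / (2TP + FP + FN) equals the harmonic mean of micro precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # convention: 0.0 when nothing to score

gold = [{"sport", "ice_hockey"}, {"economy"}]  # hypothetical topic codes
pred = [{"sport"}, {"economy", "politics"}]
print(micro_f1(gold, pred))
```

Here TP = 2, FP = 1 and FN = 1, so micro-F1 = 4 / 6 = 2/3.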

Usage

from transformers import BertForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned tokenizer and classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("chiunhau/news-classifier-1")
model     = BertForSequenceClassification.from_pretrained("chiunhau/news-classifier-1")
model.eval()  # disable dropout for inference

title = "Finland wins gold in ice hockey."
text  = "The Finnish national team claimed victory in the final."
# Encode title and body as a sentence pair ([CLS] title [SEP] body [SEP]),
# truncated to the model's 256-token max input length
inputs = tokenizer(title, text, return_tensors="pt",
                   truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 126)

# Each label gets an independent sigmoid; keep every topic above the threshold
threshold = 0.5
probs     = logits.sigmoid().squeeze()
predicted = [model.config.id2label[i]
             for i, p in enumerate(probs) if p > threshold]
print("Topics:", predicted)
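In the usage example above, `title` and `text` are passed as two arguments with `truncation=True`. The tokenizer therefore packs them as a sentence pair, `[CLS] title [SEP] body [SEP]`. Its default `longest_first` strategy trims tokens from the end of whichever segment is currently longer until the pair fits in 256 tokens. The plain-Python sketch below imitates that behaviour on already-tokenized lists, so the effect on long articles is easy to see. It is a simplification: there is no WordPiece splitting and there are no token IDs, and tie-breaking may differ from the real tokenizer. The helper name `pack_pair` is hypothetical.

```python
def pack_pair(title_tokens, body_tokens, max_length=256):
    """Approximate BERT sentence-pair packing with longest-first truncation.

    Hypothetical helper: works on lists of string tokens, not real token IDs.
    Three positions are reserved for [CLS] and the two [SEP] markers.
    """
    budget = max_length - 3
    if budget < 0:
        raise ValueError("max_length must leave room for 3 special tokens")
    a, b = list(title_tokens), list(body_tokens)
    # Drop tokens from the end of whichever segment is currently longer.
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment ids: 0 for [CLS] + title + first [SEP], 1 for body + final [SEP].
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, token_type_ids

title = ["finland", "wins", "gold"]
body = [f"w{i}" for i in range(300)]  # stand-in for a long article body
tokens, types = pack_pair(title, body)
print(len(tokens), tokens[:6])  # 256 ['[CLS]', 'finland', 'wins', 'gold', '[SEP]', 'w0']
```

Because titles are short, longest-first truncation almost always leaves the title intact and cuts only the tail of the body.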

Notes

  • Input: the title and body text, passed to the tokenizer as a sentence pair so that they are joined by its [SEP] token.
  • Output: raw logits; apply sigmoid() and threshold at 0.5 to get a binary prediction for each topic.
  • The companion ASL model (chiunhau/news-classifier-2) uses Asymmetric Loss and may be better at detecting rare topics.
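The sigmoid is monotonic and sigmoid(0) = 0.5. Thresholding probabilities at 0.5 therefore selects the same labels as keeping every label with a positive raw logit, up to floating-point rounding for logits extremely close to 0. More generally, a probability threshold t corresponds to the logit cutoff log(t / (1 - t)). The plain-Python sketch below checks this on a few example logits. `decode` and the label names are hypothetical stand-ins for the model's output and `model.config.id2label`.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def decode(logits, id2label, threshold=0.5):
    """Return labels whose sigmoid probability exceeds `threshold`.

    Hypothetical helper: compares raw logits against logit(threshold)
    instead of computing probabilities first.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    cutoff = math.log(threshold / (1.0 - threshold))  # logit(0.5) == 0.0
    return [id2label[i] for i, x in enumerate(logits) if x > cutoff]

id2label = {0: "topic_a", 1: "topic_b", 2: "topic_c"}  # hypothetical labels
logits = [1.3, -0.2, 0.05]
print(decode(logits, id2label))  # ['topic_a', 'topic_c']
# Same labels as thresholding the probabilities directly:
print([id2label[i] for i, x in enumerate(logits) if sigmoid(x) > 0.5])
```

Raising the threshold trades recall for precision. For example, `threshold=0.7` (logit cutoff of about 0.85) keeps only `topic_a` here.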
  • Weights: stored as Safetensors, about 0.1B parameters, F32 tensors.