News Topic Classifier (BCE loss)

A `bert-base-uncased` model fine-tuned for multi-label news topic classification with Binary Cross-Entropy (BCE) loss.
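In a multi-label setup, every topic is an independent yes/no decision. BCE therefore scores each label separately and averages the results. The plain-Python sketch below is for illustration only: the function name `bce_with_logits` is our own, and the card does not say which loss implementation was used in training. For a single example, it computes the same quantity as PyTorch's `torch.nn.BCEWithLogitsLoss` with its default mean reduction.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over labels, computed from raw logits.

    Hypothetical helper for illustration. `targets` is a multi-hot vector:
    1.0 for every topic the article carries, 0.0 elsewhere.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable rewrite of
        #   -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
        # that never exponentiates a large positive number.
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    # Average over labels, one binary log-loss per topic
    return total / len(logits)
```

For example, an article tagged with the first and last of four (hypothetical) topics has the target `[1.0, 0.0, 0.0, 1.0]`. The loss pushes those two logits up and the other two down, independently of each other.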

Model details


Base model	`bert-base-uncased`
Task	Multi-label classification
Loss	Binary Cross-Entropy (BCE)
Number of labels	126 topic codes
Max input length	256 tokens (title + body)
Best val micro-F1	~0.890
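The headline metric is micro-averaged F1. The card does not include its evaluation code, so the following is only a sketch of how this metric is usually computed, and `micro_f1` is a hypothetical helper. True positives, false positives and false negatives are pooled over every (article, topic) pair, and then F1 = 2·TP / (2·TP + FP + FN).

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 for multi-hot matrices (rows = articles, columns = topics).

    Hypothetical helper for illustration.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p and t:
                tp += 1  # topic predicted and present
            elif p:
                fp += 1  # topic predicted but absent
            elif t:
                fn += 1  # topic present but missed
    denom = 2 * tp + fp + fn
    # Convention: return 0.0 when there are no positives at all
    return 2 * tp / denom if denom else 0.0
```

On binary indicator arrays this should agree with scikit-learn's `f1_score(y_true, y_pred, average="micro")`. Because the counts are pooled, frequent topics dominate the score. Macro-averaged F1 would instead weight every topic equally.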

Usage

```python
from transformers import BertForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("chiunhau/news-classifier-1")
model     = BertForSequenceClassification.from_pretrained("chiunhau/news-classifier-1")
model.eval()  # disable dropout for inference

title = "Finland wins gold in ice hockey."
text  = "The Finnish national team claimed victory in the final."

# Encode title and body as a sentence pair: [CLS] title [SEP] text [SEP]
inputs = tokenizer(title, text, return_tensors="pt",
                   truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)

# Each topic is an independent binary decision: sigmoid, then threshold
threshold = 0.5
probs     = logits.sigmoid().squeeze(0)  # shape: (num_labels,)
predicted = [model.config.id2label[i]
             for i, p in enumerate(probs.tolist()) if p > threshold]
print("Topics:", predicted)
```

Notes

- Input: the title and text fields, encoded as a sentence pair so that the tokenizer's [SEP] token separates them.
- Output: raw logits. Apply `sigmoid()` and threshold at 0.5 to get binary predictions.
- The companion model (chiunhau/news-classifier-2) is trained with Asymmetric Loss (ASL) and may be better at detecting rare topics.
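Here is a minimal plain-Python sketch of the decoding step described in the notes. It assumes the logits for one article have already been converted to a list of floats, for example with `logits[0].tolist()`. `decode_topics` and the three-entry label map used in the test are hypothetical; the real mapping is `model.config.id2label`, which has 126 entries. Because sigmoid(x) > 0.5 exactly when x > 0, the default threshold is equivalent to keeping every topic with a positive logit.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_topics(logits: Sequence[float], id2label: Dict[int, str],
                  threshold: float = 0.5) -> List[str]:
    """Return the topic codes whose probability exceeds `threshold` (hypothetical helper)."""
    # Strict ">" matches the comparison used in the Usage snippet
    return [id2label[i] for i, x in enumerate(logits) if sigmoid(x) > threshold]
```

Raising `threshold` trades recall for precision on every topic at once. Whether per-topic thresholds would help this model is not covered by the card.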

Weights: Safetensors format, 0.1B parameters, F32 tensor type.