News Topic Classifier

Model Description

A news headline classifier that categorizes text into three topics: Technology, Sports, and Politics. Built using a linear probing approach with frozen BERT embeddings, achieving 95.24% test accuracy.

Model Details

  • Base Model: bert-base-uncased
  • Model Type: BERT with Linear Classification Head
  • Training Approach: Linear Probing (frozen BERT encoder + trainable classifier)
  • Language: English
  • License: Apache 2.0
  • Parameters: ~110M (frozen) + 2.3K (trainable)

Architecture

Input → Tokenizer → Frozen BERT Encoder → [CLS] Token → Linear Layer → 3 Classes

  • Frozen Layers: All 12 BERT transformer layers
  • Trainable Layers: Single linear classifier (768 → 3)
  • Max Sequence Length: 128 tokens
  • Device: CPU
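The repository's `BERTLinearClassifier` is a custom class, so its exact implementation is not shown here; the following is a minimal sketch of what a frozen-encoder-plus-linear-head module matching this architecture could look like (the class name and `forward` signature are assumptions inferred from the usage snippets in this card):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BERTLinearClassifier(nn.Module):
    """Frozen BERT encoder feeding a single trainable linear head."""

    def __init__(self, model_name='bert-base-uncased', num_labels=3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        for p in self.bert.parameters():
            p.requires_grad = False  # freeze all 12 transformer layers
        # 768 hidden units -> 3 classes: 768*3 + 3 = 2,307 trainable parameters
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():  # encoder is frozen, no gradients needed
            outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = outputs.last_hidden_state[:, 0]  # [CLS] token embedding
        return self.classifier(cls_embedding)
```

Only `self.classifier` receives gradient updates, which is what keeps the trainable parameter count at roughly 2.3K.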

Performance

Best Results

  • Test Accuracy: 95.24%
  • Best Epoch: 5/10

Training Progress

| Epoch | Train Loss | Train Acc | Test Loss | Test Acc |
|-------|------------|-----------|-----------|----------|
| 1     | 1.0584     | 45.24%    | 0.8724    | 80.95%   |
| 2     | 0.8449     | 71.43%    | 0.7716    | 90.48%   |
| 3     | 0.6893     | 90.48%    | 0.6514    | 80.95%   |
| 4     | 0.5194     | 92.86%    | 0.5837    | 90.48%   |
| 5     | 0.4683     | 88.10%    | 0.5102    | 95.24% ⭐ |
| 6     | 0.3987     | 95.24%    | 0.4573    | 95.24%   |
| 7     | 0.3671     | 95.24%    | 0.4358    | 95.24%   |
| 8     | 0.3386     | 97.62%    | 0.3938    | 90.48%   |
| 9     | 0.2763     | 97.62%    | 0.3787    | 85.71%   |
| 10    | 0.2724     | 94.05%    | 0.3713    | 90.48%   |

Classification Report (Final Evaluation - Epoch 10)

```
              precision    recall  f1-score   support

  Technology       0.78      1.00      0.88         7
      Sports       1.00      0.67      0.80         6
    Politics       1.00      1.00      1.00         8

    accuracy                           0.90        21
   macro avg       0.93      0.89      0.89        21
weighted avg       0.93      0.90      0.90        21
```

Class-wise Performance:

  • Technology: 78% precision, 100% recall, 88% F1-score
  • Sports: 100% precision, 67% recall, 80% F1-score
  • Politics: Perfect performance (100% across all metrics)

Training Data

  • Dataset Size: 105 samples
  • Split: 84 train / 21 test (80/20)
  • Class Distribution: Balanced (35 samples per class)
  • Classes:
    • 0: Technology
    • 1: Sports
    • 2: Politics
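A stratified 80/20 split is what produces the 84/21 partition while keeping the three classes balanced in both sets. The card does not show how the split was made; a common sketch using scikit-learn's `train_test_split` (the headline strings here are placeholders):

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real corpus: 105 headlines, 35 per class.
texts = [f"headline {i}" for i in range(105)]
labels = [i % 3 for i in range(105)]  # 0: Technology, 1: Sports, 2: Politics

# stratify=labels keeps the per-class proportions identical in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42)

print(len(X_train), len(X_test))  # 84 21
```

With stratification, each class contributes exactly 7 of the 21 test samples, matching the support column in the classification report.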

Training Details

Hyperparameters

  • Learning Rate: 2e-3 (higher than typical fine-tuning rates, since only the classifier head is trained)
  • Optimizer: AdamW (classifier parameters only)
  • Loss Function: CrossEntropyLoss
  • Batch Size: 16
  • Epochs: 10
  • Max Length: 128 tokens
  • Training Time: ~2.3 seconds per batch (CPU)

Training Strategy

Linear probing was chosen to:

  • Leverage pre-trained BERT knowledge
  • Reduce training time and compute requirements
  • Prevent overfitting on small dataset (105 samples)
  • Train only 2.3K parameters instead of 110M
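The setup above can be sketched as follows. A small stand-in encoder is used here so the snippet runs without downloading BERT weights; in the real setup `encoder` is the pretrained BERT model and `classifier` is the 768 → 3 head:

```python
import torch
import torch.nn as nn

# Stand-in for the frozen BERT encoder (pure-PyTorch, no download needed).
encoder = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
classifier = nn.Linear(768, 3)

# Linear probing: freeze every encoder parameter, optimize the head only.
for p in encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(classifier.parameters(), lr=2e-3)
loss_fn = nn.CrossEntropyLoss()

trainable = sum(p.numel() for p in classifier.parameters())
print(f"Trainable parameters: {trainable}")  # 768*3 + 3 = 2307 (~2.3K)
```

Passing only `classifier.parameters()` to AdamW is what restricts training to the 2.3K head parameters.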

Usage

Loading the Model

```python
import torch
from transformers import AutoTokenizer
from model import BERTLinearClassifier  # Your custom class

# Load model
model = BERTLinearClassifier(model_name='bert-base-uncased', num_labels=3)
checkpoint = torch.load('pytorch_model.pt', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
```

Making Predictions

```python
# Single prediction
headline = "Apple releases new MacBook Pro with M3 chip"

inputs = tokenizer(
    headline,
    return_tensors='pt',
    max_length=128,
    padding='max_length',
    truncation=True
)

with torch.no_grad():
    logits = model(inputs['input_ids'], inputs['attention_mask'])
    prediction = torch.argmax(logits, dim=1).item()
    probabilities = torch.softmax(logits, dim=1)

# Map to labels (probabilities has shape [1, 3], so index the batch dimension)
id2label = {0: 'Technology', 1: 'Sports', 2: 'Politics'}
print(f"Predicted: {id2label[prediction]}")
print(f"Confidence: {probabilities[0][prediction]:.2%}")
```

Output:

```
Predicted: Technology
Confidence: 94.5%
```

Batch Predictions

```python
headlines = [
    "Google unveils new AI model",
    "Manchester United wins Premier League",
    "Senate passes infrastructure bill"
]

for headline in headlines:
    inputs = tokenizer(headline, return_tensors='pt', max_length=128,
                       padding='max_length', truncation=True)
    with torch.no_grad():
        logits = model(inputs['input_ids'], inputs['attention_mask'])
        pred = torch.argmax(logits, dim=1).item()
    print(f"{headline} → {id2label[pred]}")
```

Limitations

  • Small Training Set: Only 105 samples; may not generalize to diverse news sources
  • Short Text Only: Optimized for headlines, not full articles
  • Three Categories: Limited domain coverage
  • English Only: No multilingual support
  • Sports Recall: Lower recall (67%) on sports headlines; some sports content may be misclassified
  • CPU Training: Trained on CPU, so no GPU optimizations

Bias and Ethical Considerations

  • Model may reflect biases in training data
  • Limited to three broad categories; many news topics won't fit
  • Should not be used for content moderation without human review
  • Performance may vary on news from different time periods or regions

Intended Use

✅ Recommended Use Cases

  • Educational demonstration of linear probing technique
  • Portfolio project showcase
  • Prototyping news classification pipelines
  • Research on transfer learning with limited data

❌ Not Recommended For

  • Production news classification systems (needs more data)
  • Multi-label classification (politics + technology articles)
  • Non-English content
  • Long-form article classification
  • High-stakes automated decision making

Future Improvements

  • Expand dataset size (1000+ samples minimum)
  • Add more categories (Business, Entertainment, Health, etc.)
  • Fine-tune BERT layers for better performance
  • Collect real-world news data from multiple sources
  • Implement confidence thresholds for uncertain predictions
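The last item, confidence thresholds, can be added on top of the existing prediction code without retraining. A minimal sketch (the function name and the 0.8 threshold are illustrative choices, not part of this repository):

```python
import torch

LABELS = ('Technology', 'Sports', 'Politics')

def classify_with_threshold(logits, threshold=0.8):
    """Return the predicted label, or None when the model is too uncertain."""
    probs = torch.softmax(logits, dim=-1)
    confidence, idx = probs.max(dim=-1)
    if confidence.item() < threshold:
        return None  # defer to a human reviewer or a fallback category
    return LABELS[idx.item()]

# Confident prediction: softmax of these logits puts ~83% on class 0.
print(classify_with_threshold(torch.tensor([2.0, 0.1, -1.0])))  # Technology

# Near-uniform logits fall below the threshold and are deferred.
print(classify_with_threshold(torch.tensor([0.5, 0.4, 0.3])))   # None
```

Deferring low-confidence predictions is especially relevant here, since headlines outside the three training categories will still receive one of the three labels.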

Model Files

  • pytorch_model.pt: Model checkpoint with state dict
  • config.json: Model configuration and label mappings
  • tokenizer files: BERT tokenizer (from bert-base-uncased)


Note: This model was trained as a learning project. For production use, consider models trained on larger, more diverse datasets.
