News Topic Classifier
Model Description
A news headline classifier that categorizes text into three topics: Technology, Sports, and Politics. Built using a linear probing approach with frozen BERT embeddings, achieving 95.24% test accuracy.
Model Details
- Base Model: bert-base-uncased
- Model Type: BERT with Linear Classification Head
- Training Approach: Linear Probing (frozen BERT encoder + trainable classifier)
- Language: English
- License: Apache 2.0
- Parameters: ~110M (frozen) + 2.3K (trainable)
Architecture
Input → Tokenizer → Frozen BERT Encoder → [CLS] Token → Linear Layer → 3 Classes
- Frozen Layers: All 12 BERT transformer layers
- Trainable Layers: Single linear classifier (768 → 3)
- Max Sequence Length: 128 tokens
- Device: CPU
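The pipeline above can be sketched as a small PyTorch module. This is a hypothetical reconstruction for illustration; the actual `BERTLinearClassifier` in `model.py` may differ in details.

```python
import torch.nn as nn
from transformers import AutoModel

class BERTLinearClassifier(nn.Module):
    """Frozen BERT encoder + trainable linear head (linear probing)."""

    def __init__(self, model_name='bert-base-uncased', num_labels=3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        # Freeze all 12 transformer layers: only the head will receive gradients
        for param in self.bert.parameters():
            param.requires_grad = False
        # 768-dimensional [CLS] embedding → 3 class logits
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = outputs.last_hidden_state[:, 0]  # [CLS] token embedding
        return self.classifier(cls_embedding)
```

Only the `classifier` layer appears in the optimizer, so the 110M encoder parameters stay fixed throughout training.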
Performance
Best Results
- Test Accuracy: 95.24%
- Best Epoch: 5/10
Training Progress
| Epoch | Train Loss | Train Acc | Test Loss | Test Acc |
|---|---|---|---|---|
| 1 | 1.0584 | 45.24% | 0.8724 | 80.95% |
| 2 | 0.8449 | 71.43% | 0.7716 | 90.48% |
| 3 | 0.6893 | 90.48% | 0.6514 | 80.95% |
| 4 | 0.5194 | 92.86% | 0.5837 | 90.48% |
| 5 | 0.4683 | 88.10% | 0.5102 | 95.24% ✓ |
| 6 | 0.3987 | 95.24% | 0.4573 | 95.24% |
| 7 | 0.3671 | 95.24% | 0.4358 | 95.24% |
| 8 | 0.3386 | 97.62% | 0.3938 | 90.48% |
| 9 | 0.2763 | 97.62% | 0.3787 | 85.71% |
| 10 | 0.2724 | 94.05% | 0.3713 | 90.48% |
Classification Report (Final Evaluation - Epoch 10)
| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Technology | 0.78 | 1.00 | 0.88 | 7 |
| Sports | 1.00 | 0.67 | 0.80 | 6 |
| Politics | 1.00 | 1.00 | 1.00 | 8 |
| Accuracy | | | 0.90 | 21 |
| Macro avg | 0.93 | 0.89 | 0.89 | 21 |
| Weighted avg | 0.93 | 0.90 | 0.90 | 21 |
Class-wise Performance:
- Technology: 78% precision, 100% recall, 88% F1-score
- Sports: 100% precision, 67% recall, 80% F1-score
- Politics: Perfect performance (100% across all metrics)
Training Data
- Dataset Size: 105 samples
- Split: 84 train / 21 test (80/20)
- Class Distribution: Balanced (35 samples per class)
- Classes:
- 0: Technology
- 1: Sports
- 2: Politics
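The 84/21 split above can be reproduced with a stratified split so that each class keeps its balance in both partitions. A sketch using scikit-learn; the texts here are placeholders standing in for the real headlines.

```python
from sklearn.model_selection import train_test_split

# 105 samples, 35 per class (labels 0=Technology, 1=Sports, 2=Politics)
texts = [f"headline {i}" for i in range(105)]   # placeholder headline texts
labels = [i // 35 for i in range(105)]          # 35 samples per class

train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels,
    test_size=0.2,       # 80/20 split → 84 train / 21 test
    stratify=labels,     # keep 7 test samples per class
    random_state=42,
)
```

With `stratify=labels`, the 21-sample test set contains exactly 7 headlines from each class, matching the support column in the classification report.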
Training Details
Hyperparameters
- Learning Rate: 2e-3 (higher than a typical fine-tuning rate; safe because only the classifier head is trained)
- Optimizer: AdamW (classifier parameters only)
- Loss Function: CrossEntropyLoss
- Batch Size: 16
- Epochs: 10
- Max Length: 128 tokens
- Training Time: ~2.3 seconds per batch (CPU)
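The optimizer setup implied by the hyperparameters above, AdamW over the classifier parameters only with CrossEntropyLoss, might look like the following sketch. The standalone `nn.Linear(768, 3)` and the dummy batch are illustrative stand-ins for the model's head and real [CLS] embeddings.

```python
import torch
import torch.nn as nn

# Stand-in for the model's trainable head (768 → 3)
classifier = nn.Linear(768, 3)

# Only the head's parameters go into the optimizer — the frozen
# encoder contributes no gradients and is simply left out.
optimizer = torch.optim.AdamW(classifier.parameters(), lr=2e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on dummy data
features = torch.randn(16, 768)        # batch of 16 [CLS] embeddings
targets = torch.randint(0, 3, (16,))   # class ids 0..2
loss = criterion(classifier(features), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```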
Training Strategy
Linear probing was chosen to:
- Leverage pre-trained BERT knowledge
- Reduce training time and compute requirements
- Prevent overfitting on small dataset (105 samples)
- Train only 2.3K parameters instead of 110M
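The 2.3K figure follows directly from the head's shape: a 768 → 3 linear layer has 768 × 3 weights plus 3 biases, i.e. 2,307 trainable parameters.

```python
import torch.nn as nn

head = nn.Linear(768, 3)
trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
# 768 * 3 weights + 3 biases = 2307 ≈ 2.3K
```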
Usage
Loading the Model
```python
import torch
from transformers import AutoTokenizer
from model import BERTLinearClassifier  # Your custom class

# Load model
model = BERTLinearClassifier(model_name='bert-base-uncased', num_labels=3)
checkpoint = torch.load('pytorch_model.pt', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
```
Making Predictions
```python
# Single prediction
headline = "Apple releases new MacBook Pro with M3 chip"

inputs = tokenizer(
    headline,
    return_tensors='pt',
    max_length=128,
    padding='max_length',
    truncation=True
)

with torch.no_grad():
    logits = model(inputs['input_ids'], inputs['attention_mask'])
    prediction = torch.argmax(logits, dim=1).item()
    probabilities = torch.softmax(logits, dim=1)

# Map to labels
id2label = {0: 'Technology', 1: 'Sports', 2: 'Politics'}
print(f"Predicted: {id2label[prediction]}")
# Index row 0 of the [1, 3] probability tensor, then the predicted class
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```

Output: `Predicted: Technology`, `Confidence: 94.5%`
Batch Predictions
```python
headlines = [
    "Google unveils new AI model",
    "Manchester United wins Premier League",
    "Senate passes infrastructure bill"
]

for headline in headlines:
    inputs = tokenizer(headline, return_tensors='pt', max_length=128,
                       padding='max_length', truncation=True)
    with torch.no_grad():
        logits = model(inputs['input_ids'], inputs['attention_mask'])
        pred = torch.argmax(logits, dim=1).item()
    print(f"{headline} → {id2label[pred]}")
```
Limitations
- Small Training Set: Only 105 samples; may not generalize to diverse news sources
- Short Text Only: Optimized for headlines, not full articles
- Three Categories: Limited domain coverage
- English Only: No multilingual support
- Sports Recall: Recall on sports headlines is only 67%, so some sports content may be misclassified
- CPU Training: Trained on CPU, so no GPU optimizations
Bias and Ethical Considerations
- Model may reflect biases in training data
- Limited to three broad categories; many news topics won't fit
- Should not be used for content moderation without human review
- Performance may vary on news from different time periods or regions
Intended Use
✅ Recommended Use Cases
- Educational demonstration of linear probing technique
- Portfolio project showcase
- Prototyping news classification pipelines
- Research on transfer learning with limited data
❌ Not Recommended For
- Production news classification systems (needs more data)
- Multi-label classification (politics + technology articles)
- Non-English content
- Long-form article classification
- High-stakes automated decision making
Future Improvements
- Expand dataset size (1000+ samples minimum)
- Add more categories (Business, Entertainment, Health, etc.)
- Fine-tune BERT layers for better performance
- Collect real-world news data from multiple sources
- Implement confidence thresholds for uncertain predictions
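The last improvement above could be implemented by abstaining whenever the top softmax probability falls below a cutoff. A minimal sketch; the 0.6 threshold is an arbitrary illustrative value, not a tuned one.

```python
import torch

id2label = {0: 'Technology', 1: 'Sports', 2: 'Politics'}

def classify_with_threshold(logits, threshold=0.6):
    """Return a label, or None when the model is not confident enough."""
    probs = torch.softmax(logits, dim=-1)
    confidence, pred = torch.max(probs, dim=-1)
    if confidence.item() < threshold:
        return None  # defer to a human reviewer or a fallback category
    return id2label[pred.item()]

# A peaked distribution passes; a near-uniform one is rejected
print(classify_with_threshold(torch.tensor([5.0, 0.1, 0.2])))  # Technology
print(classify_with_threshold(torch.tensor([1.0, 1.0, 1.1])))  # None
```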
Model Files
- `pytorch_model.pt`: Model checkpoint with state dict
- `config.json`: Model configuration and label mappings
- Tokenizer files: BERT tokenizer (from bert-base-uncased)
Acknowledgments
- Base Model: bert-base-uncased by Google Research
- Framework: PyTorch
- Transformers Library: Hugging Face Transformers
Contact
- Hugging Face: karthik-infobell25
Note: This model was trained as a learning project. For production use, consider models trained on larger, more diverse datasets.