---
library_name: transformers
tags:
- text-classification
- distilbert
- news-classification
- sri-lanka
base_model:
- distilbert/distilbert-base-uncased
---


## Model Details

**Model Name:** `Ginidu2003/Distilbert-Base-News-classifier`  
**Model Type:** Text Classification  
**Base Model:** `distilbert/distilbert-base-uncased`  
**Language(s):** English    
**Finetuned from model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased)

### Model Description
This is a fine-tuned DistilBERT model designed to classify English news articles into **5 categories**:

- Business  
- Opinion  
- Political gossip  
- Sports  
- World news  


## Uses

### Direct Use
- Classify news articles into one of the five predefined categories.
- Suitable for English-language news written in a style similar to the Daily Mirror (Sri Lanka).

### Downstream Use
- Can be integrated into web applications (Streamlit/Gradio) for automated news categorization.
- Can be used for real-time news filtering and topic-based news recommendation systems.

### Out-of-Scope Use
- Not intended for languages other than English.
- Not trained for sentiment analysis, fake news detection, or hate speech detection.
- Not suitable for very short texts.

## Bias, Risks, and Limitations
- The model is trained only on **Daily Mirror** news data, so it may perform poorly on other news sources or different writing styles.
- Potential bias toward Sri Lankan context and the variety of English used in Sri Lankan media.
- Performance may degrade on very short articles and on very long ones: DistilBERT accepts at most 512 tokens, so longer inputs must be truncated.

## How to Get Started with the Model

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification", 
    model="Ginidu2003/Distilbert-Base-News-classifier"
)

result = classifier("Your news article text here...")
print(result)
```
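
To see the scores for all five categories rather than only the top label, the pipeline call can be given `top_k=None`, and `truncation=True` trims inputs that exceed DistilBERT's 512-token limit. The following is a minimal sketch: `score_article`, `best_label`, and the `min_score` threshold are hypothetical helpers, not part of the model repository, and the label strings returned depend on the `id2label` mapping in the model's config.

```python
from typing import Dict, List, Optional

MODEL_ID = "Ginidu2003/Distilbert-Base-News-classifier"


def score_article(text: str) -> List[Dict]:
    """Return one {'label', 'score'} dict per category for a single article."""
    # Imported lazily so best_label() can be used without loading the model.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    # top_k=None returns every class score; truncation=True trims long inputs.
    return classifier(text, top_k=None, truncation=True)


def best_label(scores: List[Dict], min_score: float = 0.5) -> Optional[str]:
    """Pick the highest-scoring label, or None if the top score is too low.

    min_score is an illustrative threshold (an assumption), not a tuned value.
    """
    top = max(scores, key=lambda item: item["score"])
    return top["label"] if top["score"] >= min_score else None
```

Calling `best_label(score_article(article_text))` then yields either a label or `None` when no category reaches the threshold.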

## Training Details

### Training Data
- **Dataset**: Daily Mirror Sri Lankan English news (2024–2025)
- **Total samples**: ~1,018 articles (after preprocessing and deduplication)
- **Classes**: 5 balanced categories (Business, Opinion, Political gossip, Sports, World news)
- **Preprocessing**: Lowercasing, punctuation removal, lemmatization
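
The preprocessing steps listed above can be sketched roughly as follows. This is an illustration under assumptions, not the original training script: the function name `preprocess` is hypothetical, punctuation removal is limited to ASCII punctuation, and because the card does not say which lemmatizer was used, lemmatization is left as an optional callable supplied by the caller.

```python
import string
from typing import Callable, Optional

# Translation table that deletes ASCII punctuation (curly quotes etc. are kept).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str, lemmatize: Optional[Callable[[str], str]] = None) -> str:
    """Lowercase, strip punctuation, and optionally lemmatize each token."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    tokens = cleaned.split()  # split on whitespace, collapsing repeated spaces
    if lemmatize is not None:
        # The lemmatizer is an assumption: pass in whichever one you use.
        tokens = [lemmatize(token) for token in tokens]
    return " ".join(tokens)


print(preprocess("The Markets Rallied, again!"))
```

If the training text went through steps like these, applying the same function to articles before inference keeps the inputs closer to what the model saw during fine-tuning.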

### Training Procedure
- **Framework**: Hugging Face Transformers + Trainer API
- **Base Model**: `distilbert/distilbert-base-uncased`
- **Epochs**: 4
- **Batch Size**: 8
- **Learning Rate**: 2e-5
- **Validation Accuracy**: **90.19%**
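
To give a feel for the scale of this run, the hyperparameters above can be turned into an approximate number of optimizer steps. The sketch assumes the 20% hold-out mentioned under Evaluation was taken from the ~1,018 articles, that the hold-out size is rounded up, and that training used a single device with no gradient accumulation; `training_steps` is a hypothetical helper.

```python
import math


def training_steps(n_samples: int, holdout_fraction: float,
                   batch_size: int, epochs: int):
    """Return (train_size, steps_per_epoch, total_steps) for a simple split."""
    val_size = math.ceil(n_samples * holdout_fraction)  # hold-out rounded up
    train_size = n_samples - val_size
    # The last batch of an epoch may be smaller, so round the step count up.
    steps_per_epoch = math.ceil(train_size / batch_size)
    return train_size, steps_per_epoch, steps_per_epoch * epochs


# Values from this card: ~1,018 articles, 20% hold-out, batch size 8, 4 epochs.
train_size, steps_per_epoch, total_steps = training_steps(1018, 0.20, 8, 4)
print(train_size, steps_per_epoch, total_steps)
```

Under these assumptions the run covers about 814 training articles, 102 steps per epoch, and 408 steps in total.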

## Evaluation

**Validation Set Results (20% hold-out):**
- **Accuracy**: **91.18%**
- The model shows strong and consistent performance across all 5 classes.
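
Per-class consistency can be checked by computing accuracy separately for each category on the validation set. The snippet below is a minimal sketch with a hypothetical helper `per_class_accuracy`. The labels in the example are toy values for illustration only, not results from this model.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of correctly predicted examples for each true label."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy data (made up for illustration; replace with real validation labels).
y_true = ["Sports", "Sports", "Business", "Opinion"]
y_pred = ["Sports", "Business", "Business", "Opinion"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall, per_class_accuracy(y_true, y_pred))
```

A large gap between the per-class values and the overall accuracy would indicate that some categories are harder for the model than others.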

## Environmental Impact
- Training ran on a single NVIDIA T4 GPU on Google Colab.
- Estimated carbon emissions: very low, given the small model and small dataset.