---
library_name: transformers
tags:
- text-classification
- distilbert
- news-classification
- sri-lanka
base_model:
- distilbert/distilbert-base-uncased
---
## Model Details
- **Model Name:** `Ginidu2003/Distilbert-Base-News-classifier`
- **Model Type:** Text Classification
- **Base Model (fine-tuned from):** [`distilbert/distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased)
- **Language(s):** English
### Model Description
This is a fine-tuned DistilBERT model designed to classify English news articles into **5 categories**:
- Business
- Opinion
- Political gossip
- Sports
- World news
## Uses
### Direct Use
- Classify news articles into one of the five predefined categories.
- Best suited to English-language news written in a style similar to the Sri Lankan Daily Mirror.
### Downstream Use
- Can be integrated into web applications (Streamlit/Gradio) for automated news categorization.
- Can be used for real-time news filtering and topic-based news recommendation systems.
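For the filtering and recommendation uses above, the main extra logic is acting on the predicted label and its confidence. The sketch below is illustrative only. `filter_by_topic`, `group_by_topic` and the `0.8` threshold are hypothetical choices, not part of this model. The `classify` argument is any callable that behaves like the `transformers` text-classification pipeline on a list of texts, returning one `{"label", "score"}` dict per text.

```python
from typing import Callable, Dict, Iterable, List

# Any callable that behaves like the text-classification pipeline on a list
# of texts: one {"label": str, "score": float} dict per input text.
Classifier = Callable[[List[str]], List[Dict[str, object]]]


def filter_by_topic(
    articles: Iterable[str],
    classify: Classifier,
    wanted_label: str,
    min_score: float = 0.8,  # hypothetical threshold; tune it on your own data
) -> List[str]:
    """Keep only articles predicted as `wanted_label` with enough confidence."""
    texts = list(articles)
    predictions = classify(texts)
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == wanted_label and pred["score"] >= min_score
    ]


def group_by_topic(articles: Iterable[str], classify: Classifier) -> Dict[str, List[str]]:
    """Bucket articles by predicted label, e.g. as input to topic-based recommendations."""
    texts = list(articles)
    buckets: Dict[str, List[str]] = {}
    for text, pred in zip(texts, classify(texts)):
        buckets.setdefault(str(pred["label"]), []).append(text)
    return buckets
```

With the pipeline from the getting-started example, you could pass `lambda texts: classifier(texts, truncation=True)` as `classify`. The exact label strings you need to match depend on the `id2label` mapping stored in the model's config.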
### Out-of-Scope Use
- Not intended for languages other than English.
- Not trained for sentiment analysis, fake news detection, or hate speech detection.
- Not suitable for very short texts.
## Bias, Risks, and Limitations
- The model was trained only on **Daily Mirror** news data, so it may perform poorly on other news sources or writing styles.
- It may be biased toward the Sri Lankan context and the variety of English used in Sri Lankan media.
- Performance may degrade on very short articles and on very long ones. DistilBERT reads at most 512 tokens, so text beyond that limit must be truncated and is never seen by the model.
## How to Get Started with the Model
```python
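from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="Ginidu2003/Distilbert-Base-News-classifier",
)

# truncation=True keeps long articles within DistilBERT's 512-token limit
result = classifier("Your news article text here...", truncation=True)
print(result)  # a list with one {"label": ..., "score": ...} dict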
```
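You can also load the tokenizer and model directly with `AutoTokenizer` and `AutoModelForSequenceClassification` instead of using the pipeline. The following is a minimal sketch of that route, assuming PyTorch is installed as the backend. The helper names `softmax`, `load_model` and `classify` are illustrative. The conversion from logits to probabilities is written in plain Python so the post-processing step is easy to see.

```python
import math
from functools import lru_cache
from typing import List, Tuple

MODEL_ID = "Ginidu2003/Distilbert-Base-News-classifier"


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: turn raw logits into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@lru_cache(maxsize=1)
def load_model():
    """Load the tokenizer and model once and keep them in memory."""
    # Imported lazily so softmax() stays usable without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def classify(text: str) -> Tuple[str, float]:
    """Return (label, probability) for a single news article."""
    import torch

    tokenizer, model = load_model()
    # DistilBERT reads at most 512 tokens, so longer articles are cut off.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Label names come from the id2label mapping in the model's config.
    return model.config.id2label[best], probs[best]
```

Usage: `label, prob = classify("Your news article text here...")`. This card does not document the `id2label` mapping. If it was not set during fine-tuning, labels may appear as generic names such as `LABEL_0` rather than the category names.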
## Training Details
### Training Data
- **Dataset**: Daily Mirror Sri Lankan English news (2024–2025)
- **Total samples**: ~1,018 articles (after preprocessing and deduplication)
- **Classes**: 5 balanced categories (Business, Opinion, Political gossip, Sports, World news)
- **Preprocessing**: Lowercasing, punctuation removal, lemmatization
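If you want inference inputs to resemble the training text, you can approximate the first two preprocessing steps. The sketch below is an assumption on several points. The exact rules used for training are not documented. It only removes ASCII punctuation. The whitespace clean-up is an extra tidy-up step. Lemmatization is omitted because it would need an external library such as NLTK or spaCy. Lowercasing is redundant for this model, since the `distilbert-base-uncased` tokenizer already lowercases its input.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse leftover whitespace.

    Lemmatization (the third documented step) is intentionally not included.
    """
    text = text.lower().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `preprocess("Sri Lanka's GDP grew 5%!  Analysts say...")` gives `"sri lankas gdp grew 5 analysts say"`.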
### Training Procedure
- **Framework**: Hugging Face Transformers + Trainer API
- **Base Model**: `distilbert/distilbert-base-uncased`
- **Epochs**: 4
- **Batch Size**: 8
- **Learning Rate**: 2e-5
- **Validation Accuracy**: **90.19%**
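The reported settings map onto standard `transformers` arguments as shown below. This is a sketch, not the author's training script. `build_model_and_args` and the `output_dir` name are hypothetical. Settings the card does not list, such as warmup, weight decay and maximum sequence length, are left at library defaults. You would still need to tokenize your own train/validation datasets and pass everything to `Trainer`.

```python
BASE_MODEL = "distilbert/distilbert-base-uncased"
NUM_LABELS = 5  # Business, Opinion, Political gossip, Sports, World news

# Hyperparameters reported in this card, as TrainingArguments keyword arguments.
# Anything not listed here is undocumented and left at the library defaults.
HYPERPARAMS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
}


def build_model_and_args(output_dir: str = "distilbert-news-classifier"):
    """Return (tokenizer, model, training_args); output_dir is a hypothetical name."""
    # Lazy import: requires transformers with PyTorch and its training dependencies.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    # A new, randomly initialised 5-way classification head sits on the pretrained encoder.
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=NUM_LABELS
    )
    args = TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
    return tokenizer, model, args
```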
## Evaluation
**Validation Set Results (20% hold-out):**
- **Accuracy**: **91.18%**
- Model shows strong and consistent performance across all 5 classes.
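Only overall accuracy is reported above. To check per-class consistency on your own labelled articles, you can compare overall accuracy with the per-class recall, which is the share of each true class predicted correctly. The following is a minimal standard-library sketch. The label strings in the example are illustrative. `sklearn.metrics.classification_report` gives a fuller breakdown if scikit-learn is available.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each true class, the fraction of its examples that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}
```

For example, with `y_true = ["Sports", "Sports", "Business", "Opinion"]` and `y_pred = ["Sports", "Business", "Business", "Opinion"]`, accuracy is `0.75`, while recall is `0.5` for Sports and `1.0` for the other two classes.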
## Environmental Impact
- Training was done on a single NVIDIA T4 GPU on Google Colab.
- Estimated carbon emissions: very low (small model and small dataset).