Model Details
Model Name: Ginidu2003/Distilbert-Base-News-classifier
Model Type: Text Classification
Base Model: distilbert/distilbert-base-uncased
Language(s): English
Finetuned from model: distilbert/distilbert-base-uncased
Model Description
This is a fine-tuned DistilBERT model designed to classify English news articles into 5 categories:
- Business
- Opinion
- Political gossip
- Sports
- World news
Uses
Direct Use
- Classify news articles into one of the five predefined categories.
- Best suited to English-language news written in a style similar to the Daily Mirror (Sri Lanka).
Downstream Use
- Can be integrated into web applications (Streamlit/Gradio) for automated news categorization.
- Can be used for real-time news filtering and topic-based news recommendation systems.
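As a rough sketch of the topic-based filtering mentioned above, the helper below keeps only the articles whose top prediction matches a requested category. It works on the list-of-dicts shape that a `text-classification` pipeline returns (`label` and `score` keys). The function name `filter_by_topic`, the `min_score` threshold, and the label strings are assumptions for illustration; the strings the model actually emits depend on its `id2label` config and may differ (for example `LABEL_0`).

```python
from typing import Dict, List, Sequence


def filter_by_topic(
    articles: Sequence[str],
    predictions: Sequence[Dict[str, object]],
    topic: str,
    min_score: float = 0.5,
) -> List[str]:
    """Keep articles whose predicted label equals `topic` with enough confidence."""
    if len(articles) != len(predictions):
        raise ValueError("articles and predictions must have the same length")
    kept = []
    for text, pred in zip(articles, predictions):
        # Compare case-insensitively so "Sports" and "sports" both match.
        same_topic = str(pred["label"]).lower() == topic.lower()
        if same_topic and float(pred["score"]) >= min_score:
            kept.append(text)
    return kept


# Hand-written predictions in the pipeline's output shape (not real model output).
sample_articles = ["Match report ...", "Market update ...", "Cup preview ..."]
sample_predictions = [
    {"label": "Sports", "score": 0.97},
    {"label": "Business", "score": 0.88},
    {"label": "Sports", "score": 0.41},
]
print(filter_by_topic(sample_articles, sample_predictions, "sports"))
```

The third article is dropped because its score falls below the default threshold; lowering `min_score` trades precision for coverage.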
Out-of-Scope Use
- Not intended for other languages.
- Not trained for sentiment analysis, fake news detection, or hate speech detection.
- Not suitable for very short texts.
Bias, Risks, and Limitations
- The model is trained only on Daily Mirror news data, so it may perform poorly on other news sources or different writing styles.
- Likely biased toward Sri Lankan topics and the variety of English used in Sri Lankan media.
- Performance may degrade on very long or very short articles.
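One common way to handle long articles is sketched below. DistilBERT accepts at most 512 tokens per input, so anything longer has to be truncated or split. The helpers split an article into overlapping word windows and average the per-label scores across windows. Word counts only approximate token counts, and the names `split_into_windows` and `average_scores` as well as the `window` and `stride` defaults are illustrative assumptions, not part of the model.

```python
from collections import defaultdict
from typing import Dict, List


def split_into_windows(text: str, window: int = 300, stride: int = 250) -> List[str]:
    """Split text into overlapping windows of at most `window` words."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    words = text.split()
    if not words:
        return []
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window]))
        # Stop once the current window has reached the end of the text.
        if start + window >= len(words):
            break
    return chunks


def average_scores(per_chunk: List[List[Dict[str, object]]]) -> Dict[str, float]:
    """Average each label's score over all chunks.

    `per_chunk` holds, for every chunk, a list of {"label", "score"} dicts
    covering all labels.
    """
    totals: Dict[str, float] = defaultdict(float)
    for chunk_scores in per_chunk:
        for item in chunk_scores:
            totals[str(item["label"])] += float(item["score"])
    return {label: total / len(per_chunk) for label, total in totals.items()}


# Hand-written scores for two chunks (not real model output).
demo = [
    [{"label": "Sports", "score": 0.5}, {"label": "Business", "score": 0.5}],
    [{"label": "Sports", "score": 1.0}, {"label": "Business", "score": 0.0}],
]
print(average_scores(demo))
```

With the pipeline, per-chunk scores for all labels can typically be obtained with something like `classifier(chunks, top_k=None, truncation=True)`. Very short inputs are a separate problem: no amount of chunking adds missing context.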
How to Get Started with the Model
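from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="Ginidu2003/Distilbert-Base-News-classifier",
)

# Returns a list with one dict per input, each holding a "label" and a "score".
result = classifier("Your news article text here...")
print(result)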
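When the full probability distribution is needed rather than only the top label, the model can also be loaded directly. The following is a minimal sketch, assuming `torch` and `transformers` are installed; the helper names `softmax` and `predict_proba` are hypothetical, and the label strings come from the model's own `id2label` config, which is not documented in this card.

```python
import math
from typing import Dict, List

MODEL_ID = "Ginidu2003/Distilbert-Base-News-classifier"


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(text: str) -> Dict[str, float]:
    """Return {label: probability} for one article.

    Downloads the model on first call.
    """
    # Imported lazily so `softmax` works without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Truncate to DistilBERT's 512-token input limit.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    # Map class indices to the label names stored in the model config.
    return {model.config.id2label[i]: p for i, p in enumerate(probs)}
```

Calling `predict_proba("Your news article text here...")` returns one probability per category. For repeated calls, load the tokenizer and model once outside the function instead of on every call.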
Training Details
Training Data
- Dataset: Daily Mirror Sri Lankan English news (2024–2025)
- Total samples: ~1,018 articles (after preprocessing and deduplication)
- Classes: 5 balanced categories (Business, Opinion, Political gossip, Sports, World news)
- Preprocessing: Lowercasing, punctuation removal, lemmatization
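The preprocessing steps listed above can be approximated as follows. This is a sketch, not the author's pipeline: lowercasing and punctuation removal use only the standard library, and lemmatization is left as an optional callable because the card does not say which lemmatizer was used. The names `preprocess` and `lemmatize` are hypothetical.

```python
import string
from typing import Callable, Optional

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str, lemmatize: Optional[Callable[[str], str]] = None) -> str:
    """Lowercase, strip ASCII punctuation, and optionally lemmatize each token."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = no_punct.split()
    if lemmatize is not None:
        # Plug in any word-level lemmatizer here (assumption: one token in, one out).
        tokens = [lemmatize(tok) for tok in tokens]
    return " ".join(tokens)


print(preprocess("Sri Lanka's exports rose 12%, officials said."))
# → sri lankas exports rose 12 officials said
```

Only ASCII punctuation is removed, so typographic quotes and dashes would survive. If the model was trained on text cleaned this way, applying the same steps before inference may help; the card does not state whether that is required.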
Training Procedure
- Framework: Hugging Face Transformers + Trainer API
- Base Model: distilbert/distilbert-base-uncased
- Epochs: 4
- Batch Size: 8
- Learning Rate: 2e-5
- Validation Accuracy: 90.19%
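The hyperparameters above map onto the Trainer API roughly as shown below. This is a sketch under stated assumptions, not the original training script: `build_trainer` and `total_optimizer_steps` are hypothetical helpers, the datasets are assumed to be already tokenized and padded to a fixed length with a `labels` column, and any setting not listed in the card is left at its library default.

```python
import math

BASE_MODEL = "distilbert/distilbert-base-uncased"
NUM_LABELS = 5

# Values taken from the list above.
HPARAMS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
}


def total_optimizer_steps(n_train: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps on one device, without gradient accumulation."""
    return math.ceil(n_train / batch_size) * epochs


def build_trainer(train_dataset, eval_dataset, output_dir: str = "news-classifier"):
    """Create a Trainer for fine-tuning the base model on 5 classes."""
    # Imported lazily so the step arithmetic works without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=NUM_LABELS
    )
    args = TrainingArguments(output_dir=output_dir, **HPARAMS)
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )


# Assumption: about 80% of the ~1,018 articles (roughly 814) were used for training.
print(total_optimizer_steps(814, 8, 4))  # → 408
```

Calling `build_trainer(train_ds, val_ds).train()` would start fine-tuning. At roughly 400 optimizer steps in total, a run of this size is small.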
Evaluation
Validation Set Results (20% hold-out):
- Accuracy: 91.18%
- Model shows strong and consistent performance across all 5 classes.
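Overall accuracy can hide a weak class, so a claim of consistent performance across categories is usually checked with per-class recall as well. The sketch below computes both from lists of true and predicted labels. The helper names and the toy labels are illustrative and are not the model's actual validation output.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each true class, the fraction of its examples predicted correctly."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


# Toy labels for illustration only.
y_true = ["Sports", "Sports", "Business", "Opinion"]
y_pred = ["Sports", "Business", "Business", "Opinion"]
print(accuracy(y_true, y_pred))          # → 0.75
print(per_class_recall(y_true, y_pred))  # → {'Sports': 0.5, 'Business': 1.0, 'Opinion': 1.0}
```

In the toy example, 75% accuracy coexists with only 50% recall on one class, which is the kind of gap a per-class breakdown exposes.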
Environmental Impact
- Training ran on a single NVIDIA T4 GPU on Google Colab.
- Estimated carbon emissions: Very low (small model + small dataset)