| library_name: transformers | |
| tags: | |
| - text-classification | |
| - distilbert | |
| - news-classification | |
| - sri-lanka | |
| base_model: | |
| - distilbert/distilbert-base-uncased | |
| ## Model Details | |
| **Model Name:** `Ginidu2003/Distilbert-Base-News-classifier` | |
| **Model Type:** Text Classification | |
| **Base Model:** `distilbert/distilbert-base-uncased` | |
| **Language(s):** English | |
| **Finetuned from model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) | |
| ### Model Description | |
| This is a fine-tuned DistilBERT model designed to classify English news articles into **5 categories**: | |
| - Business | |
| - Opinion | |
| - Political gossip | |
| - Sports | |
| - World news | |
| ## Uses | |
| ### Direct Use | |
| - Classify news articles into one of the five predefined categories. | |
| - Suitable for English-language news written in a style similar to the Daily Mirror (Sri Lanka). | |
| ### Downstream Use | |
| - Can be integrated into web applications (Streamlit/Gradio) for automated news categorization. | |
| - Can be used for real-time news filtering and topic-based news recommendation systems. | |
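As a rough illustration of topic-based filtering, the sketch below groups articles by predicted category. It assumes predictions shaped like the output of a `text-classification` pipeline called on a list of texts (one `{"label": ..., "score": ...}` dict per article). The helper name `group_by_category`, the `min_score` threshold, and the label strings are hypothetical; this card does not state which label strings the model actually emits, and the predictions shown are hand-written, not real model output.

```python
from collections import defaultdict


def group_by_category(articles, predictions, min_score=0.0):
    """Group article texts by predicted label.

    `predictions` must contain one {"label": ..., "score": ...} dict per
    article, in the same order as `articles`. Predictions scoring below
    `min_score` are dropped.
    """
    groups = defaultdict(list)
    for text, pred in zip(articles, predictions):
        if pred["score"] >= min_score:
            groups[pred["label"]].append(text)
    return dict(groups)


# Hand-written placeholder predictions (NOT real model output).
articles = ["Central bank holds rates ...", "Cricket squad named ..."]
predictions = [
    {"label": "Business", "score": 0.97},
    {"label": "Sports", "score": 0.95},
]

# Keep only confidently classified sports stories for a sports feed.
groups = group_by_category(articles, predictions, min_score=0.8)
sports_feed = groups.get("Sports", [])
```

In a real application, `predictions` would come from calling the pipeline on `articles`, and the label keys would need to match whatever `id2label` mapping the checkpoint defines.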
| ### Out-of-Scope Use | |
| - Not intended for languages other than English. | |
| - Not trained for sentiment analysis, fake news detection, or hate speech detection. | |
| - Not suitable for very short texts. | |
| ## Bias, Risks, and Limitations | |
| - The model is trained only on **Daily Mirror** news data, so it may perform poorly on other news sources or different writing styles. | |
| - Potential bias towards the Sri Lankan context and the variety of English used in Sri Lankan media. | |
| - Performance may degrade on very long or very short articles. | |
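One way to guard against the length issues above is sketched below. DistilBERT accepts at most 512 tokens, so longer articles have to be truncated, which means only the beginning of the article is seen by the model. The wrapper name `classify_article` and the `MIN_WORDS` cut-off are hypothetical and should be tuned on your own data. `classifier` is expected to be a Transformers `text-classification` pipeline, or any callable with the same interface.

```python
MAX_TOKENS = 512  # DistilBERT's maximum sequence length
MIN_WORDS = 30    # hypothetical cut-off for "very short" texts; tune as needed


def classify_article(text, classifier, min_words=MIN_WORDS):
    """Classify one article, skipping very short inputs.

    Returns None when the text has fewer than `min_words` whitespace-separated
    words; otherwise returns the pipeline's {"label": ..., "score": ...} dict.
    Inputs longer than MAX_TOKENS tokens are truncated by the tokenizer.
    """
    if len(text.split()) < min_words:
        return None
    result = classifier(text, truncation=True, max_length=MAX_TOKENS)
    return result[0]
```

Returning `None` for short inputs lets the caller decide whether to skip the item or route it to a fallback, rather than trusting a prediction the model was not trained to make.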
| ## How to Get Started with the Model | |
| ```python | |
| from transformers import pipeline | |
| classifier = pipeline( | |
| "text-classification", | |
| model="Ginidu2003/Distilbert-Base-News-classifier" | |
| ) | |
| result = classifier("Your news article text here...") | |
| print(result) | |
| ``` | |
| ## Training Details | |
| ### Training Data | |
| - **Dataset**: Daily Mirror Sri Lankan English news (2024–2025) | |
| - **Total samples**: ~1,018 articles (after preprocessing and deduplication) | |
| - **Classes**: 5 balanced categories (Business, Opinion, Political gossip, Sports, World news) | |
| - **Preprocessing**: Lowercasing, punctuation removal, lemmatization | |
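The preprocessing steps listed above could look roughly like the following. This is a sketch under assumptions, not the exact training pipeline: the card does not say which lemmatizer was used or in what order the steps ran, so `preprocess` is a hypothetical helper and the lemmatizer is passed in as an optional callable (for example, the `lemmatize` method of NLTK's `WordNetLemmatizer`).

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, lemmatize=None):
    """Lowercase, strip ASCII punctuation, and optionally lemmatize each token.

    `lemmatize` is any callable mapping a word to its lemma; when it is None,
    the lemmatization step is skipped.
    """
    text = text.lower().translate(_PUNCT_TABLE)
    tokens = text.split()
    if lemmatize is not None:
        tokens = [lemmatize(tok) for tok in tokens]
    return " ".join(tokens)


cleaned = preprocess("The home side WON the match, 3-1!")
print(cleaned)  # the home side won the match 31
```

If the model was trained on text cleaned this way, applying comparable cleaning at inference time may keep inputs closer to the training distribution.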
| ### Training Procedure | |
| - **Framework**: Hugging Face Transformers + Trainer API | |
| - **Base Model**: `distilbert/distilbert-base-uncased` | |
| - **Epochs**: 4 | |
| - **Batch Size**: 8 | |
| - **Learning Rate**: 2e-5 | |
| - **Validation Accuracy**: **90.19%** | |
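For readers who want to reproduce a similar run, the hyperparameters above map onto `TrainingArguments` roughly as sketched here. The output directory name and the helper functions are hypothetical, and mapping "Batch Size: 8" to `per_device_train_batch_size` assumes single-GPU training without gradient accumulation. The step count also assumes an 80/20 split of about 1,018 articles.

```python
import math

# Hyperparameters taken from the list above.
HYPERPARAMS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
}


def build_training_args(output_dir="news-classifier-out"):
    """Build TrainingArguments (transformers is imported lazily)."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def optimizer_steps(n_train, batch_size, epochs):
    """Total optimizer steps: batches per epoch (last one may be partial) x epochs."""
    return math.ceil(n_train / batch_size) * epochs


n_total = 1018                                # approximate size from the card
n_train = n_total - math.ceil(0.2 * n_total)  # 814 articles left after a 20% hold-out
total_steps = optimizer_steps(n_train, 8, 4)  # 102 steps per epoch x 4 epochs = 408
```

At roughly 400 optimizer steps, this is a very short fine-tuning run, which is consistent with training on a single Colab GPU.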
| ## Evaluation | |
| **Validation Set Results (20% hold-out):** | |
| - **Accuracy**: **91.18%** | |
| - Model shows strong and consistent performance across all 5 classes. | |
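Per-class figures are not reported here. A reader who wants to check consistency across classes on their own labelled data could compute precision and recall per category along these lines. This is a minimal pure-Python sketch; `per_class_report` is a hypothetical helper and the tiny label lists are made-up placeholders, not results from this model.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return overall accuracy plus precision, recall, and support per label."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)

    report = {}
    for label in labels:
        predicted = pred_counts[label]
        actual = true_counts[label]
        report[label] = {
            "precision": true_pos[label] / predicted if predicted else 0.0,
            "recall": true_pos[label] / actual if actual else 0.0,
            "support": actual,
        }
    accuracy = sum(true_pos.values()) / len(y_true)
    return accuracy, report


# Made-up placeholder labels, for illustration only.
y_true = ["Sports", "Sports", "Business", "Opinion"]
y_pred = ["Sports", "Business", "Business", "Opinion"]
accuracy, report = per_class_report(y_true, y_pred)
```

A single accuracy number can hide a weak class, so low recall on any one category would be worth investigating even when overall accuracy is high.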
| ## Environmental Impact | |
| - Training ran on a single NVIDIA T4 GPU on Google Colab. | |
| - Carbon emissions: expected to be very low, given the small model and small dataset. | |