Ginidu2003/Distilbert-Base-News-classifier

	---
	library_name: transformers
	tags:
	- text-classification
	- distilbert
	- news-classification
	- sri-lanka
	base_model:
	- distilbert/distilbert-base-uncased
	---



	## Model Details

	- Model Name: `Ginidu2003/Distilbert-Base-News-classifier`
	- Model Type: Text Classification
	- Base Model: `distilbert/distilbert-base-uncased`
	- Language(s): English
	- Fine-tuned from: [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased)

	### Model Description
	This is a DistilBERT model fine-tuned to classify English news articles into five categories:

	- Business
	- Opinion
	- Political gossip
	- Sports
	- World news



	## Uses

	### Direct Use
	- Classify news articles into one of the five predefined categories.
	- Suitable for English-language news written in a style similar to the Daily Mirror (Sri Lanka).

	### Downstream Use
	- Can be integrated into web applications (Streamlit/Gradio) for automated news categorization.
	- Can be used for real-time news filtering and topic-based news recommendation systems.
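
As a rough sketch of the web-app integration mentioned above, the snippet below wraps the classifier in a Gradio interface. It assumes `gradio` and a recent `transformers` release are installed. The helper names `to_label_scores` and `build_demo` are illustrative and not part of this repository, and the label strings shown in the UI depend on the model's `id2label` configuration.

```python
def to_label_scores(results):
    """Convert pipeline output (a list of {"label", "score"} dicts) into
    the {label: score} mapping that a Gradio Label component accepts."""
    return {item["label"]: float(item["score"]) for item in results}


def build_demo():
    """Build a Gradio interface around the classifier (hypothetical helper)."""
    # Imported lazily so to_label_scores works without these packages.
    import gradio as gr
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model="Ginidu2003/Distilbert-Base-News-classifier",
    )

    def classify(text):
        # top_k=None requests a score for every category; truncation=True
        # cuts articles that exceed the model's maximum input length.
        return to_label_scores(classifier(text, top_k=None, truncation=True))

    return gr.Interface(
        fn=classify,
        inputs=gr.Textbox(lines=10, label="News article"),
        outputs=gr.Label(num_top_classes=5),
    )
```

Calling `build_demo().launch()` should start the app locally; the first call downloads the model weights from the Hub.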

	### Out-of-Scope Use
	- Not intended for languages other than English.
	- Not trained for sentiment analysis, fake news detection, or hate speech detection.
	- Not suitable for very short texts.

	## Bias, Risks, and Limitations
	- The model was trained only on Daily Mirror news data, so it may perform poorly on other news sources or writing styles.
	- Potential bias toward the Sri Lankan context and the variety of English used in Sri Lankan media.
	- Performance may degrade on very long or very short articles; inputs longer than DistilBERT's 512-token limit must be truncated or split.
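
One possible way to work around the length limitations is sketched below: very short texts are skipped, and long articles are split into overlapping word windows whose scores are averaged. This is an illustration, not the author's method. The names `chunk_words` and `classify_long`, the window sizes, and the `min_words` cut-off are all assumptions, and word counts only approximate DistilBERT's 512-token limit.

```python
from collections import defaultdict


def chunk_words(text, max_words=300, overlap=50):
    """Split text into overlapping word windows (hypothetical helper).

    max_words is kept well below 512 because one word can map to
    several tokens; the exact margin is an assumption.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)] if words else []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def classify_long(text, classifier, min_words=20):
    """Return the label with the highest mean score across chunks,
    or None when the text is shorter than min_words.

    `classifier` is any callable returning a list of {"label", "score"}
    dicts for one string.
    """
    if len(text.split()) < min_words:
        return None  # too short to classify reliably
    chunks = chunk_words(text)
    totals = defaultdict(float)
    for chunk in chunks:
        for item in classifier(chunk):
            totals[item["label"]] += item["score"] / len(chunks)
    return max(totals, key=totals.get)
```

With a real pipeline, a call along the lines of `classify_long(article, lambda t: classifier(t, top_k=None, truncation=True))` should work, where `classifier` is the `transformers` pipeline for this model.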

	## How to Get Started with the Model

	```python
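	from transformers import pipeline

	# Load the fine-tuned classifier from the Hugging Face Hub.
	classifier = pipeline(
	    "text-classification",
	    model="Ginidu2003/Distilbert-Base-News-classifier",
	)

	# Returns a list with one {"label": ..., "score": ...} dict per input.
	result = classifier("Your news article text here...")
	print(result)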
	```
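
The call above returns only the top category. A minimal sketch of retrieving scores for all five categories and flagging low-confidence predictions follows. The helpers `pick_label` and `classify_with_scores` and the 0.5 threshold are hypothetical; `top_k=None` and `truncation=True` are standard arguments of the `transformers` text-classification pipeline in recent releases.

```python
def pick_label(scores, threshold=0.5):
    """Return (label, score) for the best category, or ("uncertain", score)
    when the best score is below the (hypothetical) threshold."""
    best = max(scores, key=lambda item: item["score"])
    if best["score"] < threshold:
        return "uncertain", best["score"]
    return best["label"], best["score"]


def classify_with_scores(text, threshold=0.5):
    """Classify one article and apply the confidence threshold."""
    # Lazy import so pick_label can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model="Ginidu2003/Distilbert-Base-News-classifier",
    )
    # top_k=None asks for a score per category instead of only the top one;
    # truncation=True cuts inputs that exceed the model's maximum length.
    scores = classifier(text, top_k=None, truncation=True)
    return pick_label(scores, threshold)
```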
	## Training Details

	### Training Data
	- Dataset: Daily Mirror Sri Lankan English news (2024–2025)
	- Total samples: ~1,018 articles (after preprocessing and deduplication)
	- Classes: 5 balanced categories (Business, Opinion, Political gossip, Sports, World news)
	- Preprocessing: Lowercasing, punctuation removal, lemmatization
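
The preprocessing steps listed above could look roughly like the following. This is a sketch under assumptions: the card does not say which tools were used, so punctuation removal is done with the standard library and lemmatization is left as an optional `lemmatize` hook (a hypothetical parameter that could wrap NLTK or spaCy).

```python
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, lemmatize=None):
    """Lowercase, remove punctuation, and optionally lemmatize each word.

    `lemmatize` is any callable mapping one word to its lemma.
    """
    words = text.lower().translate(_PUNCT_TABLE).split()
    if lemmatize is not None:
        words = [lemmatize(word) for word in words]
    return " ".join(words)
```

If the model was trained on text cleaned this way, applying the same steps before inference is likely to give inputs closer to what it saw during training.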

	### Training Procedure
	- Framework: Hugging Face Transformers + Trainer API
	- Base Model: `distilbert/distilbert-base-uncased`
	- Epochs: 4
	- Batch Size: 8
	- Learning Rate: 2e-5
	- Validation Accuracy: 90.19%
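
To make the training budget concrete, the sketch below derives the number of optimizer steps from the figures in this card and shows how the listed hyperparameters map onto `TrainingArguments`. It assumes an 80/20 split of 1,018 articles with the hold-out size rounded up, a single device, and no gradient accumulation. The helper `build_training_args` and the `output_dir` value are illustrative, and any argument not listed in the card is left at its library default.

```python
import math

# Values taken from this model card.
NUM_ARTICLES = 1018
VAL_FRACTION = 0.20
EPOCHS = 4
BATCH_SIZE = 8
LEARNING_RATE = 2e-5

# Assumed split arithmetic: hold-out size rounded up.
n_val = math.ceil(NUM_ARTICLES * VAL_FRACTION)
n_train = NUM_ARTICLES - n_val
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def build_training_args(output_dir="distilbert-news"):
    """Hypothetical helper mirroring the hyperparameters listed above."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=EPOCHS,
        per_device_train_batch_size=BATCH_SIZE,
        learning_rate=LEARNING_RATE,
    )
```

Under these assumptions the split is 814 training and 204 validation articles, giving 102 steps per epoch and 408 steps in total.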

	## Evaluation

	Validation Set Results (20% hold-out):
	- Accuracy: 91.18%
	- The model performs consistently across all five classes (per-class metrics are not reported here).
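
The card reports overall accuracy only. A per-class breakdown is one way to check consistency across categories, and it can be computed as sketched below. The function names are illustrative, and any labels passed in would come from your own validation split; no figures from the original evaluation are reproduced here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true label, the fraction of its articles predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}
```

A large gap between the best and worst per-class values would indicate that the overall accuracy hides weaker categories.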

	## Environmental Impact
	- Training ran on a single NVIDIA T4 GPU on Google Colab.
	- Estimated carbon emissions: very low, given the small model and small dataset.