---
tags:
- text-classification
pipeline_tag: text-classification
---


# Model Card for Imasha17/News_classification.4

This model classifies Sri Lankan news articles from Daily Mirror Online into five topic categories.



## Model Details

### Model Description

This model classifies news articles from Daily Mirror Online, a Sri Lankan news source, into five categories: Business, Opinion, Political Gossip, Sports, and World News. It was developed to support the analysis and processing of news content for downstream tasks such as sentiment analysis or summarization.



### Data Sources

The original dataset contained real news articles from Daily Mirror Online. After preprocessing, 1,015 records were selected and split into 80% training and 20% validation data.
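The 80/20 split described above can be reproduced roughly as follows. This is a minimal sketch, not the author's preprocessing code: the helper name `split_dataset`, the shuffle via `random.Random(seed).shuffle`, and the seed value are assumptions, and the card does not say whether the split was stratified by category.

```python
import random


def split_dataset(records, val_fraction=0.2, seed=42):
    """Shuffle a copy of `records` and split it into (train, validation) lists.

    Hypothetical helper: the seed and shuffling strategy are assumptions,
    not taken from the original training setup.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)  # 20% of 1,015 records -> 203
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder records standing in for the 1,015 preprocessed articles.
records = [f"article_{i}" for i in range(1015)]
train, val = split_dataset(records)
print(len(train), len(val))  # 812 203
```

With 1,015 records this yields 812 training and 203 validation articles, consistent with the 203-article validation set reported under Testing Data.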

## Uses



### Direct Use


The model can be used for:

- Automatic categorization of Sri Lankan news articles
- News filtering and recommendation systems
- Preliminary analysis of sentiment in news articles
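For the filtering and recommendation use case, one option is to keep only articles whose top predicted category is in a reader's preferred set. The sketch below is hypothetical: `filter_by_category`, the 0.5 score threshold, and the category strings are assumptions (the model's actual label names depend on its `id2label` configuration, which this card does not list). `classify` stands for any callable with the same output shape as a Transformers `text-classification` pipeline called on one string, i.e. a list of `{"label": ..., "score": ...}` dicts.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

# A classifier returns a list of {"label": str, "score": float} dicts per text,
# matching the output shape of a Transformers text-classification pipeline.
Classifier = Callable[[str], List[Dict[str, Any]]]


def filter_by_category(
    articles: Iterable[str],
    classify: Classifier,
    preferred: Set[str],
    min_score: float = 0.5,
) -> List[str]:
    """Keep articles whose top predicted label is preferred and confident enough."""
    kept = []
    for text in articles:
        top = classify(text)[0]  # highest-scoring prediction
        if top["label"] in preferred and top["score"] >= min_score:
            kept.append(text)
    return kept


# Stand-in classifier for illustration only; replace it with the real pipeline.
def fake_classify(text: str) -> List[Dict[str, Any]]:
    label = "Sports" if "cricket" in text.lower() else "Business"
    return [{"label": label, "score": 0.9}]


articles = ["Cricket team wins series", "Rupee falls against dollar"]
print(filter_by_category(articles, fake_classify, preferred={"Sports"}))
# ['Cricket team wins series']
```

With the real model, the `pipe` object from the getting-started snippet could be passed as `classify`.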


### Downstream Use

News aggregation platforms can use the model to categorize and sort articles.

Journalists and researchers can analyze media trends based on category distributions.
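To study category distributions, predicted labels can be tallied over a batch of articles. A minimal sketch using the standard library's `collections.Counter`; the helper name `category_shares` is hypothetical and the label list is made-up example data, not output from the model.

```python
from collections import Counter
from typing import Dict, List


def category_shares(labels: List[str]) -> Dict[str, float]:
    """Return the fraction of articles assigned to each predicted category."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders categories from most to least frequent.
    return {label: count / total for label, count in counts.most_common()}


# Hypothetical predicted labels for a batch of eight articles.
predicted = ["Sports", "Business", "Sports", "World News",
             "Political Gossip", "Sports", "Business", "Opinion"]
print(category_shares(predicted))
```

Comparing these shares across time windows (for example, per week) gives a simple view of how coverage shifts between categories.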


### Out-of-Scope Use


This model should not be used for critical decision-making tasks such as political analysis, stock market predictions, or legal judgments.

It may not generalize well to non-Sri Lankan news sources.



## Bias, Risks, and Limitations

The dataset is limited to Daily Mirror Online, which may introduce biases in classification.

The model might misclassify articles if they contain mixed topics.

The dataset size is small (1,015 articles), which may impact performance on diverse news sources.




## How to Get Started with the Model

Use the code below to get started with the model.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

model_id = "Imasha17/News_classification.4"
pipe = pipeline("text-classification", model=model_id)

text = "Enter your news here"
result = pipe(text)  # a list with one {"label": ..., "score": ...} dict
print(result)
```

## Training Details

### Training Data

The dataset comprises 1,015 preprocessed news articles from Daily Mirror Online.


#### Training Hyperparameters

- **Training regime:** [More Information Needed]

- **Model architecture:** distilbert-base-uncased
- **Batch size:** 4
- **Epochs:** 3
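As a rough worked example of what these settings imply, the number of optimizer steps follows from the training-set size, batch size, and epoch count. This assumes 812 training articles (80% of 1,015), a single device, and no gradient accumulation; only the batch size and epoch count are stated in the card. If training used the Hugging Face `Trainer` (not stated here), the two reported values would correspond to the `per_device_train_batch_size` and `num_train_epochs` arguments of `TrainingArguments`. The helper `total_training_steps` is hypothetical.

```python
import math


def total_training_steps(n_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for `epochs` passes, counting a final partial batch."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * epochs


# Assumed training-set size: 80% of the 1,015 articles.
print(total_training_steps(n_examples=812, batch_size=4, epochs=3))  # 609
```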


### Testing Data, Factors & Metrics

#### Testing Data

20% of the dataset (203 articles) was used for validation and testing.


### Results

The model performed well overall, but it can misclassify articles whose content overlaps more than one category.
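To quantify such misclassifications on the 203-article validation split, one option is to compute accuracy and count which (true, predicted) category pairs get confused. A minimal standard-library sketch; the helper names are hypothetical and the gold and predicted labels below are invented placeholders, not the model's actual results.

```python
from collections import Counter
from typing import List


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of articles whose predicted category matches the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def confusions(gold: List[str], pred: List[str]) -> Counter:
    """Count (gold, predicted) pairs for misclassified articles only."""
    return Counter((g, p) for g, p in zip(gold, pred) if g != p)


# Placeholder labels for illustration; not real evaluation output.
gold = ["Business", "Opinion", "Sports", "World News", "Political Gossip"]
pred = ["Business", "Political Gossip", "Sports", "World News", "Opinion"]
print(accuracy(gold, pred))  # 0.6
print(confusions(gold, pred))
```

The most frequent pairs in `confusions` point directly at the categories whose content overlaps.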


## Model Examination


The model effectively classifies Sri Lankan news articles.

It can be fine-tuned on larger datasets for improved accuracy.


### Model Architecture and Objective

- **Architecture:** distilbert-base-uncased
- **Objective:** Multiclass text classification
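In multiclass classification the model produces one logit per category, and the prediction is the category with the highest softmax probability. The sketch below shows that final step in plain Python; the logit values and the order of categories in `LABELS` are hypothetical, since the card does not list the model's `id2label` mapping.

```python
import math
from typing import List, Tuple

# Hypothetical label order; the real order is defined by the model's config.
LABELS = ["Business", "Opinion", "Political Gossip", "Sports", "World News"]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: List[float]) -> Tuple[str, float]:
    """Return the most probable category and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, prob = predict([0.2, -1.0, 0.5, 3.1, 0.0])  # made-up logits
print(label, round(prob, 3))
```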