| --- |
| tags: |
| - text-classification |
| pipeline_tag: text-classification |
| --- |
| |
|
|
| # Model Card for Imasha17/News_classification.4 |
|
|
| This model classifies news articles from Daily Mirror Online, a Sri Lankan news source, into five categories. |
|
|
|
|
|
|
| ## Model Details |
|
|
| ### Model Description |
|
|
| This model classifies news articles from Daily Mirror Online, a Sri Lankan news source, |
| into five categories: Business, Opinion, Political Gossip, Sports, and World News. It was also developed to support analyzing and processing news content for tasks such as sentiment analysis or summarization. |
|
|
|
|
|
|
| ### Data Sources |
|
|
| The original dataset contains real news content from Daily Mirror Online. After preprocessing, 1,015 records were selected. The data was split into 80% for training and 20% for validation. |
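
The 80/20 split can be illustrated with a minimal sketch. This card does not say how the split was actually performed, so the shuffling, the seed, and the helper name `train_validation_split` are assumptions made for illustration, and the records are placeholders rather than real articles.

```python
import random


def train_validation_split(records, validation_fraction=0.2, seed=42):
    """Shuffle a copy of `records` and split it into (train, validation)."""
    shuffled = list(records)
    # A fixed seed keeps this illustrative split reproducible (assumption).
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Placeholder records standing in for the 1,015 preprocessed articles.
records = [f"article_{i}" for i in range(1015)]
train, validation = train_validation_split(records)
print(len(train), len(validation))  # 812 203
```

With 1,015 records, a 20% validation share comes to 203 articles, which leaves 812 for training.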
|
|
| ## Uses |
|
|
|
|
|
|
| ### Direct Use |
|
|
|
|
| The model can be used for: |
|
|
| - Automatic categorization of Sri Lankan news articles |
| - News filtering and recommendation systems |
| - Preliminary analysis of sentiment in news articles |
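
As a rough sketch of the news filtering use case, the function below keeps only articles whose top prediction matches a wanted category. It accepts any callable that returns output shaped like a `transformers` text-classification pipeline result. The helper names, the `min_score` threshold, and the stand-in classifier are hypothetical, and the label strings the real model emits may differ from the category names listed in this card.

```python
def filter_by_category(articles, classify, wanted, min_score=0.5):
    """Keep articles whose top predicted label is `wanted` with enough confidence."""
    kept = []
    for article in articles:
        # Pipeline-style output: [{"label": ..., "score": ...}]
        top = classify(article)[0]
        if top["label"] == wanted and top["score"] >= min_score:
            kept.append(article)
    return kept


def fake_classify(text):
    """Stand-in for the real pipeline so the sketch runs offline (hypothetical)."""
    label = "Sports" if "cricket" in text.lower() else "Business"
    return [{"label": label, "score": 0.9}]


articles = ["Sri Lanka wins cricket series", "Colombo stock market closes higher"]
sports_only = filter_by_category(articles, fake_classify, "Sports")
print(sports_only)
```

In practice, `fake_classify` would be replaced by the pipeline object loaded from the Hub, after checking which label strings it returns.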
|
|
|
|
| ### Downstream Use |
|
|
| News aggregation platforms can use the model to categorize and sort articles. |
|
|
| Journalists and researchers can analyze media trends based on category distributions. |
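
One way to look at category distributions is to count predicted labels and convert the counts to shares. The sketch below assumes the predictions have already been collected as a list of label strings. The sample labels are invented for illustration and are not model output.

```python
from collections import Counter


def category_distribution(predicted_labels):
    """Return each category's share of the predictions as a fraction of the total."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical predictions for eight articles.
predicted = [
    "Sports", "Business", "Sports", "World News",
    "Opinion", "Sports", "Business", "Political Gossip",
]
distribution = category_distribution(predicted)
print(distribution)
```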
|
|
|
|
| ### Out-of-Scope Use |
|
|
|
|
| This model should not be used for critical decision-making tasks such as political analysis, stock market predictions, or legal judgments. |
|
|
| It may not generalize well to non-Sri Lankan news sources. |
|
|
|
|
|
|
| ## Bias, Risks, and Limitations |
|
|
| The dataset is limited to Daily Mirror Online, which may introduce biases in classification. |
|
|
| The model might misclassify articles if they contain mixed topics. |
|
|
| The dataset size is small (1,015 articles), which may impact performance on diverse news sources. |
|
|
|
|
|
|
|
|
| ## How to Get Started with the Model |
|
|
| Use the code below to get started with the model. |
|
|
| ```python |
| # Use a pipeline as a high-level helper |
| from transformers import pipeline |
| |
| model_id = "Imasha17/News_classification.4" |
| |
| # Load the fine-tuned classifier from the Hugging Face Hub |
| pipe = pipeline("text-classification", model=model_id) |
| |
| text = "Enter your news here" |
| # The pipeline returns a list of dicts with "label" and "score" keys |
| print(pipe(text)) |
| ``` |
|
|
| ## Training Details |
|
|
| ### Training Data |
|
|
| The dataset comprises 1,015 preprocessed news articles from Daily Mirror Online. |
|
|
|
|
| #### Training Hyperparameters |
|
|
| - **Training regime:** [More Information Needed] |
| - **Model architecture:** distilbert-base-uncased |
| - **Batch size:** 4 |
| - **Epochs:** 3 |
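
To give a sense of scale, the sketch below derives the number of optimizer steps implied by these hyperparameters. It assumes a single device, no gradient accumulation, and the 812-article training share that follows from the 80/20 split. None of those details are stated in this card, and `TrainingConfig` is a hypothetical container, not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    base_model: str = "distilbert-base-uncased"
    batch_size: int = 4
    epochs: int = 3


def optimizer_steps(n_train_examples, config):
    """Return (steps per epoch, total steps), counting a final partial batch."""
    steps_per_epoch = math.ceil(n_train_examples / config.batch_size)
    return steps_per_epoch, steps_per_epoch * config.epochs


config = TrainingConfig()
n_train = 1015 - 203  # training share left after the 20% validation split
steps_per_epoch, total_steps = optimizer_steps(n_train, config)
print(steps_per_epoch, total_steps)  # 203 609
```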
|
|
|
|
| ### Testing Data, Factors & Metrics |
|
|
| #### Testing Data |
|
|
| 20% of the dataset (203 articles) was used for validation and testing. |
|
|
|
|
| ### Results |
|
|
| The model performed well, but misclassification occurs when articles have overlapping content. |
|
|
|
|
| ## Model Examination |
|
|
|
|
| The model effectively classifies Sri Lankan news articles. |
|
|
| It can be fine-tuned on larger datasets for improved accuracy. |
|
|
|
|
| ### Model Architecture and Objective |
|
|
| Architecture: distilbert-base-uncased |
|
|
| Objective: Multiclass text classification |
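
For readers new to the multiclass objective, here is a minimal sketch of how five raw scores (logits) become one predicted category: a softmax turns the scores into probabilities, and the highest one is chosen. The logits are made up, and the order of `CATEGORIES` is an assumption, since the model's real label index mapping is not given in this card.

```python
import math

# Category order is illustrative; the model's real label mapping may differ.
CATEGORIES = ["Business", "Opinion", "Political Gossip", "Sports", "World News"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, -1.0, 0.5, 3.1, 0.0]  # made-up scores for one article
probs = softmax(logits)
predicted = CATEGORIES[probs.index(max(probs))]
print(predicted)  # Sports
```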
|
|
|
|