YagiASAFAS committed on
Commit 8ab2a28 · verified
1 Parent(s): e04e294

Update README.md

Files changed (1)
  1. README.md +96 -68
README.md CHANGED
@@ -7,72 +7,92 @@ tags:
7
  model-index:
8
  - name: MyPoliBERT-ver03
9
  results: []
10
  ---
11
 
12
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
- should probably proofread and complete it, then remove this comment. -->
14
 
15
  # MyPoliBERT-ver03
16
 
17
- This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
18
- It achieves the following results on the evaluation set:
19
- - Loss: 0.2655
20
- - Democracy F1: 0.9312
21
- - Democracy Accuracy: 0.9318
22
- - Economy F1: 0.9143
23
- - Economy Accuracy: 0.9151
24
- - Race F1: 0.9449
25
- - Race Accuracy: 0.9456
26
- - Leadership F1: 0.8488
27
- - Leadership Accuracy: 0.8494
28
- - Development F1: 0.8710
29
- - Development Accuracy: 0.8748
30
- - Corruption F1: 0.9420
31
- - Corruption Accuracy: 0.9441
32
- - Instability F1: 0.9164
33
- - Instability Accuracy: 0.9198
34
- - Safety F1: 0.9042
35
- - Safety Accuracy: 0.9032
36
- - Administration F1: 0.8831
37
- - Administration Accuracy: 0.8891
38
- - Education F1: 0.9565
39
- - Education Accuracy: 0.9567
40
- - Religion F1: 0.9426
41
- - Religion Accuracy: 0.9424
42
- - Environment F1: 0.9745
43
- - Environment Accuracy: 0.9746
44
- - Overall F1: 0.9191
45
- - Overall Accuracy: 0.9206
46
-
47
- ## Model description
48
-
49
- More information needed
50
-
51
- ## Intended uses & limitations
52
-
53
- More information needed
54
-
55
- ## Training and evaluation data
56
-
57
- More information needed
58
-
59
- ## Training procedure
60
-
61
- ### Training hyperparameters
62
-
63
- The following hyperparameters were used during training:
64
- - learning_rate: 3e-05
65
- - train_batch_size: 16
66
- - eval_batch_size: 16
67
- - seed: 42
68
- - gradient_accumulation_steps: 2
69
- - total_train_batch_size: 32
70
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
71
- - lr_scheduler_type: linear
72
- - num_epochs: 16
73
- - mixed_precision_training: Native AMP
74
-
75
- ### Training results
76
 
77
  | Training Loss | Epoch | Step | Validation Loss | Democracy F1 | Democracy Accuracy | Economy F1 | Economy Accuracy | Race F1 | Race Accuracy | Leadership F1 | Leadership Accuracy | Development F1 | Development Accuracy | Corruption F1 | Corruption Accuracy | Instability F1 | Instability Accuracy | Safety F1 | Safety Accuracy | Administration F1 | Administration Accuracy | Education F1 | Education Accuracy | Religion F1 | Religion Accuracy | Environment F1 | Environment Accuracy | Overall F1 | Overall Accuracy |
78
  |:-------------:|:-----:|:----:|:---------------:|:------------:|:------------------:|:----------:|:----------------:|:-------:|:-------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:---------:|:---------------:|:-----------------:|:-----------------------:|:------------:|:------------------:|:-----------:|:-----------------:|:--------------:|:--------------------:|:----------:|:----------------:|
@@ -84,10 +104,18 @@ The following hyperparameters were used during training:
84
  | 0.0759 | 6.0 | 4044 | 0.2556 | 0.9311 | 0.9313 | 0.9153 | 0.9162 | 0.9465 | 0.9473 | 0.8492 | 0.8511 | 0.8743 | 0.8810 | 0.9431 | 0.9447 | 0.9185 | 0.9205 | 0.9049 | 0.9034 | 0.8797 | 0.8886 | 0.9588 | 0.9601 | 0.9419 | 0.9421 | 0.9753 | 0.9757 | 0.9199 | 0.9218 |
85
  | 0.0618 | 7.0 | 4718 | 0.2655 | 0.9312 | 0.9318 | 0.9143 | 0.9151 | 0.9449 | 0.9456 | 0.8488 | 0.8494 | 0.8710 | 0.8748 | 0.9420 | 0.9441 | 0.9164 | 0.9198 | 0.9042 | 0.9032 | 0.8831 | 0.8891 | 0.9565 | 0.9567 | 0.9426 | 0.9424 | 0.9745 | 0.9746 | 0.9191 | 0.9206 |
86
 
87
-
88
- ### Framework versions
89
-
90
- - Transformers 4.48.2
91
- - Pytorch 2.5.1+cu124
92
- - Datasets 3.2.0
93
- - Tokenizers 0.21.0
7
  model-index:
8
  - name: MyPoliBERT-ver03
9
  results: []
10
+ datasets:
11
+ - tnwei/ms-newspapers
12
  ---
13
 
14
+ <!-- This model card has been generated automatically according to the information the Trainer had access to.
15
+ You should proofread and complete it, then remove this comment. -->
16
 
17
  # MyPoliBERT-ver03
18
 
19
+ ## Model Overview
20
+ MyPoliBERT-ver03 is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) (see the Dataset section for the fine-tuning data), designed for multi-label classification of twelve political topics: Democracy, Economy, Race, Leadership, Development, Corruption, Instability, Safety, Administration, Education, Religion, and Environment. This version is an update of the original YagiASAFAS/MyPoliBERT model and specifically improves classification performance on the Leadership topic.
21
+
22
+ ## Intended Uses and Limitations
23
+ - **Intended Uses**
24
+ This model is intended for analyzing political texts and identifying multiple political topics, with a special focus on accurately classifying leadership-related content. It can be applied to various text sources such as news articles and social media posts.
25
+
26
+ - **Limitations**
27
+ 1. Details regarding the fine-tuning data sources are limited; therefore, the model's performance may vary on texts from different domains or regions.
28
+ 2. As with most deep learning models, the internal decision process is not inherently interpretable; human review is recommended for critical applications.
29
+ 3. The model may not reflect recent political developments due to the static nature of its training data.
30
+
31
+ ## Dataset
32
+ The training and evaluation data consist of 29,226 records, split 80% for training and 20% for validation.
33
+ Data sources include:
34
+ - tnwei/ms-newspapers dataset
35
+ - Malaysian political posts from Reddit
36
+ - Malaysian political posts from Instagram
37
+ - Malaysian political posts from Facebook
38
+
39
+ Additionally, to address topic and sentiment biases observed in news articles and in social media posts and comments, a portion of the data was generated synthetically using generative-AI-aided data augmentation.
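
The 80/20 split described in this section can be sketched as follows. This is only an illustration, not the author's preprocessing: the shuffle, the reuse of seed 42 (which the card lists for training, not for the split), and the `split_indices` helper are all assumptions.

```python
import random


def split_indices(n_records, train_fraction=0.8, seed=42):
    """Shuffle record indices and split them into train/validation lists.

    Hypothetical helper: the card states only the record count and the
    80/20 ratio, not how the split was actually drawn.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = int(n_records * train_fraction)  # floor of 23380.8
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(29226)
print(len(train_idx), len(val_idx))
```

Under these assumptions the split yields 23,380 training records and 5,846 validation records; a different rounding rule would move one record between the two sets.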
40
+
41
+ ## Model Architecture
42
+ - **Base Model**: [bert-base-uncased](https://huggingface.co/bert-base-uncased)
43
+ - **Task**: Multi-label classification for 12 political topics
44
+ - **Output**: The model outputs classification scores for each topic; in this updated version, the Leadership classification has been notably improved.
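
To make the idea of per-topic classification scores concrete, here is a minimal post-processing sketch. It assumes one independent logit per topic, a sigmoid activation, and a 0.5 threshold; the card does not describe the actual head layout or label order of MyPoliBERT-ver03, so `TOPICS`, `predict_topics`, and the example logits are hypothetical.

```python
import math

# Topic names as listed in this model card; the order of the model's
# actual output labels is an assumption.
TOPICS = [
    "Democracy", "Economy", "Race", "Leadership", "Development",
    "Corruption", "Instability", "Safety", "Administration",
    "Education", "Religion", "Environment",
]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_topics(logits, threshold=0.5):
    """Return the topics whose sigmoid score reaches the threshold."""
    if len(logits) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} logits, got {len(logits)}")
    return [t for t, z in zip(TOPICS, logits) if sigmoid(z) >= threshold]


# Made-up logits for illustration only (not real model output).
example_logits = [2.1, -1.3, -3.0, 0.4, -0.2, 1.7, -2.5, -4.0, -0.9, -3.3, -2.8, -5.0]
print(predict_topics(example_logits))
```

With these made-up logits, the positive values (Democracy, Leadership, Corruption) are the ones that pass the threshold. If the released checkpoint instead predicts several classes per topic, the thresholding step would need to be replaced by a per-topic argmax.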
45
+
46
+ ## Training Procedure
47
+ - **Hyperparameters**
48
+ - learning_rate: 3e-05
49
+ - train_batch_size: 16
50
+ - eval_batch_size: 16
51
+ - seed: 42
52
+ - gradient_accumulation_steps: 2
53
+ - total_train_batch_size: 32
54
+ - optimizer: ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08
55
+ - lr_scheduler_type: linear
56
+ - num_epochs: 16
57
+ - mixed_precision_training: Native AMP
58
+
59
+ - **Training Configuration**
60
+ The training followed a standard procedure with periodic evaluation; the best checkpoint (obtained at epoch 7) was selected based on overall performance metrics.
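
The relationship between the batch-size hyperparameters listed above can be checked with a few lines of arithmetic. The sketch assumes a single training device (the card does not state the device count); the dictionary and the `effective_batch_size` helper are illustrative, and the step counts are copied from the two rows of the training results table visible in this diff.

```python
# Values copied from the hyperparameter list; the dict itself is illustrative.
hyperparameters = {
    "learning_rate": 3e-05,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "lr_scheduler_type": "linear",
    "num_epochs": 16,
}


def effective_batch_size(cfg, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return cfg["train_batch_size"] * cfg["gradient_accumulation_steps"] * num_devices


total = effective_batch_size(hyperparameters)

# Epoch -> cumulative step, taken from the visible rows of the results table.
logged_steps = {6: 4044, 7: 4718}
steps_per_epoch = {step // epoch for epoch, step in logged_steps.items()}

print(total, steps_per_epoch)
```

The product 16 × 2 reproduces the reported `total_train_batch_size` of 32, and both logged rows imply 674 steps per epoch.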
61
+
62
+ ## Evaluation and Performance
63
+ The model achieves the following results on the evaluation set:
64
+
65
+ - Loss: 0.2655
66
+ - Democracy F1: 0.9312
67
+ - Democracy Accuracy: 0.9318
68
+ - Economy F1: 0.9143
69
+ - Economy Accuracy: 0.9151
70
+ - Race F1: 0.9449
71
+ - Race Accuracy: 0.9456
72
+ - Leadership F1: 0.8488
73
+ - Leadership Accuracy: 0.8494
74
+ - Development F1: 0.8710
75
+ - Development Accuracy: 0.8748
76
+ - Corruption F1: 0.9420
77
+ - Corruption Accuracy: 0.9441
78
+ - Instability F1: 0.9164
79
+ - Instability Accuracy: 0.9198
80
+ - Safety F1: 0.9042
81
+ - Safety Accuracy: 0.9032
82
+ - Administration F1: 0.8831
83
+ - Administration Accuracy: 0.8891
84
+ - Education F1: 0.9565
85
+ - Education Accuracy: 0.9567
86
+ - Religion F1: 0.9426
87
+ - Religion Accuracy: 0.9424
88
+ - Environment F1: 0.9745
89
+ - Environment Accuracy: 0.9746
90
+ - Overall F1: 0.9191
91
+ - Overall Accuracy: 0.9206
92
+
93
+ Per-topic F1 ranges from 0.8488 (Leadership) to 0.9745 (Environment), indicating robust performance across topics, with a particular improvement in the Leadership category compared to the original model.
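
As a quick consistency check on the numbers above, the sketch below averages the twelve per-topic scores. The reported Overall F1 and Overall Accuracy appear to match an unweighted mean of the per-topic values to within rounding; that the author computed them this way is an inference from the numbers, not something the card states.

```python
# (F1, accuracy) per topic, copied from the evaluation list in this card.
per_topic = {
    "Democracy": (0.9312, 0.9318),
    "Economy": (0.9143, 0.9151),
    "Race": (0.9449, 0.9456),
    "Leadership": (0.8488, 0.8494),
    "Development": (0.8710, 0.8748),
    "Corruption": (0.9420, 0.9441),
    "Instability": (0.9164, 0.9198),
    "Safety": (0.9042, 0.9032),
    "Administration": (0.8831, 0.8891),
    "Education": (0.9565, 0.9567),
    "Religion": (0.9426, 0.9424),
    "Environment": (0.9745, 0.9746),
}

# Unweighted (macro) means over the twelve topics.
mean_f1 = sum(f1 for f1, _ in per_topic.values()) / len(per_topic)
mean_acc = sum(acc for _, acc in per_topic.values()) / len(per_topic)

# Topic with the lowest F1 score.
lowest_f1_topic = min(per_topic, key=lambda t: per_topic[t][0])

print(f"mean F1 = {mean_f1:.4f}, mean accuracy = {mean_acc:.4f}")
print(f"lowest F1: {lowest_f1_topic}")
```

The same check also shows that Leadership remains the lowest-scoring topic in this version, which is useful context when reading the improvement claim.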
94
+
95
+ ### Training Results
96
 
97
  | Training Loss | Epoch | Step | Validation Loss | Democracy F1 | Democracy Accuracy | Economy F1 | Economy Accuracy | Race F1 | Race Accuracy | Leadership F1 | Leadership Accuracy | Development F1 | Development Accuracy | Corruption F1 | Corruption Accuracy | Instability F1 | Instability Accuracy | Safety F1 | Safety Accuracy | Administration F1 | Administration Accuracy | Education F1 | Education Accuracy | Religion F1 | Religion Accuracy | Environment F1 | Environment Accuracy | Overall F1 | Overall Accuracy |
98
  |:-------------:|:-----:|:----:|:---------------:|:------------:|:------------------:|:----------:|:----------------:|:-------:|:-------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:---------:|:---------------:|:-----------------:|:-----------------------:|:------------:|:------------------:|:-----------:|:-----------------:|:--------------:|:--------------------:|:----------:|:----------------:|
 
104
  | 0.0759 | 6.0 | 4044 | 0.2556 | 0.9311 | 0.9313 | 0.9153 | 0.9162 | 0.9465 | 0.9473 | 0.8492 | 0.8511 | 0.8743 | 0.8810 | 0.9431 | 0.9447 | 0.9185 | 0.9205 | 0.9049 | 0.9034 | 0.8797 | 0.8886 | 0.9588 | 0.9601 | 0.9419 | 0.9421 | 0.9753 | 0.9757 | 0.9199 | 0.9218 |
105
  | 0.0618 | 7.0 | 4718 | 0.2655 | 0.9312 | 0.9318 | 0.9143 | 0.9151 | 0.9449 | 0.9456 | 0.8488 | 0.8494 | 0.8710 | 0.8748 | 0.9420 | 0.9441 | 0.9164 | 0.9198 | 0.9042 | 0.9032 | 0.8831 | 0.8891 | 0.9565 | 0.9567 | 0.9426 | 0.9424 | 0.9745 | 0.9746 | 0.9191 | 0.9206 |
106
 
107
+ ## Future Improvements
108
+ - Incorporate additional data and domain adaptation techniques to further improve performance across all topics
109
+ - Enhance model interpretability using explainability methods
110
+ - Monitor and update the model periodically to capture evolving political trends
111
+
112
+ ## License and Usage Notes
113
+ - The predictions of this model should be used as a reference and interpreted within the context of the training data limitations
114
+ - Users are encouraged to validate model outputs with human review for critical applications
115
+ - Regular updates and retraining are recommended to maintain relevance and accuracy
116
+
117
+ ### Framework Versions
118
+ - Transformers: 4.48.2
119
+ - PyTorch: 2.5.1+cu124
120
+ - Datasets: 3.2.0
121
+ - Tokenizers: 0.21.0