YagiASAFAS/MyPoliBERT-ver03

@@ -7,72 +7,92 @@ tags:
 model-index:
 - name: MyPoliBERT-ver03
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # MyPoliBERT-ver03
-This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.2655
-- Democracy F1: 0.9312
-- Democracy Accuracy: 0.9318
-- Economy F1: 0.9143
-- Economy Accuracy: 0.9151
-- Race F1: 0.9449
-- Race Accuracy: 0.9456
-- Leadership F1: 0.8488
-- Leadership Accuracy: 0.8494
-- Development F1: 0.8710
-- Development Accuracy: 0.8748
-- Corruption F1: 0.9420
-- Corruption Accuracy: 0.9441
-- Instability F1: 0.9164
-- Instability Accuracy: 0.9198
-- Safety F1: 0.9042
-- Safety Accuracy: 0.9032
-- Administration F1: 0.8831
-- Administration Accuracy: 0.8891
-- Education F1: 0.9565
-- Education Accuracy: 0.9567
-- Religion F1: 0.9426
-- Religion Accuracy: 0.9424
-- Environment F1: 0.9745
-- Environment Accuracy: 0.9746
-- Overall F1: 0.9191
-- Overall Accuracy: 0.9206
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 3e-05
-- train_batch_size: 16
-- eval_batch_size: 16
-- seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 32
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 16
-- mixed_precision_training: Native AMP
-### Training results
 | Training Loss | Epoch | Step | Validation Loss | Democracy F1 | Democracy Accuracy | Economy F1 | Economy Accuracy | Race F1 | Race Accuracy | Leadership F1 | Leadership Accuracy | Development F1 | Development Accuracy | Corruption F1 | Corruption Accuracy | Instability F1 | Instability Accuracy | Safety F1 | Safety Accuracy | Administration F1 | Administration Accuracy | Education F1 | Education Accuracy | Religion F1 | Religion Accuracy | Environment F1 | Environment Accuracy | Overall F1 | Overall Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:------------:|:------------------:|:----------:|:----------------:|:-------:|:-------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:---------:|:---------------:|:-----------------:|:-----------------------:|:------------:|:------------------:|:-----------:|:-----------------:|:--------------:|:--------------------:|:----------:|:----------------:|
@@ -84,10 +104,18 @@ The following hyperparameters were used during training:
 | 0.0759        | 6.0   | 4044 | 0.2556          | 0.9311       | 0.9313             | 0.9153     | 0.9162           | 0.9465  | 0.9473        | 0.8492        | 0.8511              | 0.8743         | 0.8810               | 0.9431        | 0.9447              | 0.9185         | 0.9205               | 0.9049    | 0.9034          | 0.8797            | 0.8886                  | 0.9588       | 0.9601             | 0.9419      | 0.9421            | 0.9753         | 0.9757               | 0.9199     | 0.9218           |
 | 0.0618        | 7.0   | 4718 | 0.2655          | 0.9312       | 0.9318             | 0.9143     | 0.9151           | 0.9449  | 0.9456        | 0.8488        | 0.8494              | 0.8710         | 0.8748               | 0.9420        | 0.9441              | 0.9164         | 0.9198               | 0.9042    | 0.9032          | 0.8831            | 0.8891                  | 0.9565       | 0.9567             | 0.9426      | 0.9424            | 0.9745         | 0.9746               | 0.9191     | 0.9206           |
-### Framework versions
-- Transformers 4.48.2
-- Pytorch 2.5.1+cu124
-- Datasets 3.2.0
-- Tokenizers 0.21.0

 model-index:
 - name: MyPoliBERT-ver03
   results: []
+datasets:
+- tnwei/ms-newspapers
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to.
+You should proofread and complete it, then remove this comment. -->
 # MyPoliBERT-ver03
+## Model Overview
+MyPoliBERT-ver03 is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased), trained on Malaysian news and social media text (see the Dataset section) for multi-label classification of 12 political topics: Democracy, Economy, Race, Leadership, Development, Corruption, Instability, Safety, Administration, Education, Religion, and Environment. This version updates the original YagiASAFAS/MyPoliBERT model and improves classification performance on the Leadership topic.
+## Intended Uses and Limitations
+- **Intended Uses**
+  This model is intended for analyzing political texts and identifying multiple political topics, with a special focus on accurately classifying leadership-related content. It can be applied to various text sources such as news articles and social media posts.
+- **Limitations**
+  1. Details regarding the training data sources are limited; performance may therefore vary on texts from other domains or regions.
+  2. As with most deep learning models, the internal decision process is not inherently interpretable; human review is recommended for critical applications.
+  3. The model may not reflect recent political developments due to the static nature of its training data.
+## Dataset
+The training and evaluation data consist of 29,226 records, split 80% for training and 20% for validation.
+Data sources include:
+- tnwei/ms-newspapers dataset
+- Malaysian political posts from Reddit
+- Malaysian political posts from Instagram
+- Malaysian political posts from Facebook
+Additionally, to address topic and sentiment biases observed in news articles and in social media posts and comments, a portion of the data was synthetically generated through generative-AI-aided data augmentation.
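The 80/20 split described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing: the card does not say how the split was drawn, so the shuffle-then-cut approach, the function name `split_indices`, and the reuse of seed 42 as a split seed are all assumptions.

```python
import random

TOTAL_RECORDS = 29226   # record count stated in the card
TRAIN_FRACTION = 0.8    # 80% training, 20% validation, as stated in the card
SPLIT_SEED = 42         # assumption: the card lists 42 only as the training seed


def split_indices(n, train_fraction, seed):
    """Deterministically shuffle range(n) and cut it into train/validation indices."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * train_fraction)  # fractional record is rounded down
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(TOTAL_RECORDS, TRAIN_FRACTION, SPLIT_SEED)
print(len(train_idx), len(val_idx))
```

Because 80% of 29,226 is not a whole number, the exact sizes depend on the rounding rule; rounding down gives 23,380 training and 5,846 validation records.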
+## Model Architecture
+- **Base Model**: [bert-base-uncased](https://huggingface.co/bert-base-uncased)
+- **Task**: Multi-label classification for 12 political topics
+- **Output**: The model outputs classification scores for each topic; in this updated version the Leadership classification has been notably improved.
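The card does not document the layout of the output head, so the following is only a sketch of one common multi-label convention: one independent logit per topic, passed through a sigmoid and thresholded at 0.5. The logit ordering, the threshold, and the helper name `decode_topics` are assumptions, and the example logits are made up for illustration.

```python
import math

# Topic names as listed in the card; the index order is an assumption.
TOPICS = [
    "Democracy", "Economy", "Race", "Leadership", "Development", "Corruption",
    "Instability", "Safety", "Administration", "Education", "Religion", "Environment",
]


def sigmoid(x):
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_topics(logits, threshold=0.5):
    """Return the topics whose sigmoid score reaches the threshold."""
    if len(logits) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} logits, got {len(logits)}")
    return [topic for topic, z in zip(TOPICS, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for a single text, not real model output.
example_logits = [-3.1, 0.4, -2.0, 2.7, -1.2, 1.9, -0.5, -4.0, -0.8, -2.2, -3.0, -5.0]
print(decode_topics(example_logits))
```

If the actual head instead predicts several classes per topic, the decoding step would be a per-topic argmax rather than a threshold.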
+## Training Procedure
+- **Hyperparameters**
+  - learning_rate: 3e-05
+  - train_batch_size: 16
+  - eval_batch_size: 16
+  - seed: 42
+  - gradient_accumulation_steps: 2
+  - total_train_batch_size: 32
+  - optimizer: ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08
+  - lr_scheduler_type: linear
+  - num_epochs: 16
+  - mixed_precision_training: Native AMP
+- **Training Configuration**
+  The training followed a standard procedure with periodic evaluation; the best checkpoint (obtained at epoch 7) was selected based on overall performance metrics.
+## Evaluation and Performance
+The model achieves the following results on the evaluation set:
+- Loss: 0.2655
+- Democracy F1: 0.9312
+- Democracy Accuracy: 0.9318
+- Economy F1: 0.9143
+- Economy Accuracy: 0.9151
+- Race F1: 0.9449
+- Race Accuracy: 0.9456
+- Leadership F1: 0.8488
+- Leadership Accuracy: 0.8494
+- Development F1: 0.8710
+- Development Accuracy: 0.8748
+- Corruption F1: 0.9420
+- Corruption Accuracy: 0.9441
+- Instability F1: 0.9164
+- Instability Accuracy: 0.9198
+- Safety F1: 0.9042
+- Safety Accuracy: 0.9032
+- Administration F1: 0.8831
+- Administration Accuracy: 0.8891
+- Education F1: 0.9565
+- Education Accuracy: 0.9567
+- Religion F1: 0.9426
+- Religion Accuracy: 0.9424
+- Environment F1: 0.9745
+- Environment Accuracy: 0.9746
+- Overall F1: 0.9191
+- Overall Accuracy: 0.9206
+These results show consistent performance across topics (F1 between 0.8488 and 0.9745), with improved Leadership classification compared to the original model; Leadership nevertheless remains the lowest-scoring topic.
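The card does not state how the Overall figures are aggregated. As a quick consistency check, the sketch below takes the unweighted mean of the twelve per-topic scores listed above and compares it with the reported Overall F1 (0.9191) and Overall Accuracy (0.9206); treating Overall as a macro average is an assumption that the numbers happen to support.

```python
from statistics import mean

# (F1, accuracy) per topic, copied from the evaluation results in the card.
per_topic = {
    "Democracy": (0.9312, 0.9318),
    "Economy": (0.9143, 0.9151),
    "Race": (0.9449, 0.9456),
    "Leadership": (0.8488, 0.8494),
    "Development": (0.8710, 0.8748),
    "Corruption": (0.9420, 0.9441),
    "Instability": (0.9164, 0.9198),
    "Safety": (0.9042, 0.9032),
    "Administration": (0.8831, 0.8891),
    "Education": (0.9565, 0.9567),
    "Religion": (0.9426, 0.9424),
    "Environment": (0.9745, 0.9746),
}

REPORTED_OVERALL_F1 = 0.9191
REPORTED_OVERALL_ACCURACY = 0.9206

macro_f1 = mean(f1 for f1, _ in per_topic.values())
macro_accuracy = mean(acc for _, acc in per_topic.values())
weakest_topic = min(per_topic, key=lambda topic: per_topic[topic][0])

print(f"macro F1 {macro_f1:.5f} vs reported {REPORTED_OVERALL_F1}")
print(f"macro accuracy {macro_accuracy:.5f} vs reported {REPORTED_OVERALL_ACCURACY}")
print(f"lowest F1: {weakest_topic}")
```

Both means agree with the reported Overall values to within rounding (a difference below 0.0001), which suggests that every topic carries equal weight in the summary regardless of how often it occurs in the data.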
+### Training Results
 | Training Loss | Epoch | Step | Validation Loss | Democracy F1 | Democracy Accuracy | Economy F1 | Economy Accuracy | Race F1 | Race Accuracy | Leadership F1 | Leadership Accuracy | Development F1 | Development Accuracy | Corruption F1 | Corruption Accuracy | Instability F1 | Instability Accuracy | Safety F1 | Safety Accuracy | Administration F1 | Administration Accuracy | Education F1 | Education Accuracy | Religion F1 | Religion Accuracy | Environment F1 | Environment Accuracy | Overall F1 | Overall Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:------------:|:------------------:|:----------:|:----------------:|:-------:|:-------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:---------:|:---------------:|:-----------------:|:-----------------------:|:------------:|:------------------:|:-----------:|:-----------------:|:--------------:|:--------------------:|:----------:|:----------------:|
 | 0.0759        | 6.0   | 4044 | 0.2556          | 0.9311       | 0.9313             | 0.9153     | 0.9162           | 0.9465  | 0.9473        | 0.8492        | 0.8511              | 0.8743         | 0.8810               | 0.9431        | 0.9447              | 0.9185         | 0.9205               | 0.9049    | 0.9034          | 0.8797            | 0.8886                  | 0.9588       | 0.9601             | 0.9419      | 0.9421            | 0.9753         | 0.9757               | 0.9199     | 0.9218           |
 | 0.0618        | 7.0   | 4718 | 0.2655          | 0.9312       | 0.9318             | 0.9143     | 0.9151           | 0.9449  | 0.9456        | 0.8488        | 0.8494              | 0.8710         | 0.8748               | 0.9420        | 0.9441              | 0.9164         | 0.9198               | 0.9042    | 0.9032          | 0.8831            | 0.8891                  | 0.9565       | 0.9567             | 0.9426      | 0.9424            | 0.9745         | 0.9746               | 0.9191     | 0.9206           |
+## Future Improvements
+- Incorporate additional data and domain adaptation techniques to further improve performance across all topics
+- Enhance model interpretability using explainability methods
+- Monitor and update the model periodically to capture evolving political trends
+## License and Usage Notes
+- The predictions of this model should be used as a reference and interpreted within the context of the training data limitations
+- Users are encouraged to validate model outputs with human review for critical applications
+- Regular updates and retraining are recommended to maintain relevance and accuracy
+### Framework Versions
+- Transformers: 4.48.2
+- PyTorch: 2.5.1+cu124
+- Datasets: 3.2.0
+- Tokenizers: 0.21.0
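To compare a local environment against the versions listed above, a small check using the standard library's `importlib.metadata` can help. This is a convenience sketch, not part of the model card: the helper name `version_mismatches` is hypothetical, `torch` is assumed to be the installed distribution name for PyTorch, and a non-CUDA wheel would report `2.5.1` without the `+cu124` suffix.

```python
from importlib import metadata

# Versions listed in the card, keyed by distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.48.2",
    "torch": "2.5.1+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def version_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in version_mismatches(EXPECTED_VERSIONS).items():
    print(f"{package}: card lists {wanted}, environment has {found}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one used for training.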