---
library_name: transformers
license: apache-2.0
base_model: bert-base-uncased
tags:
- generated_from_trainer
model-index:
- name: MyPoliBERT-ver03
  results: []
datasets:
- tnwei/ms-newspapers
---

# MyPoliBERT-ver03

## Model Overview

MyPoliBERT-ver03 is a fine-tuned version of `bert-base-uncased`. It was trained on multiple datasets for multi-label classification of 12 political topics: Democracy, Economy, Race, Leadership, Development, Corruption, Instability, Safety, Administration, Education, Religion, and Environment. This version updates the original YagiASAFAS/MyPoliBERT model and specifically improves classification performance on the Leadership topic.

## Intended Uses and Limitations

- **Intended Uses**

  This model is intended for analyzing political texts and identifying the political topics they address; a single text can be assigned several topics. It places particular focus on accurately classifying leadership-related content. It can be applied to text sources such as news articles and social media posts.

- **Limitations**

  1. The model is fine-tuned on a specific mix of Malaysian news and social media data (see Dataset), and detailed documentation of these sources is limited. Performance may therefore vary on texts from other domains or regions.
  2. As with most deep learning models, the internal decision process is not inherently interpretable, so human review is recommended for critical applications.
  3. Because its training data is static, the model may not reflect recent political developments.

## Dataset

The training and evaluation data consist of 29,226 records, split 80% for training and 20% for validation. Data sources include:

- tnwei/ms-newspapers dataset
- Malaysian political posts from Reddit
- Malaysian political posts from Instagram
- Malaysian political posts from Facebook

News articles, social media posts and comments showed biases in both topic and sentiment. To counter these biases, a portion of the data was synthetically generated using generative-AI-aided data augmentation.
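The 80/20 split described above can be illustrated with a minimal sketch. It assumes the records are shuffled once with a fixed seed and then cut at the 80% mark. The seed value of 42 is borrowed from the training hyperparameters. The card does not state how the authors actually performed the split, and `split_80_20` is a hypothetical helper.

```python
import random


def split_80_20(records, seed=42):
    """Shuffle a copy of `records` and split it 80/20 into train and validation.

    Hypothetical helper: the seed and the cut-off rule are assumptions,
    not the authors' documented procedure.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    cut = int(len(shuffled) * 0.8)         # floor of 80% goes to training
    return shuffled[:cut], shuffled[cut:]


# Integer IDs stand in for the 29,226 records described above.
train, val = split_80_20(range(29226))
print(len(train), len(val))  # 23380 5846
```

Flooring the cut point sends the fractional record to the validation side. A library routine such as a `test_size=0.2` split may round differently by one record.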
## Model Architecture

- **Base Model**: `bert-base-uncased`
- **Task**: Multi-label classification over 12 political topics
- **Output**: A classification score for each topic. This version notably improves Leadership classification.

## Training Procedure

- **Hyperparameters**
  - learning_rate: 3e-05
  - train_batch_size: 16
  - eval_batch_size: 16
  - seed: 42
  - gradient_accumulation_steps: 2
  - total_train_batch_size: 32
  - optimizer: ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - num_epochs: 16
  - mixed_precision_training: Native AMP

- **Training Configuration**

  Training followed a standard procedure with evaluation after each epoch. The best checkpoint (obtained at epoch 7) was selected based on overall performance metrics.

## Evaluation and Performance

The model achieves the following results on the evaluation set (loss: 0.2655):

| Topic | F1 | Accuracy |
|:-----:|:--:|:--------:|
| Democracy | 0.9312 | 0.9318 |
| Economy | 0.9143 | 0.9151 |
| Race | 0.9449 | 0.9456 |
| Leadership | 0.8488 | 0.8494 |
| Development | 0.8710 | 0.8748 |
| Corruption | 0.9420 | 0.9441 |
| Instability | 0.9164 | 0.9198 |
| Safety | 0.9042 | 0.9032 |
| Administration | 0.8831 | 0.8891 |
| Education | 0.9565 | 0.9567 |
| Religion | 0.9426 | 0.9424 |
| Environment | 0.9745 | 0.9746 |
| **Overall** | **0.9191** | **0.9206** |

Performance is consistent across topics, with per-topic F1 ranging from 0.8488 (Leadership) to 0.9745 (Environment). Leadership remains the lowest-scoring topic, although it is improved compared with the original model.
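The Overall scores appear to be unweighted (macro) means of the 12 per-topic scores. This is an inference from the numbers, not a documented definition. Averaging the per-topic F1 values gives 0.919125, which reproduces the reported 0.9191. Averaging the accuracies gives 0.92055, which is consistent with the reported 0.9206 given that the inputs are themselves rounded. The check below uses scores copied from the evaluation results.

```python
from statistics import mean

# Per-topic (F1, accuracy) on the evaluation set, copied from the results above.
topic_scores = {
    "Democracy": (0.9312, 0.9318),
    "Economy": (0.9143, 0.9151),
    "Race": (0.9449, 0.9456),
    "Leadership": (0.8488, 0.8494),
    "Development": (0.8710, 0.8748),
    "Corruption": (0.9420, 0.9441),
    "Instability": (0.9164, 0.9198),
    "Safety": (0.9042, 0.9032),
    "Administration": (0.8831, 0.8891),
    "Education": (0.9565, 0.9567),
    "Religion": (0.9426, 0.9424),
    "Environment": (0.9745, 0.9746),
}

# Unweighted (macro) average: every topic counts equally,
# regardless of how many examples it has.
overall_f1 = mean(f1 for f1, _ in topic_scores.values())
overall_acc = mean(acc for _, acc in topic_scores.values())

print(f"Overall F1: {overall_f1:.4f}")  # Overall F1: 0.9191
print(f"Overall Accuracy: {overall_acc:.5f}")
```

A macro average weights a rare topic the same as a frequent one. A sample-weighted average could therefore differ if topic frequencies are very uneven.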
### Training Results

| Training Loss | Epoch | Step | Validation Loss | Democracy F1 | Democracy Accuracy | Economy F1 | Economy Accuracy | Race F1 | Race Accuracy | Leadership F1 | Leadership Accuracy | Development F1 | Development Accuracy | Corruption F1 | Corruption Accuracy | Instability F1 | Instability Accuracy | Safety F1 | Safety Accuracy | Administration F1 | Administration Accuracy | Education F1 | Education Accuracy | Religion F1 | Religion Accuracy | Environment F1 | Environment Accuracy | Overall F1 | Overall Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:------------:|:------------------:|:----------:|:----------------:|:-------:|:-------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:-------------:|:-------------------:|:--------------:|:--------------------:|:---------:|:---------------:|:-----------------:|:-----------------------:|:------------:|:------------------:|:-----------:|:-----------------:|:--------------:|:--------------------:|:----------:|:----------------:|
| 0.448 | 1.0 | 674 | 0.2781 | 0.8973 | 0.9201 | 0.8952 | 0.9062 | 0.9346 | 0.9385 | 0.8199 | 0.8340 | 0.8462 | 0.8672 | 0.9210 | 0.9302 | 0.8873 | 0.9084 | 0.8869 | 0.8947 | 0.8307 | 0.8700 | 0.9344 | 0.9467 | 0.9219 | 0.9304 | 0.9565 | 0.9619 | 0.8943 | 0.9090 |
| 0.2646 | 2.0 | 1348 | 0.2372 | 0.9232 | 0.9335 | 0.9111 | 0.9144 | 0.9438 | 0.9467 | 0.8406 | 0.8403 | 0.8669 | 0.8739 | 0.9385 | 0.9424 | 0.9222 | 0.9278 | 0.9038 | 0.9081 | 0.8724 | 0.8869 | 0.9543 | 0.9580 | 0.9380 | 0.9409 | 0.9732 | 0.9734 | 0.9157 | 0.9205 |
| 0.1696 | 3.0 | 2022 | 0.2291 | 0.9277 | 0.9333 | 0.9132 | 0.9177 | 0.9441 | 0.9469 | 0.8465 | 0.8503 | 0.8768 | 0.8847 | 0.9423 | 0.9454 | 0.9219 | 0.9255 | 0.9104 | 0.9114 | 0.8806 | 0.8919 | 0.9592 | 0.9597 | 0.9407 | 0.9419 | 0.9753 | 0.9766 | 0.9199 | 0.9238 |
| 0.1309 | 4.0 | 2696 | 0.2374 | 0.9290 | 0.9344 | 0.9168 | 0.9175 | 0.9441 | 0.9452 | 0.8454 | 0.8470 | 0.8733 | 0.8804 | 0.9433 | 0.9465 | 0.9215 | 0.9233 | 0.9101 | 0.9096 | 0.8762 | 0.8758 | 0.9577 | 0.9597 | 0.9389 | 0.9408 | 0.9740 | 0.9740 | 0.9192 | 0.9212 |
| 0.1085 | 5.0 | 3370 | 0.2414 | 0.9314 | 0.9346 | 0.9166 | 0.9175 | 0.9419 | 0.9452 | 0.8492 | 0.8459 | 0.8747 | 0.8808 | 0.9435 | 0.9463 | 0.9218 | 0.9257 | 0.9070 | 0.9083 | 0.8862 | 0.8921 | 0.9574 | 0.9588 | 0.9420 | 0.9426 | 0.9732 | 0.9736 | 0.9204 | 0.9226 |
| 0.0759 | 6.0 | 4044 | 0.2556 | 0.9311 | 0.9313 | 0.9153 | 0.9162 | 0.9465 | 0.9473 | 0.8492 | 0.8511 | 0.8743 | 0.8810 | 0.9431 | 0.9447 | 0.9185 | 0.9205 | 0.9049 | 0.9034 | 0.8797 | 0.8886 | 0.9588 | 0.9601 | 0.9419 | 0.9421 | 0.9753 | 0.9757 | 0.9199 | 0.9218 |
| 0.0618 | 7.0 | 4718 | 0.2655 | 0.9312 | 0.9318 | 0.9143 | 0.9151 | 0.9449 | 0.9456 | 0.8488 | 0.8494 | 0.8710 | 0.8748 | 0.9420 | 0.9441 | 0.9164 | 0.9198 | 0.9042 | 0.9032 | 0.8831 | 0.8891 | 0.9565 | 0.9567 | 0.9426 | 0.9424 | 0.9745 | 0.9746 | 0.9191 | 0.9206 |

## Future Improvements

- Incorporate additional data and domain adaptation techniques to further improve performance across all topics.
- Enhance model interpretability using explainability methods.
- Monitor and update the model periodically to capture evolving political trends.

## License and Usage Notes

- The predictions of this model should be used as a reference and interpreted within the context of the training data limitations.
- Users are encouraged to validate model outputs with human review for critical applications.
- Regular updates and retraining are recommended to maintain relevance and accuracy.

## Framework Versions

- Transformers: 4.48.2
- PyTorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0