
---
license: mit
language:
- en
- id
metrics:
- accuracy
pipeline_tag: text-classification
---
# Election Tweets Classification Model
This repository (`Rendika/tweets-election-classification`) contains a fine-tuned version of the ***indolem/indobertweet-base-uncased*** model for classifying election-related tweets. The model assigns each tweet to one of eight topic classes, which can help analyze public opinion and discourse during election periods.
## Classes
The model classifies tweets into the following categories:
1. **Politik** (2972 samples)
2. **Sosial Budaya** (425 samples)
3. **Ideologi** (343 samples)
4. **Pertahanan dan Keamanan** (331 samples)
5. **Ekonomi** (310 samples)
6. **Sumber Daya Alam** (157 samples)
7. **Demografi** (61 samples)
8. **Geografi** (20 samples)
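The classes are heavily imbalanced. Politik alone accounts for about 64% of the 4,619 samples, while Geografi has fewer than 0.5%. The snippet below uses only the counts listed above and the standard library. It computes each class's share and the ratio between the largest and the smallest class, which is why balanced metrics matter when evaluating this model.

```python
# Sample counts per class, copied from the list above
class_counts = {
    "Politik": 2972,
    "Sosial Budaya": 425,
    "Ideologi": 343,
    "Pertahanan dan Keamanan": 331,
    "Ekonomi": 310,
    "Sumber Daya Alam": 157,
    "Demografi": 61,
    "Geografi": 20,
}

total = sum(class_counts.values())
for name, count in class_counts.items():
    # Class name, raw count, and share of the whole dataset
    print(f"{name:<25}{count:>5}  {count / total:6.1%}")

# Ratio between the largest and the smallest class
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())
print(f"total={total}, imbalance ratio={imbalance_ratio:.1f}")
```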
## Libraries Used
The following libraries were used for data processing, model training, and evaluation:
- Data processing: `numpy`, `pandas`, `re`, `string`, `random`
- Visualization and progress reporting: `matplotlib.pyplot`, `seaborn`, `tqdm`, `plotly.graph_objs`, `plotly.express`, `plotly.figure_factory`
- Word cloud generation: `PIL`, `wordcloud`
- NLP: `nltk`, `nlp_id`, `Sastrawi`, `tweet-preprocessor`
- Machine learning: `tensorflow`, `keras`, `sklearn`, `transformers`, `torch`
## Data Preparation
### Data Split
The dataset of 4,619 tweets was split into training, validation, and test sets with the following proportions:
- **Training Set**: 85% (3925 samples)
- **Validation Set**: 10% (463 samples)
- **Test Set**: 5% (231 samples)
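The card does not say how the split was produced. A common approach is a two-stage stratified split with scikit-learn's `train_test_split`: hold out 15% first, then divide that holdout 2:1 into validation and test. The sketch below shows this approach as an assumption. Placeholder texts stand in for the tweets, `random_state=42` is arbitrary, and rounding can make the resulting sizes differ by one sample from the counts listed above.

```python
from sklearn.model_selection import train_test_split

# Class counts from the Classes section; the texts are placeholders
class_counts = {
    "Politik": 2972, "Sosial Budaya": 425, "Ideologi": 343,
    "Pertahanan dan Keamanan": 331, "Ekonomi": 310,
    "Sumber Daya Alam": 157, "Demografi": 61, "Geografi": 20,
}
labels = [name for name, count in class_counts.items() for _ in range(count)]
texts = [f"tweet {i}" for i in range(len(labels))]

# Stage 1: keep 85% for training, hold out 15% (stratified by class)
X_train, X_temp, y_train, y_temp = train_test_split(
    texts, labels, test_size=0.15, stratify=labels, random_state=42)

# Stage 2: split the 15% holdout 2:1 into validation (10%) and test (5%)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=1 / 3, stratify=y_temp, random_state=42)

print(len(X_train), len(X_val), len(X_test))
```

Stratifying keeps rare classes such as Geografi represented in every split, which a purely random split of only 20 samples might not guarantee.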
### Training Details
- **Epochs**: 3
- **Batch Size**: 32
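As a worked example of what these settings imply, take the 3,925-sample training set from the split above. Assume that the final, smaller batch of each epoch is kept rather than dropped, which is the default when Keras `fit` receives in-memory arrays with a `batch_size`. Each epoch then runs 123 optimizer steps, the last of which has 21 samples.

```python
import math

TRAIN_SAMPLES = 3925  # training set size from the Data Split section
BATCH_SIZE = 32
EPOCHS = 3

# The last partial batch is assumed to be kept, so round up
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
last_batch_size = TRAIN_SAMPLES - (steps_per_epoch - 1) * BATCH_SIZE
total_steps = steps_per_epoch * EPOCHS

print(steps_per_epoch, last_batch_size, total_steps)  # 123 21 369
```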
### Training Results
| Epoch | Train Loss | Train Accuracy | Validation Loss | Validation Accuracy |
|-------|------------|----------------|-----------------|---------------------|
| 1     | 0.9382     | 0.7167         | 0.7518          | 0.7671              |
| 2     | 0.5741     | 0.8229         | 0.7081          | 0.7931              |
| 3     | 0.3541     | 0.8958         | 0.7473          | 0.7953              |
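The table can be read more precisely by computing the gap between training and validation accuracy and the best epoch for each validation metric. The snippet below stores the values from the table in a dict keyed like a Keras `History.history` object trained with `metrics=["accuracy"]`. The key names are that Keras convention; the numbers are copied from the table.

```python
# Per-epoch values copied from the table above
history = {
    "loss": [0.9382, 0.5741, 0.3541],
    "accuracy": [0.7167, 0.8229, 0.8958],
    "val_loss": [0.7518, 0.7081, 0.7473],
    "val_accuracy": [0.7671, 0.7931, 0.7953],
}

for epoch, (train_acc, val_acc) in enumerate(
        zip(history["accuracy"], history["val_accuracy"]), start=1):
    # A positive gap means training accuracy exceeds validation accuracy
    print(f"epoch {epoch}: accuracy gap {train_acc - val_acc:+.4f}")

epochs = range(1, len(history["val_loss"]) + 1)
best_by_val_loss = min(epochs, key=lambda e: history["val_loss"][e - 1])
best_by_val_acc = max(epochs, key=lambda e: history["val_accuracy"][e - 1])
print(best_by_val_loss, best_by_val_acc)  # 2 3
```

Validation loss is lowest after epoch 2, while validation accuracy peaks only marginally higher after epoch 3. Meanwhile, the train/validation accuracy gap widens to about 10 points. This pattern may indicate the onset of overfitting. When retraining, one option is Keras's `EarlyStopping(monitor="val_loss", restore_best_weights=True)` callback, which would keep the epoch 2 weights.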
## Model Architecture
The model is built using the TensorFlow and Keras libraries and employs the following architecture:
- **Embedding Layer**: Converts input tokens into dense vectors of fixed size.
- **LSTM Layers**: Bidirectional LSTM layers read the token sequence in both directions to capture dependencies between words.
- **Dense Layers**: Fully connected layers for classification.
- **Dropout Layers**: Reduce overfitting by randomly dropping units during training.
- **Batch Normalization**: Normalizes the activations of the previous layer to stabilize training.
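The card does not give layer sizes, vocabulary size, or sequence length. The Keras sketch below therefore fills them with hypothetical values: `VOCAB_SIZE`, `MAX_LEN`, `EMBED_DIM`, the LSTM and Dense widths, the dropout rate, and the integer-label loss are all assumptions. It only illustrates how the listed layer types fit together and is not the author's exact model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical hyperparameters (not published in this model card)
VOCAB_SIZE = 20000
MAX_LEN = 64
EMBED_DIM = 128
NUM_CLASSES = 8  # the eight topic classes listed above


def build_model() -> keras.Model:
    inputs = keras.Input(shape=(MAX_LEN,), dtype="int32")
    # Embedding layer: token ids -> dense vectors of fixed size
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
    # Bidirectional LSTM layers read the sequence in both directions
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(32))(x)
    x = layers.Dropout(0.3)(x)
    # Fully connected layer, followed by batch normalization and dropout
    x = layers.Dense(64, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)
    # Output layer: one softmax probability per class
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    # Assumes integer-encoded labels, hence the sparse loss
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_model()
probs = model.predict(np.zeros((2, MAX_LEN), dtype="int32"), verbose=0)
print(probs.shape)  # (2, 8)
```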
## Usage
### Installation
To use the model, ensure you have the required libraries installed. You can install them using pip:
```bash
pip install numpy pandas matplotlib seaborn plotly pillow wordcloud nltk tensorflow keras scikit-learn
```
### Data Cleaning
The data was cleaned using the following steps:
1. Converted text to lowercase.
2. Removed 'RT'.
3. Removed links.
4. Removed patterns like '[RE ...]'.
5. Removed patterns like '@ ... ='.
6. Removed non-ASCII characters (including emojis).
7. Removed punctuation (excluding '#').
8. Removed excessive whitespace.
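The eight steps above can be sketched as a single cleaning function. The card names the steps but not the exact patterns, so each regular expression here is an assumption about what the step matched. This applies in particular to steps 2, 4, and 5. Hashtags keep their `#` as step 7 requires.

```python
import re
import string

# Step 7 removes all ASCII punctuation except '#'
PUNCTUATION_TO_REMOVE = string.punctuation.replace("#", "")


def clean_tweet(text: str) -> str:
    """Apply the eight listed cleaning steps in order (patterns are assumptions)."""
    text = text.lower()                                    # 1. lowercase
    text = re.sub(r"\brt\b", " ", text)                    # 2. 'RT' (already lowercased)
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)     # 3. links
    text = re.sub(r"\[re [^\]]*\]", " ", text)             # 4. '[RE ...]'
    text = re.sub(r"@[^\s=]*\s*=", " ", text)              # 5. '@ ... ='
    text = text.encode("ascii", "ignore").decode("ascii")  # 6. non-ASCII, emojis
    text = text.translate(str.maketrans("", "", PUNCTUATION_TO_REMOVE))  # 7.
    text = re.sub(r"\s+", " ", text).strip()               # 8. extra whitespace
    return text


print(clean_tweet("RT @budi: Debat capres malam ini!! 🔥 https://t.co/abc #Pemilu2024"))
# budi debat capres malam ini #pemilu2024
```

Because step 2 is implemented here as a whole-word match, it would also remove a standalone Indonesian "RT" (rukun tetangga), not only the retweet marker. Anchoring the pattern to the start of the tweet is a stricter alternative.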
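### Sample Code
The following snippet shows how to load a saved Keras model and run predictions on new tweets:
```python
import pandas as pd
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import pad_sequences

# Load the trained model (replace the path with your saved model file)
model = load_model('path_to_your_model.h5')

# Preprocess new data
def preprocess_text(text):
    # Apply the same cleaning steps used during training (see "Data Cleaning").
    # This placeholder returns the text unchanged.
    return text

# Example usage
new_tweets = pd.Series(["Your new tweet text here"])
preprocessed_tweets = new_tweets.apply(preprocess_text)

# Tokenize and pad sequences exactly as during training. The fitted tokenizer
# and maximum sequence length are not included in this repository:
# `tokenizer` (e.g. a fitted Keras Tokenizer) and `max_len` are placeholders.
sequences = tokenizer.texts_to_sequences(preprocessed_tweets)
inputs = pad_sequences(sequences, maxlen=max_len)

# Predict class probabilities and take the most likely class index
predictions = model.predict(inputs)
predicted_classes = predictions.argmax(axis=-1)
```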
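`model.predict` returns one probability per class for each tweet. Turning these into readable labels requires the index-to-class mapping used during training, and this card does not state it. In the helper below, the `CLASS_NAMES` order is a hypothetical placeholder that simply follows the Classes list, and `top_k_labels` is an illustrative function rather than part of the repository.

```python
import numpy as np

# Hypothetical index-to-label order; replace with the encoding used in training
CLASS_NAMES = [
    "Politik", "Sosial Budaya", "Ideologi", "Pertahanan dan Keamanan",
    "Ekonomi", "Sumber Daya Alam", "Demografi", "Geografi",
]


def top_k_labels(probabilities, k=3):
    """Return the k most likely (label, probability) pairs for each row."""
    probabilities = np.asarray(probabilities)
    # Sort class indices by descending probability and keep the first k
    top = np.argsort(probabilities, axis=-1)[:, ::-1][:, :k]
    return [[(CLASS_NAMES[i], float(row[i])) for i in idx]
            for row, idx in zip(probabilities, top)]


example = np.array([[0.70, 0.10, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01]])
print(top_k_labels(example, k=2))
# [[('Politik', 0.7), ('Sosial Budaya', 0.1)]]
```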
## Evaluation
The model was evaluated using the following metrics:
- **Precision**: Fraction of tweets predicted as a class that actually belong to that class.
- **Recall**: Fraction of tweets in a class that the model correctly identifies.
- **F1 Score**: Harmonic mean of precision and recall.
- **Accuracy**: Fraction of all tweets classified correctly.
- **Balanced Accuracy**: Average of the per-class recall values, which compensates for class imbalance.
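The card does not say how per-class scores were averaged, so the scikit-learn sketch below uses macro averaging as an assumption. The labels are toy values for illustration only and are not the model's test results. With three true Politik tweets, one of which is misclassified as Ekonomi, accuracy is 5/6. Macro precision, macro recall, and balanced accuracy are all 8/9.

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             classification_report, f1_score,
                             precision_score, recall_score)

# Toy labels for illustration only (not the model's test results)
y_true = ["Politik", "Politik", "Politik", "Ekonomi", "Ekonomi", "Geografi"]
y_pred = ["Politik", "Politik", "Ekonomi", "Ekonomi", "Ekonomi", "Geografi"]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    # Macro averaging (an assumption): unweighted mean over classes
    "precision_macro": precision_score(y_true, y_pred, average="macro"),
    "recall_macro": recall_score(y_true, y_pred, average="macro"),
    "f1_macro": f1_score(y_true, y_pred, average="macro"),
    "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name:<18} {value:.4f}")

# Per-class precision, recall, and F1
print(classification_report(y_true, y_pred, digits=4))
```

For multiclass problems, scikit-learn's balanced accuracy equals macro-averaged recall. This is why it is more informative than plain accuracy on a dataset where Politik dominates.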
## Conclusion
This fine-tuned model classifies election-related tweets into eight topic categories. It can be used to track which topics dominate public discussion during election periods, supporting analysis of public opinion and related decision-making.
## License
This project is licensed under the MIT License.
## Contact
For questions or feedback, please contact me at rendikarendi96@gmail.com.