YagiASAFAS/PoliBERT-MY


YagiASAFAS committed on Apr 5, 2025

Commit 8ad97b5 · verified · 1 Parent(s): 49d5055

Update README.md

Files changed (1)

README.md +23 -2


@@ -1,3 +1,13 @@
 # MalaysiaPoliBERT Model Card
 **Model Name:** MalaysiaPoliBERT
@@ -62,8 +72,14 @@ The training data was aggregated from multiple sources:
 - **Implementation:** The YagiASAFAS/MyPoliBERT-ver03 model was used to classify the texts directly.
 ### OpenAI API Labeling
-- **Method:** For non-English news articles (Malay, Chinese, Tamil), texts were first translated into English using an OpenAI API-based translation prompt. Then, a classification prompt was used to label the translated text.
 #### Translation Prompt
 ```
 You are a professional translator.
@@ -112,6 +128,11 @@ INSTRUCTION
     {text}
 ```
 ## Training Details
 **Hyperparameters:**

+---
+license: apache-2.0
+metrics:
+- accuracy
+- f1
+base_model:
+- google-bert/bert-base-uncased
+tags:
+- politics
+---
 # MalaysiaPoliBERT Model Card
 **Model Name:** MalaysiaPoliBERT
 - **Implementation:** The YagiASAFAS/MyPoliBERT-ver03 model was used to classify the texts directly.
 ### OpenAI API Labeling
+- **Method:** For non-English news articles (Malay, Chinese, Tamil), texts were first translated into English and then labeled.
+- **Process:**
+  - **Translation:** A translation prompt was used to convert non-English texts into English.
+  - **Classification:** After translation, a classification prompt was used to assign labels.
+- **Additional Details:**
+  OpenAI API labeling was performed by combining Human-in-the-loop machine learning—where prompt engineering was applied to select the most accurate prompt—with the OpenAI API (gpt-4o-mini) to generate labels.
 #### Translation Prompt
 ```
 You are a professional translator.
     {text}
 ```
+#### Synthetic Data via Data Augmentation
+- **Method**: Synthetic data was generated to balance the dataset by augmenting underrepresented labels or sentiments.
+- **Implementation**: The OpenAI API was used (in combination with Human-in-the-loop prompt engineering) to generate artificial data that is either not present in the original dataset or is scarce. This synthetic data was then mixed with the original data to improve label balance.
 ## Training Details
 **Hyperparameters:**
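
Note on the "OpenAI API Labeling" section added in this commit: the translate-then-classify flow it describes can be sketched as below. This is a minimal illustration, not the author's implementation. The function `label_non_english_article`, the `complete` callable, and both prompt templates are hypothetical; the diff shows only the first line of the translation prompt and its `{text}` slot, so the real prompt wording is not reproduced. In practice `complete` would wrap a call to the OpenAI API (the README names gpt-4o-mini); here it is injected so the sketch runs offline.

```python
from typing import Callable, Dict

# Hypothetical placeholder templates. The real prompts are truncated in the
# diff, so only the visible first line and the {text} slot are reused here.
TRANSLATION_PROMPT = "You are a professional translator.\n{text}"
CLASSIFICATION_PROMPT = "Classify the following English news text.\n{text}"


def label_non_english_article(
    text: str, complete: Callable[[str], str]
) -> Dict[str, str]:
    """Translate a non-English article into English, then classify it.

    `complete` is an assumed interface: it takes a prompt string and
    returns the model's text response.
    """
    # Step 1: translation prompt converts the source text into English.
    english = complete(TRANSLATION_PROMPT.format(text=text)).strip()
    # Step 2: classification prompt assigns a label to the translated text.
    label = complete(CLASSIFICATION_PROMPT.format(text=english)).strip()
    return {"english_text": english, "label": label}


def fake_complete(prompt: str) -> str:
    """Offline stand-in for a real API call, used only for this example."""
    if prompt.startswith("You are a professional translator."):
        return "The government announced a new budget."
    return " economy "


result = label_non_english_article(
    "Kerajaan mengumumkan belanjawan baharu.", fake_complete
)
print(result)
```

Passing the completion function as an argument keeps the two-step flow separate from any particular client library, which also makes it easy to compare candidate prompts during the human-in-the-loop prompt selection the README mentions.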