YagiASAFAS committed
Commit 8ad97b5 · verified · 1 Parent(s): 49d5055

Update README.md

Files changed (1)
  1. README.md +23 -2
README.md CHANGED
@@ -1,3 +1,13 @@
  # MalaysiaPoliBERT Model Card
 
  **Model Name:** MalaysiaPoliBERT
@@ -62,8 +72,14 @@ The training data was aggregated from multiple sources:
  - **Implementation:** The YagiASAFAS/MyPoliBERT-ver03 model was used to classify the texts directly.
 
  ### OpenAI API Labeling
- - **Method:** For non-English news articles (Malay, Chinese, Tamil), texts were first translated into English using an OpenAI API-based translation prompt. Then, a classification prompt was used to label the translated text.
-
  #### Translation Prompt
  ```
  You are a professional translator.
@@ -112,6 +128,11 @@ INSTRUCTION
  {text}
  ```
 
  ## Training Details
 
  **Hyperparameters:**
 
+ ---
+ license: apache-2.0
+ metrics:
+ - accuracy
+ - f1
+ base_model:
+ - google-bert/bert-base-uncased
+ tags:
+ - politics
+ ---
  # MalaysiaPoliBERT Model Card
 
  **Model Name:** MalaysiaPoliBERT
 
  - **Implementation:** The YagiASAFAS/MyPoliBERT-ver03 model was used to classify the texts directly.
 
  ### OpenAI API Labeling
+ - **Method:** For non-English news articles (Malay, Chinese, Tamil), texts were first translated into English and then labeled.
+ - **Process:**
+ - **Translation:** A translation prompt was used to convert non-English texts into English.
+ - **Classification:** After translation, a classification prompt was used to assign labels.
+ - **Additional Details:**
+ OpenAI API labeling combined human-in-the-loop prompt engineering, used to select the most accurate prompt, with the OpenAI API (gpt-4o-mini) to generate labels.
+
+
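The translate-then-classify process described in the added lines can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: `label_article`, the `complete` callable, and both prompt constants are hypothetical names, and the prompt strings are abbreviated placeholders rather than the full prompts in the README. `complete` stands in for whatever function sends a prompt to the model (gpt-4o-mini, per the README) and returns its text reply.

```python
from typing import Callable

# Abbreviated placeholder prompts (assumptions); the README's prompts are longer.
TRANSLATION_PROMPT = "You are a professional translator. Translate into English:\n{text}"
CLASSIFICATION_PROMPT = "Classify the following English news text:\n{text}"


def label_article(text: str, language: str, complete: Callable[[str], str]) -> dict:
    """Translate non-English text into English, then classify the English text.

    `complete` is a hypothetical callable that sends one prompt to the
    model and returns the reply as a string.
    """
    if language.lower() != "english":
        # Step 1: translation prompt for Malay, Chinese, or Tamil articles.
        english = complete(TRANSLATION_PROMPT.format(text=text)).strip()
    else:
        # English articles skip the translation step.
        english = text
    # Step 2: classification prompt applied to the English text.
    label = complete(CLASSIFICATION_PROMPT.format(text=english)).strip()
    return {"english_text": english, "label": label}
```

Passing the model call in as a function keeps the two-step logic separate from any particular API client, so the flow can be checked with a stub before real requests are sent.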
  #### Translation Prompt
  ```
  You are a professional translator.
 
  {text}
  ```
 
+ #### Synthetic Data via Data Augmentation
+ - **Method**: Synthetic data was generated to balance the dataset by augmenting underrepresented labels or sentiments.
+
+ - **Implementation**: The OpenAI API was used, together with human-in-the-loop prompt engineering, to generate artificial examples for labels that are absent from or scarce in the original dataset. This synthetic data was then mixed with the original data to improve label balance.
+
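One way to decide how much synthetic data each label needs is to count the existing examples and compute the gap to a target size. The sketch below assumes the target is the size of the largest class; the README does not state the balancing rule that was actually used, and `synthetic_quota` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def synthetic_quota(labels: Iterable[str], target: Optional[int] = None) -> Dict[str, int]:
    """Return how many synthetic examples each label needs to reach `target`.

    If `target` is None, the size of the largest class is used (an assumption).
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Labels already at or above the target need no synthetic data.
    return {label: max(0, target - n) for label, n in counts.items()}


# Hypothetical label distribution, for illustration only.
labels = ["positive"] * 5 + ["negative"] * 2 + ["neutral"] * 1
print(synthetic_quota(labels))
# → {'positive': 0, 'negative': 3, 'neutral': 4}
```

The resulting quotas could then drive how many generation requests are issued per label before the synthetic examples are mixed back into the original data.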
  ## Training Details
 
  **Hyperparameters:**