YagiASAFAS committed on
Commit e6b7410 · verified · 1 Parent(s): 2dfb231

Update README.md

Files changed (1)
  1. README.md +12 -12
README.md CHANGED
@@ -50,18 +50,18 @@ For each topic, the model assigns one of four sentiment labels: **unknown, negat

  The training data was aggregated from multiple sources:

- | Data Source | N | Status | Labeling Method |
- |---------------------------------------|-------|--------|------------------------------------------------------------------|
- | English Newspaper | 5912 | Done | BERT (MyPoliBERT-ver03 was used) |
- | English Newspaper Comments (Facebook) | 8471 | Done | BERT |
- | Malay Newspaper | 5254 | Done | OpenAI API (translated to English then classified) |
- | Chinese Newspaper | 2480 | Done | OpenAI API (translated to English then classified) |
- | Tamil Newspaper | 1512 | Done | OpenAI API (translated to English then classified) |
- | Reddit | 20000 | Done | BERT |
- | Manifesto BN | 98 | Done | OpenAI API |
- | Manifesto PH | 180 | Done | OpenAI API |
- | Manifesto PN | 15 | Done | OpenAI API |
- | Synthetic Data | 4124 | Done | OpenAI API |
+ | Data Source | N | Labeling Method |
+ |---------------------------------------|-------|------------------------------------------------------------------|
+ | English Newspaper | 5912 | BERT (MyPoliBERT-ver03 was used) |
+ | English Newspaper Comments (Facebook) | 8471 | BERT |
+ | Malay Newspaper | 5254 | OpenAI API (translated to English then classified) |
+ | Chinese Newspaper | 2480 | OpenAI API (translated to English then classified) |
+ | Tamil Newspaper | 1512 | OpenAI API (translated to English then classified) |
+ | Reddit | 20000 | BERT (MyPoliBERT-ver03 was used) |
+ | Manifesto BN | 98 | OpenAI API |
+ | Manifesto PH | 180 | OpenAI API |
+ | Manifesto PN | 15 | OpenAI API |
+ | Synthetic Data | 4124 | OpenAI API |

  - **NOTE**: The originally aggregated dataset, which included data from various sources (such as English Newspapers, Facebook comments, Malay, Chinese, and Tamil Newspapers, Reddit, Manifestos, and Synthetic Data), contained some noise and misclassifications; after removing these noisy entries, 47,966 clean data points were used for training.

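
As a quick sanity check on the table and the NOTE above, the per-source counts can be tallied and compared with the 47,966 figure. This is only a sketch: it assumes the N column lists counts before the noisy entries were removed, which the README does not state explicitly, and the variable names are illustrative rather than taken from the repository.

```python
# Per-source counts copied from the "N" column of the README table.
source_counts = {
    "English Newspaper": 5912,
    "English Newspaper Comments (Facebook)": 8471,
    "Malay Newspaper": 5254,
    "Chinese Newspaper": 2480,
    "Tamil Newspaper": 1512,
    "Reddit": 20000,
    "Manifesto BN": 98,
    "Manifesto PH": 180,
    "Manifesto PN": 15,
    "Synthetic Data": 4124,
}

# Number of clean data points reported in the NOTE.
CLEAN_TOTAL = 47_966

# Assumption: the table lists pre-cleaning counts, so the difference
# is the number of entries dropped as noise or misclassifications.
raw_total = sum(source_counts.values())
removed = raw_total - CLEAN_TOTAL

print(f"raw total across sources: {raw_total}")
print(f"entries removed (under the assumption above): {removed}")
```

If the N column already reflects post-cleaning counts, the difference would instead point to a discrepancy between the table and the NOTE.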