Update README.md
README.md
```diff
@@ -44,14 +44,15 @@ Output:
 Total of 3,803 texts were labelled manually to fine-tune the model, keeping only non-duplicate texts with at least 4 words after cleaning. The following sources were used for our training dataset:
 1. Bitcoin tweet dataset from [Kaggle Datasets](https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets) (randomly picked).
 2. Labelled crypto sentiment dataset from [SurgeAI](https://www.surgehq.ai/datasets/crypto-sentiment-dataset).
-3. Reddit thread r/Bitcoin with the topic "Daily Discussion" (Randomly picked)
-
+3. Reddit thread r/Bitcoin with the topic "Daily Discussion" (Randomly picked)
+
+Data augmentation was also performed to enrich the dataset: back-translation through the Google Translate API was applied using 10 languages ('it', 'fr', 'sv', 'da', 'pt', 'id', 'pl', 'hr', 'bg', 'fi').
 
 # Source Code
 See [GitHub](https://github.com/AfterRain007/cryptobertRefined) for the source code used to fine-tune the cryptoBERT model into cryptoBERTRefined.
 
 # Credit
-Credit where credit
+Credit where credit is due. Thank you all!
 
 1. Muhaza Liebenlito, M.Si., and Prof. Dr. Nur Inayah, M.Si., my academic advisors.
 2. Risky Amalia Marhariyadi for helping label the dataset.
```
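The README's labelling rule (non-duplicate texts with at least 4 words after cleaning) could be applied with a filter along these lines. This is a minimal sketch, not the project's actual preprocessing: the README does not say what "cleaning" involved, so `clean_text` and its rules (dropping URLs, @mentions and punctuation, then lowercasing) are hypothetical, as is the `filter_texts` helper.

```python
import re
from typing import Iterable, List


def clean_text(text: str) -> str:
    """Hypothetical cleaning: drop URLs, @mentions and punctuation, then lowercase."""
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)           # remove user mentions
    text = re.sub(r"[^\w\s]", " ", text)        # replace punctuation/emoji with spaces
    return " ".join(text.lower().split())        # lowercase and collapse whitespace


def filter_texts(texts: Iterable[str], min_words: int = 4) -> List[str]:
    """Keep cleaned texts that have at least `min_words` words and are not duplicates."""
    seen = set()
    kept: List[str] = []
    for raw in texts:
        cleaned = clean_text(raw)
        if len(cleaned.split()) < min_words or cleaned in seen:
            continue
        seen.add(cleaned)
        kept.append(cleaned)
    return kept


samples = [
    "BTC to the moon!!! https://t.co/abc",
    "btc to the moon",                        # duplicate after cleaning
    "Bitcoin is dumping hard today @whale",
    "HODL",                                   # too short
]
print(filter_texts(samples))  # ['btc to the moon', 'bitcoin is dumping hard today']
```

Deduplicating on the cleaned form rather than the raw text means that tweets differing only in links, mentions or punctuation count as duplicates.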
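Back-translation, as described in the augmentation note, sends each labelled text into a pivot language and back, then keeps the round trip as a paraphrase carrying the same label. The sketch below assumes English source texts and uses the 10 pivot languages listed in the README. The translator is a hypothetical callable `translate(text, source_lang, target_lang)`, and `toy_translate` is a fake stand-in used only so the example runs offline. `back_translate`, `augment` and the example label are illustrative names, not part of the project.

```python
from typing import Callable, Dict, List

# Pivot languages listed in the README for back-translation.
PIVOT_LANGS: List[str] = ["it", "fr", "sv", "da", "pt", "id", "pl", "hr", "bg", "fi"]

# Hypothetical translator interface: translate(text, source_lang, target_lang) -> str.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, translate: Translator,
                   pivots: List[str] = PIVOT_LANGS, source: str = "en") -> List[str]:
    """Round-trip `text` through each pivot language and collect new paraphrases."""
    variants: List[str] = []
    for lang in pivots:
        pivot_text = translate(text, source, lang)        # source -> pivot
        round_trip = translate(pivot_text, lang, source)  # pivot -> source
        # Skip round trips identical to the original or already collected.
        if round_trip != text and round_trip not in variants:
            variants.append(round_trip)
    return variants


def augment(dataset: List[Dict[str, str]], translate: Translator) -> List[Dict[str, str]]:
    """Return the original rows plus back-translated copies that keep each row's label."""
    augmented = list(dataset)
    for row in dataset:
        for variant in back_translate(row["text"], translate):
            augmented.append({"text": variant, "label": row["label"]})
    return augmented


def toy_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real translation service (demonstration only)."""
    if target != "en":
        return f"{target}:{text}"  # tag the text with the pivot language
    body = text.split(":", 1)[1]   # strip the tag on the way back
    # Pretend only the French round trip produces a paraphrase.
    return body.replace("rising", "going up") if source == "fr" else body


rows = [{"text": "btc is rising", "label": "Positive"}]
print(augment(rows, toy_translate))
```

In practice `translate` would wrap a real service such as the Google Translate API named in the README. That wrapper is left out because client setup and credentials fall outside this sketch. Dropping round trips that match the original avoids adding exact copies that carry no new information.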