---
datasets:
- ElKulako/StockTwits-crypto
---

# CryptoBERT

CryptoBERT is a pre-trained NLP model to analyse the language and sentiment of cryptocurrency-related social media posts and messages. It was built by further training [cardiffnlp's Twitter-roBERTa-base](https://huggingface.co/cardiffnlp/twitter-roberta-base) language model on the cryptocurrency domain, using a corpus of over 3.2M unique cryptocurrency-related social media posts.
## Classification Training

The model was trained on the following labels: "Bearish": 0, "Neutral": 1, "Bullish": 2.
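The label-to-index mapping above can be written down as a small lookup table, which is handy for decoding the classifier's raw output indices (a minimal sketch; the variable names are illustrative, not from the model card):

```python
# Mapping between CryptoBERT's class indices and sentiment labels,
# as stated above: Bearish = 0, Neutral = 1, Bullish = 2.
id2label = {0: "Bearish", 1: "Neutral", 2: "Bullish"}
label2id = {label: idx for idx, label in id2label.items()}

print(label2id["Bullish"])  # → 2
print(id2label[0])          # → Bearish
```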

CryptoBERT was trained with a max sequence length of 128. Technically, it can handle sequences of up to 514 tokens; however, going beyond 128 is not recommended.
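In practice, that 128-token limit is enforced by truncating tokenized inputs before they reach the model. A library-free sketch of the behaviour (the helper name `truncate_to_max_length` is illustrative, mirroring what `tokenizer(..., truncation=True, max_length=128)` does to long posts):

```python
def truncate_to_max_length(token_ids, max_length=128):
    """Keep at most max_length token ids, discarding the tail of long posts."""
    return token_ids[:max_length]

long_post_ids = list(range(300))  # stand-in for a tokenized 300-token post
print(len(truncate_to_max_length(long_post_ids)))  # → 128
```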

# Classification Example
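The example below sketches how the model could be used for sentiment classification with the Hugging Face `transformers` pipeline. The hub id `ElKulako/cryptobert` is an assumption (inferred from the dataset namespace above), and the sample post is illustrative:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          TextClassificationPipeline)

model_name = "ElKulako/cryptobert"  # assumed hub id; adjust if different
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# truncation keeps inputs within the recommended 128-token window
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer,
                                  max_length=128, truncation=True)

post = "bitcoin just broke above resistance, looking very strong today"
preds = pipe(post)
print(preds)  # a list of {'label': ..., 'score': ...} dicts
```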

## Training Corpus

CryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts longer than 4 words were considered. The following communities were used as sources for our corpora: