Update README.md
README.md
````diff
@@ -17,18 +17,25 @@ CryptoBERT was trained with a max sequence length of 128. Technically, it can ha
 # Classification Example
 ```python
 from transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer
-from datasets import load_dataset
-dataset_name = "ElKulako/stocktwits-crypto"
-dataset = load_dataset(dataset_name)
 model_name = "ElKulako/cryptobert"
-
+tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
 model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels = 3)
-pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer,
+pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, max_length=64, truncation=True, padding = 'max_length')
+# post_1 & post_3 = bullish, post_2 = bearish
+post_1 = " see y'all tomorrow and can't wait to see ada in the morning, i wonder what price it is going to be at. 😎🐂🤠💯😴, bitcoin is looking good go for it and flash by that 45k. "
+post_2 = " alright racers, it’s a race to the bottom! good luck today and remember there are no losers (minus those who invested in currency nobody really uses) take your marks... are you ready? go!!"
+post_3 = " i'm never selling. the whole market can bottom out. i'll continue to hold this dumpster fire until the day i die if i need to."
+df_posts = [post_1, post_2, post_3]
 preds = pipe(df_posts)
+print(preds)
 
 
 ```
 
+```
+[{'label': 'Bullish', 'score': 0.8734585642814636}, {'label': 'Bearish', 'score': 0.9889495372772217}, {'label': 'Bullish', 'score': 0.6595883965492249}]
+```
+
 ## Training Corpus
 CryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts of length above 4 words were considered. The following communities were used as sources for our corpora:
 
````
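The updated classification example prints one dict per post, holding the top `label` and its `score`. As a minimal sketch of consuming that output (not part of the CryptoBERT README), the snippet below copies the sample predictions from this commit so it runs without downloading the model, and treats any prediction whose score falls below an assumed cutoff of 0.7 as undecided. The helper name `confident_labels` and the 0.7 threshold are hypothetical.

```python
# Sketch only: post-process the list returned by `pipe(df_posts)`.
# `preds` is copied from the sample output in this commit so the code runs offline.
# The helper name and the 0.7 cutoff are hypothetical, not part of CryptoBERT.
from typing import Optional

preds = [
    {'label': 'Bullish', 'score': 0.8734585642814636},
    {'label': 'Bearish', 'score': 0.9889495372772217},
    {'label': 'Bullish', 'score': 0.6595883965492249},
]


def confident_labels(preds: list, threshold: float = 0.7) -> list:
    """Return each predicted label, or None when its score is below `threshold`."""
    result: list[Optional[str]] = []
    for p in preds:
        # Keep the label only if the model's top score reaches the cutoff.
        result.append(p["label"] if p["score"] >= threshold else None)
    return result


print(confident_labels(preds))  # ['Bullish', 'Bearish', None]
```

With real data you would pass the live `preds` from `pipe(df_posts)` instead of the copied literal, and a cutoff like this would need tuning on labelled posts rather than being taken from the sketch.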