cardiffnlp/tweet-topic-latest-multi

@@ -1,46 +1,68 @@
----
-tags:
-- generated_from_keras_callback
-model-index:
-- name: tweet-topic-latest-multi
-  results: []
----
-<!-- This model card has been generated automatically according to the information Keras had access to. You should
-probably proofread and complete it, then remove this comment. -->
-# tweet-topic-latest-multi
-This model is a fine-tuned version of [antypasd/tweet-topic-latest-multi](https://huggingface.co/antypasd/tweet-topic-latest-multi) on an unknown dataset.
-It achieves the following results on the evaluation set:
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- optimizer: None
-- training_precision: float32
-### Training results
-### Framework versions
-- Transformers 4.23.1
-- TensorFlow 2.10.0
-- Tokenizers 0.13.1

+# tweet-topic-latest-multi
+This is a RoBERTa-base model trained on 168.86M tweets posted up to the end of September 2022 and fine-tuned for multi-label topic classification on a corpus of 11,267 [tweets](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi).
+The base model, before fine-tuning, is [cardiffnlp/twitter-roberta-base-sep2022](https://huggingface.co/cardiffnlp/twitter-roberta-base-sep2022). The model is intended for English text.
+- Reference papers: [TimeLMs paper](https://arxiv.org/abs/2202.03829), [TweetTopic](https://arxiv.org/abs/2209.09824)
+- Git repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms)
+<b>Labels</b>:
+| <span style="font-weight:normal">0: arts_&_culture</span>           | <span style="font-weight:normal">5: fashion_&_style</span>   | <span style="font-weight:normal">10: learning_&_educational</span>  | <span style="font-weight:normal">15: science_&_technology</span>  |
+|-----------------------------|---------------------|----------------------------|--------------------------|
+| 1: business_&_entrepreneurs | 6: film_tv_&_video  | 11: music                  | 16: sports               |
+| 2: celebrity_&_pop_culture  | 7: fitness_&_health | 12: news_&_social_concern  | 17: travel_&_adventure   |
+| 3: diaries_&_daily_life     | 8: food_&_dining    | 13: other_hobbies          | 18: youth_&_student_life |
+| 4: family                   | 9: gaming           | 14: relationships          |                          |
+## Full classification example
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+from scipy.special import expit
+import torch
+MODEL = "cardiffnlp/tweet-topic-latest-multi"
+tokenizer = AutoTokenizer.from_pretrained(MODEL)
+# PyTorch
+model = AutoModelForSequenceClassification.from_pretrained(MODEL)
+class_mapping = model.config.id2label
+text = "It is great to see athletes promoting awareness for climate change."
+tokens = tokenizer(text, return_tensors="pt")
+with torch.no_grad():  # inference only, no gradients needed
+    output = model(**tokens)
+# Multi-label: apply an independent sigmoid to each logit (not a softmax across labels)
+scores = expit(output.logits[0].numpy())
+predictions = (scores >= 0.5) * 1
+# TensorFlow (uncomment to use instead of the PyTorch block above)
+#from transformers import TFAutoModelForSequenceClassification
+#tf_model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
+#class_mapping = tf_model.config.id2label
+#text = "It is great to see athletes promoting awareness for climate change."
+#tokens = tokenizer(text, return_tensors="tf")
+#output = tf_model(**tokens)
+#scores = expit(output.logits[0].numpy())
+#predictions = (scores >= 0.5) * 1
+# Map to classes: print every label whose score reaches the 0.5 threshold
+for i, predicted in enumerate(predictions):
+    if predicted:
+        print(class_mapping[i])
+```
+Output:
+```
+fitness_&_health
+news_&_social_concern
+sports
+```
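The thresholding step in the example above can be isolated into a small, dependency-free sketch of multi-label decoding: each label gets its own sigmoid score, and every label at or above 0.5 is kept, so zero, one, or several topics can be returned. This is an illustrative sketch, not part of the released code. `LABELS` is copied from the label table and assumes the model's `config.id2label` uses the same order (in practice, prefer `model.config.id2label`). The helpers `sigmoid` and `decode_multilabel` are hypothetical names, and the logits in the usage lines are made-up numbers, not real model output.

```python
import math

# Label names copied from the label table (list index = label id).
# Assumption: this matches model.config.id2label; prefer the config in real code.
LABELS = [
    "arts_&_culture", "business_&_entrepreneurs", "celebrity_&_pop_culture",
    "diaries_&_daily_life", "family", "fashion_&_style", "film_tv_&_video",
    "fitness_&_health", "food_&_dining", "gaming", "learning_&_educational",
    "music", "news_&_social_concern", "other_hobbies", "relationships",
    "science_&_technology", "sports", "travel_&_adventure", "youth_&_student_life",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (same values as scipy.special.expit)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, threshold=0.5):
    """Hypothetical helper: return every label whose own sigmoid score reaches the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [LABELS[i] for i, logit in enumerate(logits) if sigmoid(logit) >= threshold]


# Usage with made-up logits (not real model output):
fake_logits = [-3.0] * len(LABELS)
fake_logits[7] = 1.2   # fitness_&_health
fake_logits[16] = 2.5  # sports
print(decode_multilabel(fake_logits))  # ['fitness_&_health', 'sports']
```

Because sigmoid(0) = 0.5, a 0.5 threshold is equivalent to keeping every label with a non-negative logit. The threshold could be tuned per application, but the example in this card uses 0.5.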