---
language: en
license: apache-2.0
base_model: microsoft/deberta-v3-base
tags:
- text-classification
- deberta-v3
datasets:
- ealvaradob/phishing-dataset
- ucberkeley-dlab/measuring-hate-speech
- cardiffnlp/tweet_eval
- lmsys/toxic-chat
- tasksource/jigsaw_toxicity
- KoalaAI/Text-Moderation-Multilingual
---

# Constellation One

An experimental text classification model for [Cockatoo](https://cockatoo.dev/), fine-tuned from `microsoft/deberta-v3-base` to detect six categories of harmful text.

This model is licensed under the `Apache-2.0` license.

**Available Labels:**

```json
{
  "id2label": {
    "0": "scam",
    "1": "violence",
    "2": "harassment",
    "3": "hate_speech",
    "4": "toxicity",
    "5": "obscenity"
  }
}
```
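
## Performance

Constellation One achieves near-SOTA performance within its weight class and is strongest at detecting scams (F1 ≈ 0.93 to 0.94) and harassment (F1 ≈ 0.85).

With the default threshold of 0.5 for every label, recall is high in all categories (0.87 to 0.95 per label, 0.91 overall), but overall precision is only 0.61. With the tuned thresholds, overall recall drops to ~0.82, precision rises to ~0.69, and F1 increases from ~0.72 to ~0.75. "Overall" figures are unweighted (macro) averages across the six labels. Because thresholds do not change the model's predicted scores, `eval_loss` is identical in both evaluations.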
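
One way to see this trade-off, assuming (the card does not state it) that each label $\ell$ gets an independent sigmoid score from its logit $z_\ell$ and is flagged once that score reaches the label's threshold $t_\ell$:

$$
\hat{y}_\ell = \mathbf{1}\left[\sigma(z_\ell) \ge t_\ell\right], \qquad
P_\ell = \frac{TP_\ell}{TP_\ell + FP_\ell}, \qquad
R_\ell = \frac{TP_\ell}{TP_\ell + FN_\ell}, \qquad
F1_\ell = \frac{2 P_\ell R_\ell}{P_\ell + R_\ell} = \frac{2\,TP_\ell}{2\,TP_\ell + FP_\ell + FN_\ell}
$$

Raising $t_\ell$ can only shrink the set of flagged inputs, so $TP_\ell$ cannot grow and $FN_\ell$ cannot shrink: recall never increases. $FP_\ell$ cannot grow either, so precision usually rises, and whether $F1_\ell$ improves depends on which effect dominates. Per-label threshold tuning searches for that balance.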

### Evaluation (Untuned Thresholds):

**Thresholds:**

```python
LABEL_THRESHOLDS = {
    'scam': 0.5,
    'violence': 0.5,
    'harassment': 0.5,
    'hate_speech': 0.5,
    'toxicity': 0.5,
    'obscenity': 0.5
}
```
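
A minimal sketch of how per-label thresholds turn raw outputs into flags. It assumes a multi-label head, where each of the six logits gets its own sigmoid (no softmax across labels) and a label fires when its score is `>=` the threshold. The per-label thresholds suggest this setup, but the card does not state it, and the `>=` boundary is also an assumption. `sigmoid` and `decode_logits` are hypothetical helpers, not part of the Cockatoo code.

```python
import math

# Label mapping from the "Available Labels" section (id2label).
ID2LABEL = {
    0: "scam",
    1: "violence",
    2: "harassment",
    3: "hate_speech",
    4: "toxicity",
    5: "obscenity",
}

# Untuned thresholds: 0.5 for every label, as in the block above.
LABEL_THRESHOLDS = {label: 0.5 for label in ID2LABEL.values()}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_logits(logits, thresholds=LABEL_THRESHOLDS, id2label=ID2LABEL):
    """Turn one example's six raw logits into (flagged labels, per-label scores).

    Assumption: multi-label head, so each logit gets its own sigmoid
    and a label is flagged when its score >= that label's threshold.
    """
    if len(logits) != len(id2label):
        raise ValueError(f"expected {len(id2label)} logits, got {len(logits)}")
    scores = {id2label[i]: sigmoid(x) for i, x in enumerate(logits)}
    flagged = [label for label, score in scores.items() if score >= thresholds[label]]
    return flagged, scores


# Example with made-up logits (not real model output).
flagged, scores = decode_logits([2.0, -3.0, 0.1, -1.0, 0.4, -5.0])
print(flagged)  # ['scam', 'harassment', 'toxicity']
```

To use the tuned thresholds, pass that `LABEL_THRESHOLDS` dict as `thresholds`. With Hugging Face `transformers`, the logits for a single input would typically come from `model(**inputs).logits[0].tolist()`.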

**Raw Eval Metrics:**

```json
{
  "eval_loss": 0.16034406423568726,
  "eval_precision": 0.6059971310039647,
  "eval_recall": 0.9138250950483955,
  "eval_f1": 0.7164361696270752,
  "eval_precision_scam": 0.9117559964465501,
  "eval_recall_scam": 0.9532507739938081,
  "eval_f1_scam": 0.9320417738761919,
  "eval_precision_violence": 0.42734150795721365,
  "eval_recall_violence": 0.8970427163198248,
  "eval_f1_violence": 0.5789008658773634,
  "eval_precision_harassment": 0.7726063829787234,
  "eval_recall_harassment": 0.9423076923076923,
  "eval_f1_harassment": 0.8490605427974948,
  "eval_precision_hate_speech": 0.429821819318537,
  "eval_recall_hate_speech": 0.8969341161121983,
  "eval_f1_hate_speech": 0.5811496196111581,
  "eval_precision_toxicity": 0.5737432488574989,
  "eval_recall_toxicity": 0.8712933753943217,
  "eval_f1_toxicity": 0.6918837675350702,
  "eval_precision_obscenity": 0.5207138304652645,
  "eval_recall_obscenity": 0.9221218961625283,
  "eval_f1_obscenity": 0.6655804480651731,
  "eval_runtime": 247.1414,
  "eval_samples_per_second": 117.512,
  "eval_steps_per_second": 2.452
}
```
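
The overall `eval_precision`, `eval_recall`, and `eval_f1` above are the unweighted (macro) means of the six per-label values. Each per-label F1 is the harmonic mean of that label's precision and recall. The snippet below recomputes both from the reported numbers as an arithmetic check. It is not the training repository's metric code, and `macro_average` and `harmonic_f1` are hypothetical helper names.

```python
LABELS = ["scam", "violence", "harassment", "hate_speech", "toxicity", "obscenity"]

# Untuned (threshold 0.5) results copied from the JSON above.
UNTUNED = {
    "eval_precision": 0.6059971310039647,
    "eval_recall": 0.9138250950483955,
    "eval_f1": 0.7164361696270752,
    "eval_precision_scam": 0.9117559964465501,
    "eval_recall_scam": 0.9532507739938081,
    "eval_f1_scam": 0.9320417738761919,
    "eval_precision_violence": 0.42734150795721365,
    "eval_recall_violence": 0.8970427163198248,
    "eval_f1_violence": 0.5789008658773634,
    "eval_precision_harassment": 0.7726063829787234,
    "eval_recall_harassment": 0.9423076923076923,
    "eval_f1_harassment": 0.8490605427974948,
    "eval_precision_hate_speech": 0.429821819318537,
    "eval_recall_hate_speech": 0.8969341161121983,
    "eval_f1_hate_speech": 0.5811496196111581,
    "eval_precision_toxicity": 0.5737432488574989,
    "eval_recall_toxicity": 0.8712933753943217,
    "eval_f1_toxicity": 0.6918837675350702,
    "eval_precision_obscenity": 0.5207138304652645,
    "eval_recall_obscenity": 0.9221218961625283,
    "eval_f1_obscenity": 0.6655804480651731,
}


def macro_average(metrics: dict, name: str, labels=LABELS) -> float:
    """Unweighted mean of one per-label metric ('precision', 'recall' or 'f1')."""
    return sum(metrics[f"eval_{name}_{label}"] for label in labels) / len(labels)


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Compare reported aggregates with the macro mean of the per-label values.
for name in ("precision", "recall", "f1"):
    print(f"{name}: reported {UNTUNED[f'eval_{name}']:.4f}, "
          f"macro mean {macro_average(UNTUNED, name):.4f}")

# Compare reported per-label F1 with the harmonic mean of precision and recall.
for label in LABELS:
    recomputed = harmonic_f1(UNTUNED[f"eval_precision_{label}"], UNTUNED[f"eval_recall_{label}"])
    print(f"{label}: reported F1 {UNTUNED[f'eval_f1_{label}']:.4f}, recomputed {recomputed:.4f}")
```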

---

### Evaluation (Tuned Thresholds):

**Thresholds:**

```python
LABEL_THRESHOLDS = {
    'scam': 0.60,
    'violence': 0.73,
    'harassment': 0.70,
    'hate_speech': 0.80,
    'toxicity': 0.75,
    'obscenity': 0.85
}
```
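
The card does not say how these thresholds were chosen. One common approach, shown here only as an assumption, sweeps candidate thresholds for each label on held-out predictions and keeps the threshold that maximizes that label's F1. Each label's F1 depends only on its own threshold, so maximizing every label independently also maximizes the macro-averaged F1. `f1_at_threshold`, `tune_threshold`, and `tune_all` are hypothetical helpers, and the 0.01-step grid is an arbitrary choice.

```python
def f1_at_threshold(scores, targets, threshold):
    """F1 for one label when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for score, target in zip(scores, targets):
        predicted = score >= threshold
        if predicted and target:
            tp += 1
        elif predicted:
            fp += 1
        elif target:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(scores, targets, candidates=None):
    """Return (threshold, f1) maximizing F1; ties keep the lowest threshold."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in candidates:
        f1 = f1_at_threshold(scores, targets, threshold)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


def tune_all(scores_by_label, targets_by_label):
    """Tune each label independently; returns a LABEL_THRESHOLDS-style dict."""
    return {
        label: tune_threshold(scores_by_label[label], targets_by_label[label])[0]
        for label in scores_by_label
    }


# Toy validation data for a single label (made up for illustration).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.55, 0.3]
targets = [1, 1, 1, 0, 0, 1, 0]
print(tune_threshold(scores, targets))
```

In the toy data, the best cut falls just above the highest-scoring negative (0.7), so it returns 0.71 with F1 = 6/7. Choosing thresholds on the same split that metrics are reported on will overstate performance, so a separate validation split is preferable.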

**Raw Eval Metrics:**

```json
{
  "eval_loss": 0.16034406423568726,
  "eval_precision": 0.6939850223558622,
  "eval_recall": 0.8150767410772812,
  "eval_f1": 0.7475019013835578,
  "eval_precision_scam": 0.9255447941888619,
  "eval_recall_scam": 0.9467492260061919,
  "eval_f1_scam": 0.936026936026936,
  "eval_precision_violence": 0.5140955364134691,
  "eval_recall_violence": 0.7190580503833516,
  "eval_f1_violence": 0.5995433789954338,
  "eval_precision_harassment": 0.8238218763510592,
  "eval_recall_harassment": 0.8829935125115848,
  "eval_f1_harassment": 0.8523820174457616,
  "eval_precision_hate_speech": 0.5606936416184971,
  "eval_recall_hate_speech": 0.6960208741030659,
  "eval_f1_hate_speech": 0.6210710128055879,
  "eval_precision_toxicity": 0.6890574214517876,
  "eval_recall_toxicity": 0.8025236593059937,
  "eval_f1_toxicity": 0.7414747886913436,
  "eval_precision_obscenity": 0.6506968641114983,
  "eval_recall_obscenity": 0.8431151241534989,
  "eval_f1_obscenity": 0.7345132743362832,
  "eval_runtime": 378.4334,
  "eval_samples_per_second": 76.743,
  "eval_steps_per_second": 1.601
}
```
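
To see where threshold tuning helps most, the per-label F1 from the untuned and tuned evaluations can be compared directly. This small sketch uses only numbers copied from this card, and `f1_gains` is a hypothetical helper.

```python
# Per-label F1 copied from the untuned and tuned "Raw Eval Metrics" blocks.
F1_UNTUNED = {
    "scam": 0.9320417738761919,
    "violence": 0.5789008658773634,
    "harassment": 0.8490605427974948,
    "hate_speech": 0.5811496196111581,
    "toxicity": 0.6918837675350702,
    "obscenity": 0.6655804480651731,
}
F1_TUNED = {
    "scam": 0.936026936026936,
    "violence": 0.5995433789954338,
    "harassment": 0.8523820174457616,
    "hate_speech": 0.6210710128055879,
    "toxicity": 0.7414747886913436,
    "obscenity": 0.7345132743362832,
}
TUNED_THRESHOLDS = {
    "scam": 0.60,
    "violence": 0.73,
    "harassment": 0.70,
    "hate_speech": 0.80,
    "toxicity": 0.75,
    "obscenity": 0.85,
}


def f1_gains(before, after):
    """Per-label F1 change, sorted from largest to smallest gain."""
    gains = {label: after[label] - before[label] for label in before}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


for label, gain in f1_gains(F1_UNTUNED, F1_TUNED):
    print(f"{label:<12} threshold {TUNED_THRESHOLDS[label]:.2f}  "
          f"F1 {F1_UNTUNED[label]:.3f} -> {F1_TUNED[label]:.3f} ({gain:+.3f})")
```

Every label improves. The largest gains are for obscenity (+0.069), toxicity (+0.050), and hate_speech (+0.040). Scam and harassment, which already had the highest precision at 0.5, each change by less than 0.005.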

---

## Resources:

Training/inference server: https://github.com/DominicTWHV/Cockatoo_ML_Training/

Training metrics: https://cockatoo.dev/ml-training.html

## Datasets Used | Citations

| Dataset | License | Link |
| --- | --- | --- |
| **Phishing Dataset** | MIT | [Hugging Face](https://huggingface.co/datasets/ealvaradob/phishing-dataset) |
| **Measuring Hate Speech** | CC-BY-4.0 | [Hugging Face](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech) |
| **Tweet Eval (SemEval-2019)** | See citation below | [Hugging Face](https://huggingface.co/datasets/cardiffnlp/tweet_eval) |
| **Toxic Chat** | CC-BY-NC-4.0 | [Hugging Face](https://huggingface.co/datasets/lmsys/toxic-chat) |
| **Jigsaw Toxicity** | Apache-2.0 | [Hugging Face](https://huggingface.co/datasets/tasksource/jigsaw_toxicity) |
| **Text Moderation Multilingual** | Apache-2.0 | [Hugging Face](https://huggingface.co/datasets/KoalaAI/Text-Moderation-Multilingual) |

---

### Citation: ucberkeley-dlab/measuring-hate-speech

```bibtex
@article{kennedy2020constructing,
  title={Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application},
  author={Kennedy, Chris J and Bacon, Geoff and Sahn, Alexander and von Vacano, Claudia},
  journal={arXiv preprint arXiv:2009.10277},
  year={2020}
}
```

### Citation: cardiffnlp/tweet_eval

```bibtex
@inproceedings{basile-etal-2019-semeval,
  title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
  author = "Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Rangel Pardo, Francisco Manuel and Rosso, Paolo and Sanguinetti, Manuela",
  booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
  year = "2019",
  address = "Minneapolis, Minnesota, USA",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/S19-2007",
  doi = "10.18653/v1/S19-2007",
  pages = "54--63"
}
```

### Citation: lmsys/toxic-chat

```bibtex
@misc{lin2023toxicchat,
  title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
  author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
  year={2023},
  eprint={2310.17389},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

### Citation: KoalaAI/Text-Moderation-Multilingual

```bibtex
@misc{text-moderation-large,
  title={Text-Moderation-Multilingual: A Multilingual Text Moderation Dataset},
  author={[KoalaAI]},
  year={2025},
  note={Aggregated from ifmain's and OpenAI's moderation datasets}
}
```