---
language: en
license: apache-2.0
base_model: microsoft/deberta-v3-base
tags:
- text-classification
- deberta-v3
datasets:
- ealvaradob/phishing-dataset
- ucberkeley-dlab/measuring-hate-speech
- cardiffnlp/tweet_eval
- lmsys/toxic-chat
- tasksource/jigsaw_toxicity
- KoalaAI/Text-Moderation-Multilingual
---

# Constellation One

An experimental text classification model fine-tuned from Microsoft's DeBERTa-v3-base for [Cockatoo](https://cockatoo.dev/).

This model is licensed under the `Apache-2.0` license.

**Available Labels:**

```json
{
    "id2label": {
        "0": "scam",
        "1": "violence",
        "2": "harassment",
        "3": "hate_speech",
        "4": "toxicity",
        "5": "obscenity"
    }
}
```

## Performance

Constellation One achieves near-SOTA performance within its weight class and is strongest at detecting scams and harassment (F1 of ~0.93 and ~0.85, respectively).

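At the default threshold of 0.5, recall is high in every category (~0.87 to ~0.95, ~0.91 overall), at the cost of precision (~0.61 overall). With tuned per-label thresholds, overall recall drops to ~0.82, while precision rises to ~0.69 and F1 increases from ~0.72 to ~0.75.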
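
The trade-off can be checked against the per-label numbers reported in the evaluation sections of this card. The sketch below recomputes F1 as the harmonic mean of precision and recall for the `violence` label before and after tuning. The helper name `f1_score` is illustrative and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-label values copied from the raw eval metrics in this card (violence).
untuned = {"precision": 0.42734150795721365, "recall": 0.8970427163198248}
tuned = {"precision": 0.5140955364134691, "recall": 0.7190580503833516}

f1_untuned = f1_score(**untuned)
f1_tuned = f1_score(**tuned)

print(f"violence F1 (untuned): {f1_untuned:.4f}")
print(f"violence F1 (tuned):   {f1_tuned:.4f}")
```

For this label, recall falls by roughly 0.18 after tuning, but the gain in precision (about 0.09) is enough to raise F1 slightly.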

### Evaluation (Untuned Thresholds):

**Thresholds:**

```python
LABEL_THRESHOLDS = {
    'scam': 0.5,
    'violence': 0.5,
    'harassment': 0.5,
    'hate_speech': 0.5,
    'toxicity': 0.5,
    'obscenity': 0.5
}
```

**Raw Eval Metrics:**

```json
{
   "eval_loss":0.16034406423568726,
   "eval_precision":0.6059971310039647,
   "eval_recall":0.9138250950483955,
   "eval_f1":0.7164361696270752,
   "eval_precision_scam":0.9117559964465501,
   "eval_recall_scam":0.9532507739938081,
   "eval_f1_scam":0.9320417738761919,
   "eval_precision_violence":0.42734150795721365,
   "eval_recall_violence":0.8970427163198248,
   "eval_f1_violence":0.5789008658773634,
   "eval_precision_harassment":0.7726063829787234,
   "eval_recall_harassment":0.9423076923076923,
   "eval_f1_harassment":0.8490605427974948,
   "eval_precision_hate_speech":0.429821819318537,
   "eval_recall_hate_speech":0.8969341161121983,
   "eval_f1_hate_speech":0.5811496196111581,
   "eval_precision_toxicity":0.5737432488574989,
   "eval_recall_toxicity":0.8712933753943217,
   "eval_f1_toxicity":0.6918837675350702,
   "eval_precision_obscenity":0.5207138304652645,
   "eval_recall_obscenity":0.9221218961625283,
   "eval_f1_obscenity":0.6655804480651731,
   "eval_runtime":247.1414,
   "eval_samples_per_second":117.512,
   "eval_steps_per_second":2.452
}
```
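
The aggregate `eval_precision`, `eval_recall`, and `eval_f1` values above are numerically consistent with an unweighted (macro) average of the six per-label scores. The averaging mode is not stated in this card, so treat this as an inference from the numbers rather than a documented setting. A minimal check for F1:

```python
# Per-label F1 copied from the untuned eval metrics above.
per_label_f1 = {
    "scam": 0.9320417738761919,
    "violence": 0.5789008658773634,
    "harassment": 0.8490605427974948,
    "hate_speech": 0.5811496196111581,
    "toxicity": 0.6918837675350702,
    "obscenity": 0.6655804480651731,
}
reported_eval_f1 = 0.7164361696270752

# Macro average: every label counts equally, regardless of its support.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

print(f"macro F1: {macro_f1:.6f}  reported: {reported_eval_f1:.6f}")
```

Under a macro average, the weaker labels (`violence`, `hate_speech`) pull the overall score down as much as the stronger ones lift it, so the aggregate understates how well the model does on `scam` and `harassment`.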

![Recall Metrics](assets/graphs/untuned/recall_deberta.png)
![Precision Metrics](assets/graphs/untuned/precision_deberta.png)
![F1 Metrics](assets/graphs/untuned/f1_deberta.png)

---

### Evaluation (Tuned Thresholds):

**Thresholds:**

```python
LABEL_THRESHOLDS = {
    'scam': 0.60,
    'violence': 0.73,
    'harassment': 0.70,
    'hate_speech': 0.80,
    'toxicity': 0.75,
    'obscenity': 0.85
}
```
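
A minimal sketch of how these thresholds might be applied at inference time. It assumes the model is used as a multi-label classifier whose six logits are passed through a sigmoid independently, and that a label fires when its probability is at or above its threshold; neither detail is confirmed in this card. The logits are made-up values for illustration, and `predict_labels` is a hypothetical helper.

```python
import math

# Label order follows the id2label mapping in this card.
LABELS = ["scam", "violence", "harassment", "hate_speech", "toxicity", "obscenity"]

LABEL_THRESHOLDS = {
    "scam": 0.60,
    "violence": 0.73,
    "harassment": 0.70,
    "hate_speech": 0.80,
    "toxicity": 0.75,
    "obscenity": 0.85,
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, thresholds=LABEL_THRESHOLDS):
    """Return labels whose sigmoid probability meets its threshold (assumed rule)."""
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    return [label for label, p in probs.items() if p >= thresholds[label]]


# Made-up logits, one per label, for illustration only.
example_logits = [2.0, -1.0, 1.0, 0.5, 1.2, -3.0]

tuned_prediction = predict_labels(example_logits)
uniform_prediction = predict_labels(example_logits, {label: 0.5 for label in LABELS})

print("tuned:  ", tuned_prediction)
print("uniform:", uniform_prediction)
```

With a uniform 0.5 threshold the same logits would also be tagged `hate_speech`, which illustrates how the tuned thresholds trade recall for precision.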

**Raw Eval Metrics:**

```json
{
   "eval_loss":0.16034406423568726,
   "eval_precision":0.6939850223558622,
   "eval_recall":0.8150767410772812,
   "eval_f1":0.7475019013835578,
   "eval_precision_scam":0.9255447941888619,
   "eval_recall_scam":0.9467492260061919,
   "eval_f1_scam":0.936026936026936,
   "eval_precision_violence":0.5140955364134691,
   "eval_recall_violence":0.7190580503833516,
   "eval_f1_violence":0.5995433789954338,
   "eval_precision_harassment":0.8238218763510592,
   "eval_recall_harassment":0.8829935125115848,
   "eval_f1_harassment":0.8523820174457616,
   "eval_precision_hate_speech":0.5606936416184971,
   "eval_recall_hate_speech":0.6960208741030659,
   "eval_f1_hate_speech":0.6210710128055879,
   "eval_precision_toxicity":0.6890574214517876,
   "eval_recall_toxicity":0.8025236593059937,
   "eval_f1_toxicity":0.7414747886913436,
   "eval_precision_obscenity":0.6506968641114983,
   "eval_recall_obscenity":0.8431151241534989,
   "eval_f1_obscenity":0.7345132743362832,
   "eval_runtime":378.4334,
   "eval_samples_per_second":76.743,
   "eval_steps_per_second":1.601
}
```

![Recall Metrics](assets/graphs/tuned/recall_deberta.png)
![Precision Metrics](assets/graphs/tuned/precision_deberta.png)
![F1 Metrics](assets/graphs/tuned/f1_deberta.png)

---

## Resources:

Training/inference server: https://github.com/DominicTWHV/Cockatoo_ML_Training/

Training Metrics: https://cockatoo.dev/ml-training.html

## Datasets Used | Citations

| Dataset | License | Link |
| --- | --- | --- |
| **Phishing Dataset** | MIT | [Hugging Face](https://huggingface.co/datasets/ealvaradob/phishing-dataset) |
| **Measuring Hate Speech** | CC-BY-4.0 | [Hugging Face](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech) |
| **Tweet Eval (SemEval-2019)** | See citation below | [Hugging Face](https://huggingface.co/datasets/cardiffnlp/tweet_eval) |
| **Toxic Chat** | CC-BY-NC-4.0 | [Hugging Face](https://huggingface.co/datasets/lmsys/toxic-chat) |
| **Jigsaw Toxicity** | Apache-2.0 | [Hugging Face](https://huggingface.co/datasets/tasksource/jigsaw_toxicity) |
| **Text Moderation Multilingual** | Apache-2.0 | [Hugging Face](https://huggingface.co/datasets/KoalaAI/Text-Moderation-Multilingual) |

---

### Citation: ucberkeley-dlab/measuring-hate-speech

```bibtex
@article{kennedy2020constructing,
  title={Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application},
  author={Kennedy, Chris J and Bacon, Geoff and Sahn, Alexander and von Vacano, Claudia},
  journal={arXiv preprint arXiv:2009.10277},
  year={2020}
}
```

### Citation: cardiffnlp/tweet_eval

```bibtex
@inproceedings{basile-etal-2019-semeval,
    title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
    author = "Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Rangel Pardo, Francisco Manuel and Rosso, Paolo and Sanguinetti, Manuela",
    booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
    year = "2019",
    address = "Minneapolis, Minnesota, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/S19-2007",
    doi = "10.18653/v1/S19-2007",
    pages = "54--63"
}
```

### Citation: lmsys/toxic-chat

```bibtex
@misc{lin2023toxicchat,
      title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation}, 
      author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
      year={2023},
      eprint={2310.17389},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

### Citation: KoalaAI/Text-Moderation-Multilingual

```bibtex
@misc{text-moderation-large,
  title={Text-Moderation-Multilingual: A Multilingual Text Moderation Dataset},
  author={[KoalaAI]},
  year={2025},
  note={Aggregated from ifmain's and OpenAI's moderation datasets}
}
```