Update README.md
# Constellation-One-Text-001
The Constellation-One-Text-001 model was created for the [Cockatoo Project](https://cockatoo.dev/).
This model is highly experimental and may not be ready for production use.
**Resources:**
- Training and inference code: [Cockatoo_ML_Training on GitHub](https://github.com/DominicTWHV/Cockatoo_ML_Training/)
- Model training metrics: [cockatoo.dev/ml-training.html](https://cockatoo.dev/ml-training.html)
**Datasets Used:**
| Dataset | License | Link |
| --- | --- | --- |
| **Phishing Dataset** | MIT | [Hugging Face](https://huggingface.co/datasets/ealvaradob/phishing-dataset) |
| **Measuring Hate Speech** | CC-BY-4.0 | [Hugging Face](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech) |
| **Tweet Eval (SemEval-2019)** | See citation below | [Hugging Face](https://huggingface.co/datasets/cardiffnlp/tweet_eval) |
| **Toxic Chat** | CC-BY-NC-4.0 | [Hugging Face](https://huggingface.co/datasets/lmsys/toxic-chat) |
| **Jigsaw Toxicity** | Apache-2.0 | [Hugging Face](https://huggingface.co/datasets/tasksource/jigsaw_toxicity) |
| **Text Moderation Multilingual** | Apache-2.0 | [Hugging Face](https://huggingface.co/datasets/KoalaAI/Text-Moderation-Multilingual) |
---
### Citation: ucberkeley-dlab/measuring-hate-speech
```bibtex
@article{kennedy2020constructing,
title={Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application},
author={Kennedy, Chris J and Bacon, Geoff and Sahn, Alexander and von Vacano, Claudia},
journal={arXiv preprint arXiv:2009.10277},
year={2020}
}
```
### Citation: cardiffnlp/tweet_eval
```bibtex
@inproceedings{basile-etal-2019-semeval,
title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
author = "Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Rangel Pardo, Francisco Manuel and Rosso, Paolo and Sanguinetti, Manuela",
booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
year = "2019",
address = "Minneapolis, Minnesota, USA",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/S19-2007",
doi = "10.18653/v1/S19-2007",
pages = "54--63"
}
```
### Citation: lmsys/toxic-chat
```bibtex
@misc{lin2023toxicchat,
title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
year={2023},
eprint={2310.17389},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Citation: KoalaAI/Text-Moderation-Multilingual
```bibtex
@misc{text-moderation-large,
title={Text-Moderation-Multilingual: A Multilingual Text Moderation Dataset},
author={[KoalaAI]},
year={2025},
note={Aggregated from ifmain's and OpenAI's moderation datasets}
}
```
@@ -1,3 +1,15 @@
----
-
-
+---
+language: en
+license: apache-2.0
+base_model: microsoft/deberta-v3-base
+tags:
+- text-classification
+- deberta-v3
+datasets:
+- ealvaradob/phishing-dataset
+- ucberkeley-dlab/measuring-hate-speech
+- cardiffnlp/tweet_eval
+- lmsys/toxic-chat
+- tasksource/jigsaw_toxicity
+- KoalaAI/Text-Moderation-Multilingual
+---
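
The front matter added in this commit (language, license, base model, tags, and datasets) is plain YAML, so tooling can read it back from the README. The following is a minimal sketch, assuming the card uses only flat `key: value` pairs and simple `- item` lists as it does here. The `parse_front_matter` helper and the `CARD` constant are hypothetical names used for illustration and are not part of the Cockatoo codebase.

```python
# Minimal sketch: read the model card's YAML front matter with the
# standard library only. It handles flat "key: value" pairs and
# "key:" followed by "- item" lists, which is all this card uses.
# It is NOT a general YAML parser.

CARD = """\
---
language: en
license: apache-2.0
base_model: microsoft/deberta-v3-base
tags:
- text-classification
- deberta-v3
datasets:
- ealvaradob/phishing-dataset
- ucberkeley-dlab/measuring-hate-speech
- cardiffnlp/tweet_eval
- lmsys/toxic-chat
- tasksource/jigsaw_toxicity
- KoalaAI/Text-Moderation-Multilingual
---
# Constellation-One-Text-001
"""


def parse_front_matter(text):
    """Return the front matter as a dict of str or list-of-str values."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_key = None  # key of the list currently being filled, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter ends the front matter
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value  # scalar entry such as "license: apache-2.0"
            current_key = None
        else:
            meta[key] = []  # list entry such as "datasets:"
            current_key = key
    return meta


meta = parse_front_matter(CARD)
print(meta["base_model"])
print(meta["datasets"])
```

For anything beyond this simple layout, a full YAML parser such as PyYAML's `yaml.safe_load` is the more robust choice.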