---
title: README
emoji: π
colorFrom: green
colorTo: indigo
sdk: static
pinned: false
license: openrail++
short_description: 'TextDetox: detoxification, toxicity detection, explanation'
---
# Multilingual Text Detoxification with Parallel Data
Text detoxification, toxicity detection, and explanation for **diverse languages**: English, Spanish, German, French, Italian, Chinese, Japanese, Arabic, Hebrew, Hindi, Ukrainian, Russian, Tatar, Amharic. A joint effort of many researchers from all over the world.
Supporting better, safer, and multicultural online spaces.
📰 [Read about the project in the press](https://toloka.ai/blog/can-llms-eliminate-toxicity-in-human-and-ai-generated-content-what-multilingual-research-shows/)
📹 [PyCon DE & PyData Berlin 2023 talk](https://youtu.be/8I5tZvcmIis?si=y4sLgrW2xfwJC_GP)
**[2025] !!!NOW OPEN!!! TextDetox CLEF2025 shared task** [website](https://pan.webis.de/clef25/pan25-web/text-detoxification.html) 🤗 [Starter Kit](https://huggingface.co/collections/textdetox/textdetox-2025-starter-kit-67dc3a8fd86111cac961ecc8)
**[2025] COLING2025**: Daryna Dementieva, Nikolay Babakov, Amit Ronen, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Daniil Alekhseevich Moskovskiy, Elisei Stakovskii, Eran Kaufman, Ashraf Elnagar, Animesh Mukherjee, and Alexander Panchenko. 2025. ***Multilingual and Explainable Text Detoxification with Parallel Corpora***. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7998–8025, Abu Dhabi, UAE. Association for Computational Linguistics. [pdf](https://aclanthology.org/2025.coling-main.535/)
**[2024] TextDetox2024 Report**: Daryna Dementieva, Daniil Moskovskiy, Nikolay Babakov, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Dmitry Ustalov, Elisei Stakovskii, Alisa Smirnova, Ashraf Elnagar, Animesh Mukherjee, and Alexander Panchenko. ***"Overview of the Multilingual Text Detoxification Task at PAN 2024."*** Working Notes of CLEF (2024). [pdf](https://ceur-ws.org/Vol-3740/paper-223.pdf)
**[2024] MultiParaDetox @ NAACL2024**: Daryna Dementieva, Nikolay Babakov, and Alexander Panchenko. ***"MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages."*** Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2024. [pdf](https://aclanthology.org/2024.naacl-short.12/)
**[2024] TextDetox CLEF2024 shared task** [website](https://pan.webis.de/clef24/pan24-web/text-detoxification.html)
**[2022] The first Parallel Text Detoxification datasets**: [English ParaDetox](https://huggingface.co/datasets/s-nlp/paradetox) and [Russian ParaDetox](https://huggingface.co/datasets/s-nlp/ru_paradetox)
## Contact
We are happy to extend our research to more languages, cultures, and dimensions.
Please contact [Daryna Dementieva](https://huggingface.co/dardem).