| | --- |
| | license: apache-2.0 |
| | datasets: |
| | - HiTZ/CONAN-EUS |
| | language: |
| | - eu |
| | metrics: |
| | - bleu |
| | library_name: transformers |
| | pipeline_tag: text2text-generation |
| | tags: |
| | - counternarrative |
| | - hate speech |
| | - text generation |
| | --- |
| | **Content Warning**: This card may contain examples of offensive language that do not reflect the authors’ views. |
| |
|
| | # Model Card for mT5-counternarrative-eu |
| |
|
| | This is a text-to-text [mT5-base](https://huggingface.co/google/mt5-base) model fine-tuned to generate counternarratives (CN) against hate speech (HS) in **Basque**. |
| | It was fine-tuned on the Basque splits of the [CONAN-EUS](https://huggingface.co/datasets/HiTZ/CONAN-EUS) dataset. |
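The snippet below is a minimal sketch of how a fine-tuned mT5 checkpoint like this one could be loaded and queried with the `transformers` library. The repository ID in `MODEL_ID`, the helper name `generate_counter_narrative`, the decoding settings, and the assumption that the model takes the raw hate-speech text without any task prefix are illustrative and not confirmed by this card; adjust them to the actual checkpoint.

```python
# Assumed repository ID; replace it with the actual one shown on the Hub.
MODEL_ID = "HiTZ/mt5-counter-narrative-eu"


def generate_counter_narrative(hate_speech: str,
                               model_id: str = MODEL_ID,
                               max_new_tokens: int = 128) -> str:
    """Generate one counter-narrative for a Basque hate-speech message."""
    # Imported here so the heavy dependency is only needed when generating.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the input is the raw HS text, with no task prefix.
    inputs = tokenizer(hate_speech, return_tensors="pt", truncation=True)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=4,  # illustrative decoding choice, not the authors' setting
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_counter_narrative("...")` with a Basque hate-speech sentence downloads the checkpoint on first use and returns the generated text as a string.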
| |
|
| | [CONAN-EUS](https://huggingface.co/datasets/HiTZ/CONAN-EUS) was created by professionally translating all 6654 English HS-CN pairs of the original CONAN dataset into |
| | **Basque and Spanish**. For experimentation, we generated train, validation and test splits such that no HS-CN pair appears in more than one split. |
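As an illustration of the "no shared pairs" constraint, here is a minimal sketch of one way to build disjoint splits. It is not the authors' procedure: the function name `split_pairs`, the ratios (chosen to roughly mirror the split sizes reported in this card), the seed, and the toy data are all assumptions.

```python
import random


def split_pairs(pairs, ratios=(0.73, 0.08, 0.19), seed=13):
    """Partition (hs, cn) pairs into train/validation/test with no shared pair."""
    # Drop exact duplicates first so one pair cannot land in two splits.
    unique = sorted(set(pairs))
    random.Random(seed).shuffle(unique)

    n_train = round(len(unique) * ratios[0])
    n_val = round(len(unique) * ratios[1])
    return {
        "train": unique[:n_train],
        "validation": unique[n_train:n_train + n_val],
        "test": unique[n_train + n_val:],  # remainder goes to test
    }


# Toy data standing in for the real HS-CN pairs (hypothetical strings),
# with one deliberate duplicate to exercise the deduplication step.
toy = [(f"hs {i}", f"cn {i}") for i in range(100)] + [("hs 0", "cn 0")]
splits = split_pairs(toy)
```

Deduplicating before shuffling is what guarantees disjointness here; slicing a single shuffled list can never place the same element in two splits.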
| |
|
| | <table style="width:33%"> |
| | <tr> |
| | <th>CONAN-EUS Splits</th> |
| | <th>Total HS-CN Count</th> |
| | </tr> |
| | <tr> |
| | <td>train</td> |
| | <td>4833</td> |
| | </tr> |
| | <tr> |
| | <td>validation</td> |
| | <td>537</td> |
| | </tr> |
| | <tr> |
| | <td>test</td> |
| | <td>1278</td> |
| | </tr> |
| | </table> |
| | |
| | - 📖 Paper: [Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation](https://arxiv.org/abs/2403.09159) In LREC-COLING 2024. |
| | - 💻 GitHub Repo (Data and Code): [https://github.com/ixa-ehu/conan-e/](https://github.com/ixa-ehu/conan-e/) |
| |
|
| |
|
| | ## HS-CN example |
| |
|
| | The CONAN dataset consists of HS-CN pairs on the topic of Muslims and Islam. |
| | An example HS-CN pair in Basque, Spanish and English is shown below: |
| |
|
| | | HS | CN | |
| | |-------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------| |
| | | Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk. | Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra. | |
| | | Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura.| ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra.| |
| | | Muslims do not have anything useful that can enrich our culture.| What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra.| |
| |
|
| | If you use the model, please **cite the following paper**: |
| |
|
| | ## Citation |
| |
|
| | ```bibtex |
| | @inproceedings{bengoetxea-et-al-2024, |
| | title={{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation}, |
| | author={Jaione Bengoetxea and Yi-Ling Chung and Marco Guerini and Rodrigo Agerri}, |
| | year={2024}, |
| | booktitle={Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)}, |
| | } |
| | ``` |
| |
|
| |
|
| | **Contact**: [Rodrigo Agerri](https://ragerri.github.io/) |
| | HiTZ Center - Ixa, University of the Basque Country UPV/EHU |