---
library_name: peft
base_model: meta-llama/Llama-2-7b-hf
language:
- en
pipeline_tag: text-generation
tags:
- hate-speech
- explanation-generation
---

# Model Card for gllama-alarm-implicit-hate

**GLlama Alarm** is a suite of knowledge-Guided versions of Llama 2, instruction fine-tuned for non-binary abusive language detection and explanation generation tasks.

## Model Details

This version has been instruction fine-tuned on the Implicit Hate Corpus for multi-class expressiveness detection (i.e., implicit hate speech, explicit hate speech, not hate) and explanation generation, as well as on encyclopedic, commonsense and temporal linguistic knowledge.

### Model Description

- **Developed by:** Chiara Di Bonaventura, Lucia Siciliani, Pierpaolo Basile
- **Funded by:** The Alan Turing Institute, Fondazione FAIR
- **Language:** English
- **Finetuned from model:** meta-llama/Llama-2-7b-hf

### Model Sources

- **Paper:** https://kclpure.kcl.ac.uk/ws/portalfiles/portal/316198577/2025_COLING_from_detection_to_explanation.pdf

## Uses

**GLlama Alarm** is intended for research use in English, especially for NLP tasks on social media data, which may contain offensive content.
Our suite can be used to **detect different levels of offensiveness and expressiveness of abusive language** (e.g., offensive comments and implicit hate speech, which has proven hard for many LLMs) and to **generate structured textual explanations** of why a text contains abusive language.

As with any language model, GLlama Alarm can potentially be used for harmful language generation. It should not be used directly in any application without a prior assessment of the safety and fairness concerns specific to that application.
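Since this card lists PEFT as the library and `meta-llama/Llama-2-7b-hf` as the base model, loading can be sketched as follows. The `adapter_id` is a placeholder (the exact Hub repo id is not stated in this card), and access to the base model is gated behind Meta's Llama 2 license on the Hugging Face Hub.

```python
def load_gllama_alarm(adapter_id):
    """Load Llama-2-7b and attach the GLlama Alarm PEFT adapter.

    `adapter_id` is a placeholder for the adapter's Hub repo id or a
    local path; it is not stated in this card. Downloading the base
    model requires accepting Meta's Llama 2 license on the Hub.
    """
    # Imports are kept local so the sketch can be read and pasted
    # without transformers/peft installed at module-import time.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    model = PeftModel.from_pretrained(base, adapter_id)  # attach adapter weights
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    return model, tokenizer
```

After loading, generation would proceed via `model.generate` on a knowledge-guided prompt built from the template in Table 9 of the paper.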

## Training Details

**GLlama Alarm** builds on the foundation model Llama 2 (7B), an auto-regressive language model that uses an optimized transformer architecture.
Llama 2 was trained on a mix of publicly available online data between January 2023 and July 2023. We select the base version of Llama 2, which has 7B parameters.
We instruction fine-tuned Llama 2 on two datasets separately: HateXplain and the Implicit Hate Corpus. This version is the one instruction fine-tuned on the Implicit Hate Corpus.
These datasets contain publicly available data designed for hate speech detection, thus ensuring data privacy and protection.
To instruction fine-tune Llama 2, we created knowledge-guided prompts following our paradigm; the template is shown in Table 9 of the paper.
We instruction fine-tuned Llama 2 with 17k knowledge-guided prompts for HateXplain and Implicit Hate for 5 epochs, setting the other hyperparameters as suggested by [Taori et al., 2023](https://github.com/tatsu-lab/stanford_alpaca).
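Because the card lists PEFT as the framework, the fine-tuning setup can be sketched as a LoRA configuration. All hyperparameter values below are illustrative assumptions, not the authors' exact settings (the paper defers to Taori et al., 2023 for those).

```python
def make_lora_config():
    """Build an illustrative LoRA config for causal-LM fine-tuning.

    Every value here is an assumption for illustration only; the
    actual settings follow Taori et al. (2023) and are not given
    in this card.
    """
    from peft import LoraConfig  # local import: peft is optional here

    return LoraConfig(
        task_type="CAUSAL_LM",                 # fine-tune Llama 2 as a causal LM
        r=8,                                   # LoRA rank (assumed)
        lora_alpha=16,                         # scaling factor (assumed)
        lora_dropout=0.05,                     # dropout on LoRA layers (assumed)
        target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    )
```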

## Citation

**BibTeX:**

```bibtex
@inproceedings{dibonaventura2025gllama_alarm,
  title={From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research},
  author={Di Bonaventura, Chiara and Siciliani, Lucia and Basile, Pierpaolo and Merono-Penuela, Albert and McGillivray, Barbara},
  booktitle={Proceedings of the 2025 International Conference on Computational Linguistics (COLING 2025)},
  year={2025}
}
```

**APA:**

Di Bonaventura, C., Siciliani, L., Basile, P., Merono-Penuela, A., & McGillivray, B. (2025). From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research. In *Proceedings of the 2025 International Conference on Computational Linguistics (COLING 2025)*.

## Model Card Contact

chiara.di_bonaventura@kcl.ac.uk

### Framework versions

- PEFT 0.10.0