debajyotimaz/codemix_hate

@@ -19,10 +19,11 @@ language:
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model is for Hindi-English code-mixed hate detection.
-- **Developed by:** Debajyoti Mazumder, Aakash Kumar, Jasabanta Patro
+- **Developed by:** Debajyoti Mazumder, Aakash Kumar
 - **Model type:** Text Classification
 - **Language(s) :** Hindi-English code-mixed
 - **Parent Model:** See the [BERT multilingual base model (cased)](https://huggingface.co/google-bert/bert-base-multilingual-cased) for more information about the model.
+- **Paper:** [https://dl.acm.org/doi/full/10.1145/3726866](https://dl.acm.org/doi/full/10.1145/3726866)
 ## How to Get Started with the Model
@@ -39,7 +40,26 @@ prediction= model(input_ids=inputs['input_ids'],attention_mask=inputs['attention
 print(prediction.logits)
 ```
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-We use the F1 score of positive class as the evaluation metric for training of our model because it takes into account the Acc, Pre and Rec values.
+## Citation
+```bibtex
+@article{10.1145/3726866,
+author = {Mazumder, Debajyoti and Kumar, Aakash and Patro, Jasabanta},
+title = {Improving Code-Mixed Hate Detection by Native Sample Mixing: A Case Study for Hindi-English Code-Mixed Scenario},
+year = {2025},
+issue_date = {May 2025},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+volume = {24},
+number = {5},
+issn = {2375-4699},
+url = {https://doi.org/10.1145/3726866},
+doi = {10.1145/3726866},
+abstract = {Hate detection has long been a challenging task for the NLP community. The task becomes complex in a code-mixed environment because the models must understand the context and the hate expressed through language alteration. Compared to the monolingual setup, we see much less work on code-mixed hate as large-scale annotated hate corpora are unavailable for the study. To overcome this bottleneck, we propose using native language hate samples (native language samples/ native samples hereafter). We hypothesise that in the era of multilingual language models (MLMs), hate in code-mixed settings can be detected by majorly relying on the native language samples. Even though the NLP literature reports the effectiveness of MLMs on hate detection in many cross-lingual settings, their extensive evaluation in a code-mixed scenario is yet to be done. This article attempts to fill this gap through rigorous empirical experiments. We considered the Hindi-English code-mixed setup as a case study as we have the linguistic expertise for the same. Some of the interesting observations we got are: (i) adding native hate samples in the code-mixed training set, even in small quantity, improved the performance of MLMs for code-mixed hate detection, (ii) MLMs trained with native samples alone observed to be detecting code-mixed hate to a large extent, (iii) the visualisation of attention scores revealed that, when native samples were included in training, MLMs could better focus on the hate emitting words in the code-mixed context, and (iv) finally, when hate is subjective or sarcastic, naively mixing native samples doesn’t help much to detect code-mixed hate. We have released the data and code repository to reproduce the reported results.},
+journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
+month = apr,
+articleno = {47},
+numpages = {21},
+keywords = {Code-mixed hate detection, cross-lingual learning, native sample mixing}
+}
+```
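The model card's getting-started snippet stops at `print(prediction.logits)`, which prints raw, unnormalized scores. The following is a minimal sketch, assuming a single-label classification head, of how one row of those logits might be turned into class probabilities and a predicted class index. It is written in plain Python so it runs without downloading the model; the function name `logits_to_prediction` and the example logit values are hypothetical. Which index corresponds to "hate" is also an assumption and should be checked against the model's `config.id2label` mapping. With PyTorch tensors, `torch.softmax(prediction.logits, dim=-1)` computes the same probabilities.

```python
import math
from typing import Sequence


def logits_to_prediction(logits: Sequence[float]) -> tuple[int, list[float]]:
    """Convert one row of classifier logits into (predicted index, probabilities).

    Hypothetical helper, not part of the model card or the transformers API.
    Uses a numerically stable softmax: subtracting the largest logit before
    exponentiating avoids overflow and does not change the resulting probabilities.
    """
    if len(logits) == 0:
        raise ValueError("logits must be non-empty")
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Softmax is monotonic, so the argmax of the probabilities
    # is also the argmax of the raw logits.
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs


# Hypothetical logits for one input, shaped like one row of `prediction.logits`
# from a two-class head. The meaning of each index must be read from the
# model's config.id2label; it is not assumed here.
pred, probs = logits_to_prediction([-1.2, 2.3])
print(pred, [round(p, 3) for p in probs])
```

Normalizing with softmax leaves the predicted class unchanged but yields scores that can be thresholded, which is useful if a decision cutoff other than the argmax is wanted.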