Update README.md
README.md
CHANGED
@@ -19,10 +19,11 @@ language:
 
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model is for Hindi-English code-mixed hate detection.
 
-- **Developed by:** Debajyoti Mazumder, Aakash Kumar
+- **Developed by:** Debajyoti Mazumder, Aakash Kumar
 - **Model type:** Text Classification
 - **Language(s) :** Hindi-English code-mixed
 - **Parent Model:** See the [BERT multilingual base model (cased)](https://huggingface.co/google-bert/bert-base-multilingual-cased) for more information about the model.
+- **Paper:** [https://dl.acm.org/doi/full/10.1145/3726866](https://dl.acm.org/doi/full/10.1145/3726866)
 
 ## How to Get Started with the Model
 
@@ -39,7 +40,26 @@ prediction= model(input_ids=inputs['input_ids'],attention_mask=inputs['attention
 print(prediction.logits)
 ```
 
-
+## Citation
 
-
-
+```bibtex
+@article{10.1145/3726866,
+author = {Mazumder, Debajyoti and Kumar, Aakash and Patro, Jasabanta},
+title = {Improving Code-Mixed Hate Detection by Native Sample Mixing: A Case Study for Hindi-English Code-Mixed Scenario},
+year = {2025},
+issue_date = {May 2025},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+volume = {24},
+number = {5},
+issn = {2375-4699},
+url = {https://doi.org/10.1145/3726866},
+doi = {10.1145/3726866},
+abstract = {Hate detection has long been a challenging task for the NLP community. The task becomes complex in a code-mixed environment because the models must understand the context and the hate expressed through language alteration. Compared to the monolingual setup, we see much less work on code-mixed hate as large-scale annotated hate corpora are unavailable for the study. To overcome this bottleneck, we propose using native language hate samples (native language samples/ native samples hereafter). We hypothesise that in the era of multilingual language models (MLMs), hate in code-mixed settings can be detected by majorly relying on the native language samples. Even though the NLP literature reports the effectiveness of MLMs on hate detection in many cross-lingual settings, their extensive evaluation in a code-mixed scenario is yet to be done. This article attempts to fill this gap through rigorous empirical experiments. We considered the Hindi-English code-mixed setup as a case study as we have the linguistic expertise for the same. Some of the interesting observations we got are: (i) adding native hate samples in the code-mixed training set, even in small quantity, improved the performance of MLMs for code-mixed hate detection, (ii) MLMs trained with native samples alone observed to be detecting code-mixed hate to a large extent, (iii) the visualisation of attention scores revealed that, when native samples were included in training, MLMs could better focus on the hate emitting words in the code-mixed context, and (iv) finally, when hate is subjective or sarcastic, naively mixing native samples doesn’t help much to detect code-mixed hate. We have released the data and code repository to reproduce the reported results.1},
+journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
+month = apr,
+articleno = {47},
+numpages = {21},
+keywords = {Code-mixed hate detection, cross-lingual learning, native sample mixing}
+}
+```
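Note on the quick-start context in the second hunk: the README snippet ends at `print(prediction.logits)`, which prints raw scores, not probabilities or a label. A minimal sketch of the remaining step, assuming a two-class head, could look like the block below. The example logits and the label order in `assumed_labels` are hypothetical and are not stated anywhere in this diff; the real mapping should be read from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one input. With the real model, this row would
# come from prediction.logits[0].tolist().
example_logits = [-1.2, 2.3]

# Assumed label order (not confirmed by this diff); check model.config.id2label.
assumed_labels = ["non-hate", "hate"]

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the top score
print(assumed_labels[best], round(probs[best], 3))
```

The softmax is written in plain Python so the sketch runs without the model weights; with PyTorch available, `torch.softmax(prediction.logits, dim=-1)` would do the same thing directly on the tensor.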