jpwahle committed on
Commit df768d6 · verified · 1 Parent(s): c2121b4

Update README.md

Files changed (1)
  1. README.md +68 -1
README.md CHANGED
@@ -7,4 +7,71 @@ sdk: static
  pinned: false
  ---

- Edit this `README.md` markdown file to author your organization card.
+ # BRIGHTER Dataset Organization
+
+ ## 🌍 BRIdging the Gap in Human-Annotated Textual Emotion Recognition
+
+ Welcome to the official Hugging Face organization for **BRIGHTER**, a collection of multilingual emotion recognition datasets spanning 28 languages from 7 distinct language families.
+
+ ### 📊 Overview
+
+ BRIGHTER addresses the critical gap in emotion recognition resources for low-resource languages, particularly those spoken in Africa, Asia, and Latin America. Our dataset provides human-annotated emotion labels across diverse linguistic landscapes, enabling more inclusive and representative emotion AI systems.
+
+ ### 🎯 Key Features
+
+ - **28 Languages**: Comprehensive coverage including many low-resource languages
+ - **7 Language Families**: Diverse linguistic representation
+ - **Human-Annotated**: High-quality annotations for reliable emotion recognition
+ - **Research-Ready**: Standardized format for easy integration into ML pipelines
+
+ ### 📚 Datasets Available
+
+ Browse our collection of emotion-annotated datasets across multiple languages. Each dataset includes:
+ - Text samples with emotion labels
+ - Language-specific preprocessing
+ - Train/validation/test splits
+ - Detailed documentation
+
+ ### 🔬 Research Findings
+
+ Our research yields several insights for multilingual emotion recognition:
+ - **Language-Specific Prompting**: Model performance varies depending on whether the prompt is written in English or in the target language
+ - **Few-Shot Learning**: Performance improves consistently as the number of few-shot examples increases
+ - **Prompt Sensitivity**: Different prompt formulations significantly impact model performance
+
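As an illustration of the few-shot and prompt-sensitivity findings above, here is a minimal sketch of how a k-shot emotion recognition prompt could be assembled. It is not the authors' evaluation code: the label set `EMOTIONS`, the helper `build_few_shot_prompt`, the instruction wording, and the toy demonstrations are all assumptions made for this example and are not taken from BRIGHTER.

```python
from typing import List, Sequence, Tuple

# Hypothetical label set for illustration; the README does not list the labels.
EMOTIONS = ("anger", "fear", "joy", "sadness")


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, Sequence[str]]],
    query: str,
    k: int,
    instruction: str = "Identify the emotions expressed in the text.",
) -> str:
    """Build a k-shot prompt from (text, labels) pairs followed by the query."""
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and len(examples)")
    lines: List[str] = [instruction, "Possible emotions: " + ", ".join(EMOTIONS), ""]
    # Use the first k demonstrations; k = 0 yields a zero-shot prompt.
    for text, labels in examples[:k]:
        lines.append(f"Text: {text}")
        lines.append("Emotions: " + (", ".join(labels) if labels else "none"))
        lines.append("")
    # The query is left unlabeled so the model completes the final line.
    lines.append(f"Text: {query}")
    lines.append("Emotions:")
    return "\n".join(lines)


# Toy demonstrations written for this sketch, not taken from BRIGHTER.
demos = [
    ("I finally passed the exam!", ["joy"]),
    ("The storm is getting closer and I am alone.", ["fear", "sadness"]),
    ("They cancelled my flight again.", ["anger"]),
]

for k in (0, 1, 3):
    print(f"--- {k}-shot ---")
    print(build_few_shot_prompt(demos, "We lost the final match.", k))
```

Varying `k` or passing a different `instruction` string (for example, one written in the target language rather than English) is one simple way to probe the effects described in the list above.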
+ ### 📖 Citation
+
+ If you use our datasets, please cite our papers:
+
+ ```bibtex
+ @article{muhammad2025brighterbridginggaphumanannotated,
+ title = {BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages},
+ author = {Muhammad, Shamsuddeen Hassan and Ousidhoum, Nedjma and Abdulmumin, Idris and Wahle, Jan Philip and Ruas, Terry and Beloucif, Meriem and de Kock, Christine and Surange, Nirmal and Teodorescu, Daniela and Ahmad, Ibrahim Said and Adelani, David Ifeoluwa and Aji, Alham Fikri and Ali, Felermino D. M. A. and Alimova, Ilseyar and Araujo, Vladimir and Babakov, Nikolay and Baes, Naomi and Bucur, Ana-Maria and Bukula, Andiswa and Cao, Guanqun and Cardenas, Rodrigo Tufino and Chevi, Rendi and Chukwuneke, Chiamaka Ijeoma and Ciobotaru, Alexandra and Dementieva, Daryna and Gadanya, Murja Sani and Geislinger, Robert and Gipp, Bela and Hourrane, Oumaima and Ignat, Oana and Lawan, Falalu Ibrahim and Mabuya, Rooweither and Mahendra, Rahmad and Marivate, Vukosi and Piper, Andrew and Panchenko, Alexander and Porto Ferreira, Charles Henrique and Protasov, Vitaly and Rutunda, Samuel and Shrivastava, Manish and Udrea, Aura Cristina and Wanzare, Lilian Diana Awuor and Wu, Sophie and Wunderlich, Florian Valentin and Zhafran, Hanif Muhammad and Zhang, Tianhui and Zhou, Yi and Mohammad, Saif M.},
+ journal = {arXiv preprint arXiv:2502.11926},
+ year = {2025}
+ }
+
+ @inproceedings{muhammad-etal-2025-semeval,
+ title = "{S}em{E}val-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection",
+ author = "Muhammad, Shamsuddeen Hassan and Ousidhoum, Nedjma and Abdulmumin, Idris and Yimam, Seid Muhie and Wahle, Jan Philip and Ruas, Terry and Beloucif, Meriem and De Kock, Christine and Belay, Tadesse Destaw and Ahmad, Ibrahim Said and Surange, Nirmal and Teodorescu, Daniela and Adelani, David Ifeoluwa and Aji, Alham Fikri and Ali, Felermino and Araujo, Vladimir and Ayele, Abinew Ali and Ignat, Oana and Panchenko, Alexander and Zhou, Yi and Mohammad, Saif M.",
+ booktitle = "Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)",
+ month = july,
+ year = "2025",
+ address = "Vienna, Austria",
+ publisher = "Association for Computational Linguistics",
+ url = "",
+ doi = "",
+ pages = ""
+ }
+ ```
+
+ ### 📧 Contact
+
+ - **Shamsuddeen Hassan Muhammad**: [s.muhammad@imperial.ac.uk](mailto:s.muhammad@imperial.ac.uk)
+ - **Nedjma Ousidhoum**: [OusidhoumN@cardiff.ac.uk](mailto:OusidhoumN@cardiff.ac.uk)
+
+ ### 🌐 Project Website
+
+ Visit our official project page for more information: [https://brighter-dataset.github.io/](https://brighter-dataset.github.io/)
+
+ *Equal contribution by lead authors.