Commit 23558e1 (verified) by ilterm · 1 Parent(s): 76b505a

Update README.md

  1. README.md +177 -3
README.md CHANGED
@@ -1,3 +1,177 @@
1
- ---
2
- license: cc-by-nc-4.0
3
- ---
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ ---
4
+ ---
5
+ ---
6
+
7
+ # CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish
8
+
9
+ This model performs **Cross-Individual Sentiment Analysis (CISA)** on historical Turkish texts (1900-1950), analyzing the **author's sentiment toward specific individuals** mentioned in the text, rather than the overall text sentiment.
10
+
11
+ ## 🎯 Model Details
12
+
13
+ - **Model Name**: CISA-BERTurk-Sentiment
14
+ - **Base Model**: [BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) (dbmdz/bert-base-turkish-cased)
15
+ - **Architecture**: DECA-EBSA (Dual-Encoder Context-Aware Entity-Based Sentiment Analysis)
16
+ - **Language**: Turkish
17
+ - **Period**: Historical Turkish texts (1900-1950)
18
+ - **Task**: Cross-Individual Sentiment Analysis
19
+ - **Classes**:
20
+ - 0: Negative
21
+ - 1: Neutral
22
+ - 2: Positive
23
+
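The class indices listed above can be mapped to readable labels with a small lookup. The sketch below is illustrative only: `ID2LABEL`, `softmax`, and `decode_prediction` are hypothetical names, and the logits are made-up values rather than real model output.

```python
import math

# Class indices as listed in the model card.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for one (text, entity) pair; not real model output.
label, prob = decode_prediction([-1.2, 0.3, 2.1])
print(label, round(prob, 3))
```

Because CISA scores each mentioned person separately, a text with several entities would yield one such prediction per entity.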
24
+ ## 🆚 CISA vs Standard Sentiment Analysis
25
+
26
+ ### Example Comparison:
27
+ **Text**: *"Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti"* (Ali Bey's death filled us all with sadness)
28
+
29
+ | Analysis Type | Result | Explanation |
30
+ |--------------|--------|-------------|
31
+ | **Standard SA** | ❌ Negative | Overall text tone is sad |
32
+ | **CISA** | ✅ Positive | Author's respect/love for Ali Bey |
33
+
34
+ ### CISA Advantages:
35
+ - ✅ **Person-focused** sentiment detection
36
+ - ✅ **Author perspective** analysis
37
+ - ✅ **Entity-based** precision
38
+ - ✅ **Context-aware** evaluation
39
+
40
+ ## 📊 Performance Metrics
41
+
42
+ | Metric | Value |
43
+ |--------|-------|
44
+ | **Accuracy** | **87.08%** |
45
+ | **Precision** | **87.07%** |
46
+ | **Recall** | **87.08%** |
47
+ | **F1-Score** | **87.05%** |
48
+
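The card does not say how precision, recall, and F1 are averaged over the three classes. Recall matching accuracy (87.08%) is consistent with support-weighted averaging, since weighted recall always equals accuracy. A minimal sketch of that computation, assuming weighted averaging and using a made-up confusion matrix (not the model's actual results); `weighted_metrics` is a hypothetical helper:

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall, F1, plus accuracy.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    precision = recall = f1 = 0.0
    for c in range(n_classes):
        support = sum(confusion[c])                   # true samples of class c
        predicted = sum(row[c] for row in confusion)  # samples predicted as c
        tp = confusion[c][c]
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical confusion matrix (rows: true Negative / Neutral / Positive).
example = [
    [80, 15, 5],
    [10, 120, 20],
    [5, 15, 130],
]
metrics = weighted_metrics(example)
print({name: round(value, 4) for name, value in metrics.items()})
```

With macro averaging instead, recall would generally differ from accuracy whenever class sizes are unequal.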
49
+ ## 📈 Dataset Information
50
+
51
+ - **Total Texts**: 7,816
52
+ - **Total Entities**: 9,249
53
+ - **Average Entities per Text**: 1.18
54
+ - **Sentiment Distribution**:
55
+ - Negative: 2,357 (25.5%)
56
+ - Neutral: 3,563 (38.5%)
57
+ - Positive: 3,329 (36.0%)
58
+
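The figures above are internally consistent, which a few lines of arithmetic confirm. The counts are copied from the list; the variable names are only illustrative.

```python
# Counts copied from the dataset summary above.
total_texts = 7816
label_counts = {"Negative": 2357, "Neutral": 3563, "Positive": 3329}

# Each annotated entity carries exactly one label, so the counts sum to the entity total.
total_entities = sum(label_counts.values())
print(total_entities)                          # 9249
print(round(total_entities / total_texts, 2))  # 1.18

# Share of each sentiment class among all entities.
shares = {label: 100 * count / total_entities for label, count in label_counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1f}%")            # 25.5%, 38.5%, 36.0%
```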
59
+ ## 🚀 Usage
60
+
61
+ **Note**: This model uses a custom DECA-EBSA architecture with enhanced attention mechanisms, Turkish linguistic features, and contextual encoding. Running inference requires the model class (`PositionAwareDualEncoderEBSA`) from the training code; the snippet below only loads the tokenizer and downloads the weights.
62
+
63
+ ### Model Loading
64
+ ```python
65
+ from transformers import AutoTokenizer
66
+ from huggingface_hub import hf_hub_download
67
+
68
+ # Load tokenizer
69
+ tokenizer = AutoTokenizer.from_pretrained("dbbiyte/CISA-BERTurk-sentiment")
70
+
71
+ # Download model weights
72
+ weights_path = hf_hub_download("dbbiyte/CISA-BERTurk-sentiment", "pytorch_model.bin")
73
+
74
+ print("Model weights downloaded successfully!")
75
+ print("For full CISA analysis, use the complete PositionAwareDualEncoderEBSA architecture from the training code.")
76
+ ```
77
+
78
+ ### Expected CISA Results
79
+ For the examples in our test set:
80
+
81
+ | Text | Entity | Standard SA | CISA Result |
82
+ |------|--------|-------------|-------------|
83
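+ | "Ali Bey'in vefatı hepimizi hüzne boğmuştu" (Ali Bey's death had plunged us all into grief) | Ali Bey | Negative | **Positive** |
84
+ | "Leyla Hanım'ın musiki resitalinde, nağmelerinin ruhuma işledi" (At Leyla Hanım's music recital, her melodies touched my soul) | Leyla Hanım | Positive | **Positive** |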
85
+
86
+ **CISA Key Insight**: The model analyzes the author's sentiment toward the mentioned person, not the overall text sentiment.
87
+
88
+ ## 🏗️ DECA-EBSA Architecture
89
+
90
+ ### Dual-Encoder Structure:
91
+ 1. **Text Encoder**: Full text context processing
92
+ 2. **Entity Encoder**: Entity + local context processing
93
+
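The card does not describe how the entity encoder's input is built. As a rough illustration of "entity + local context", the sketch below pairs the full text with a window of words around the entity mention. Whitespace tokenization, prefix matching, the window size, and the name `entity_local_context` are all assumptions made for this example, not the authors' preprocessing.

```python
def entity_local_context(text, entity, window=5):
    """Return the entity's word position and a window of surrounding words.

    Whitespace tokenization and the window size are illustrative assumptions.
    Returns None when the entity is not found.
    """
    words = text.split()
    entity_words = entity.split()
    n = len(entity_words)
    for start in range(len(words) - n + 1):
        # Prefix match so suffixed forms such as "Bey'in" still match "Bey".
        if all(words[start + k].startswith(entity_words[k]) for k in range(n)):
            lo = max(0, start - window)
            hi = min(len(words), start + n + window)
            return {
                "position": start,                        # input for position-aware modeling
                "full_text": text,                        # input for the text encoder
                "local_context": " ".join(words[lo:hi]),  # input for the entity encoder
            }
    return None


sample = "Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti"
print(entity_local_context(sample, "Ali Bey", window=2))
```

Prefix matching is a simplification that can produce false matches; a real pipeline would more likely rely on annotated entity spans.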
94
+ ### Key Features:
95
+ - **Enhanced Entity-Context Attention**: 12-head cross-attention
96
+ - **Position-Aware Modeling**: Entity position information
97
+ - **Turkish Linguistic Features**: Ottoman Turkish specific patterns
98
+ - **Context-Aware Classification**: Formal/informal distinction
99
+ - **Adaptive Focal Loss**: Focus on difficult examples
100
+ - **R-Drop Regularization**: Consistency enforcement
101
+
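Two of the training-time features above have compact textbook forms. The sketch below shows the standard focal loss and the symmetric KL term used by R-Drop in plain Python. It is not the authors' code: the "adaptive" variant of the focal loss is not described in this card, `gamma=2.0` is an assumed default, and all probabilities are made up.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Standard focal loss for one example: -(1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with non-zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def r_drop_penalty(p1, p2):
    """Symmetric KL between two dropout passes over the same input."""
    return 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))


# Made-up class probabilities (Negative, Neutral, Positive); true class is Positive.
easy = [0.05, 0.05, 0.90]
hard = [0.40, 0.35, 0.25]
print(focal_loss(easy, target=2))  # small: confident and correct
print(focal_loss(hard, target=2))  # much larger: low probability on the true class

# Two hypothetical forward passes of the same input with different dropout masks.
pass_a = [0.10, 0.20, 0.70]
pass_b = [0.15, 0.25, 0.60]
print(r_drop_penalty(pass_a, pass_b))
```

The focal term down-weights examples the model already classifies confidently, while the R-Drop term penalizes disagreement between the two passes and is typically added to the classification loss with a weighting coefficient.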
102
+ ## 🔬 Research Contributions
103
+
104
+ ### 1. Cross-Individual Sentiment Analysis (CISA)
105
+ - **First application** of CISA to historical Turkish
106
+ - **Author perspective** focused sentiment analysis
107
+ - **Entity-based approach** for person-specific emotions
108
+
109
+ ### 2. DECA-EBSA Methodology
110
+ - **Dual-Encoder** architecture
111
+ - **Context-Aware** modeling
112
+ - **Entity-Based** attention mechanisms
113
+
114
+ ### 3. Historical Turkish NLP Contributions
115
+ - **1900-1950 period** specialized dataset
116
+ - **Ottoman Turkish** linguistic features
117
+ - **Formal/informal** context distinction
118
+
119
+ ## 👥 Authors
120
+
121
+ **İzmir Institute of Technology - Digital Humanities and AI Laboratory**:
122
+ - **Dr. Mustafa İLTER** - İzmir Institute of Technology
123
+ - **Dr. Doğan EVECEN** - İzmir Institute of Technology
124
+ - **Dr. Buket ERŞAHİN** - İzmir Institute of Technology
125
+ - **Dr. Yasemin ÖZCAN GÖNÜLAL** - İzmir Institute of Technology
126
+ - **Assoc. Prof. Selma TEKİR** - İzmir Institute of Technology
127
+
128
+ **Pamukkale University**:
129
+ - **Assoc. Prof. Sezen KARABULUT** - Pamukkale University
130
+ - **İbrahim BERCİ** - Pamukkale University
131
+ - **Emre ONUÇ** - Pamukkale University
132
+
133
+ ## 🏦 Funding & Acknowledgments
134
+
135
+ This work was supported by **The Scientific and Technological Research Council of Turkey (TÜBİTAK)** under project number **323K372**. We thank TÜBİTAK for their support.
136
+
137
+ ## 📚 BERTurk Reference
138
+
139
+ This model builds on [BERTurk](https://github.com/stefan-it/turkish-bert), a BERT model developed by Stefan Schweter, pre-trained on 35GB of Turkish text and optimized for Turkish natural language processing tasks.
140
+
141
+ ## 📄 License and Usage Terms
142
+
143
+ This model is released under the **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.
144
+
145
+ ### ✅ Permitted Uses:
146
+ - **Academic research** (citation required)
147
+ - **Educational purposes**
148
+ - **Non-profit projects**
149
+ - **Personal experimental studies**
150
+
151
+ ### ❌ Prohibited Uses:
152
+ - **Commercial applications**
153
+ - **Profit-driven projects**
154
+ - **Commercial product/service development**
155
+
156
+ ### 📄 Citation Requirement:
157
+ When using this model, please cite it as:
158
+
159
+ ```bibtex
160
+ @software{ilter2025cisa,
161
+ author = {İlter, Mustafa and Evecen, Doğan and Erşahin, Buket and Özcan Gönülal, Yasemin and Karabulut, Sezen and Berci, İbrahim and Onuç, Emre and Tekir, Selma},
162
+ title = {CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish},
163
+ url = {https://huggingface.co/dbbiyte/CISA-BERTurk-sentiment},
164
+ year = {2025},
165
+ }
166
+ ```
167
+
168
+ ## 🚨 Limitations
169
+
170
+ - Model is optimized specifically for **1900-1950 period Turkish texts**
171
+ - Performance may vary on **modern Turkish texts**
172
+ - **Historical spelling conventions** and **archaic vocabulary** should be considered
173
+ - Maximum sequence length is **256 tokens**
174
+
175
+ ## 🏷️ Model Tags
176
+
177
+ `turkish` `sentiment-analysis` `historical-texts` `entity-based` `cross-individual` `berturk` `bert` `1900-1950` `pytorch` `safetensors`