---
license: cc-by-nc-4.0
---

# CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish

This model performs **Cross-Individual Sentiment Analysis (CISA)** on historical Turkish texts (1900-1950), analyzing the **author's sentiment toward specific individuals** mentioned in the text, rather than the overall text sentiment.

## 🎯 Model Details

- **Model Name**: CISA-BERTurk-Sentiment
- **Base Model**: [BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) (dbmdz/bert-base-turkish-cased)
- **Architecture**: DECA-EBSA (Dual-Encoder Context-Aware Entity-Based Sentiment Analysis)
- **Language**: Turkish
- **Period**: Historical Turkish texts (1900-1950)
- **Task**: Cross-Individual Sentiment Analysis
- **Classes**: 
  - 0: Negative
  - 1: Neutral
  - 2: Positive
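
The index-to-label mapping above can be applied to a probability vector with a tiny helper (an illustrative snippet, not part of the packaged model code):

```python
# Illustrative mapping from class index / probability vector to label.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

def label_from_probs(probs):
    """Return the label of the highest-probability class in [neg, neu, pos]."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best]

print(label_from_probs([0.04, 0.11, 0.85]))  # Positive
```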

## 🆚 CISA vs Standard Sentiment Analysis

### Example Comparison:
**Text**: *"Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti, onunla müşterek mesaimiz mevcuttu."* (Ali Bey's death filled us all with sadness)

| Analysis Type | Result | Explanation |
|--------------|--------|-------------|
| **Standard SA** | ❌ Negative | Overall text tone is sad |
| **CISA** | ✅ Positive | Author's respect/love for Ali Bey |

### CISA Advantages:
- **Person-focused** sentiment detection
- **Author perspective** analysis
- **Entity-based** precision
- **Context-aware** evaluation

## 📊 Performance Metrics

| Metric | Value |
|--------|-------|
| **Accuracy** | **87.08%** |
| **Precision** | **87.07%** |
| **Recall** | **87.08%** |
| **F1-Score** | **87.05%** |

## 📈 Dataset Information

- **Total Texts**: 7,816
- **Total Entities**: 9,249
- **Average Entities per Text**: 1.18
- **Sentiment Distribution**:
  - Negative: 2,357 (25.5%)
  - Neutral: 3,563 (38.5%)
  - Positive: 3,329 (36.0%)
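
The counts above are internally consistent; a quick arithmetic check:

```python
# Sanity-check the entity-level sentiment distribution reported above.
counts = {"Negative": 2357, "Neutral": 3563, "Positive": 3329}
total = sum(counts.values())
print(total)  # 9249 entities
for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
```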

## 🚀 Usage

### Installation

```bash
pip install transformers torch huggingface_hub
```

### Quick Start

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dbbiyte/CISA-BERTurk-sentiment")
model     = AutoModel.from_pretrained("dbbiyte/CISA-BERTurk-sentiment", trust_remote_code=True)

text         = "Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti, onunla müşterek mesaimiz mevcuttu."
entity_text  = "Ali Bey"
entity_start = text.index(entity_text)          # 0
entity_end   = entity_start + len(entity_text)  # 7

result = model.predict(text, entity_text, entity_start, entity_end, tokenizer)

print(result["sentiment_label"])   # → "Positive"
print(result["sentiment_probs"])   # → [0.04, 0.11, 0.85]
```

### Batch / Pipeline Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dbbiyte/CISA-BERTurk-sentiment")
model     = AutoModel.from_pretrained("dbbiyte/CISA-BERTurk-sentiment", trust_remote_code=True)
model.eval()

# Each sample: (text, entity_text, entity_start, entity_end)
samples = [
    ("Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti, onunla müşterek mesaimiz mevcuttu.",
     "Ali Bey", 0, 7),
    ("Leyla Hanım'ın musiki resitalinde, nağmelerinin ruhuma işledi.",
     "Leyla Hanım", 0, 11),
    ("Paşa'nın emirleri hiçbir zaman yerinde değildi.",
     "Paşa", 0, 4),
]

for text, ent_text, ent_start, ent_end in samples:
    result = model.predict(text, ent_text, ent_start, ent_end, tokenizer)
    print(f"Entity : {ent_text}")
    print(f"Sentiment: {result['sentiment_label']} "
          f"(conf: {max(result['sentiment_probs']):.2f})")
    print()
```

### Output Format

```python
{
    "sentiment":       2,           # 0=Negative, 1=Neutral, 2=Positive
    "sentiment_label": "Positive",
    "sentiment_probs": [0.04, 0.11, 0.85],   # [neg, neu, pos]
    "relation":        1,           # 0=Indirect, 1=Direct
    "relation_probs":  [0.12, 0.88]
}
```

### GPU Inference

```python
import torch
from transformers import AutoModel, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("dbbiyte/CISA-BERTurk-sentiment")
model  = AutoModel.from_pretrained(
    "dbbiyte/CISA-BERTurk-sentiment",
    trust_remote_code=True
).to(device)

result = model.predict(text, entity_text, entity_start, entity_end, tokenizer, device=device)
```

> **Note**: `trust_remote_code=True` is required because this model uses a custom
> DECA-EBSA architecture (`modeling_cisa.py`) hosted in the repository.
> The code is fully auditable at `dbbiyte/CISA-BERTurk-sentiment/blob/main/modeling_cisa.py`.

### Expected CISA Results
For the examples in our test set:

| Text | Entity | Standard SA | CISA Result |
|------|--------|-------------|-------------|
| "Ali Bey'in vefatı hepimizi hüzne boğmuştu, onunla senelerce müşterek mesaimiz mevcuttu." | Ali Bey | Negative | **Positive** |
| "Leyla Hanım'ın musiki resitalinde, nağmelerinin ruhuma işledi" | Leyla Hanım | Positive | **Positive** |

**CISA Key Insight**: The model analyzes the author's sentiment toward the mentioned person, not the overall text sentiment.

## 🏗️ DECA-EBSA Architecture

### Dual-Encoder Structure:
1. **Text Encoder**: Full text context processing
2. **Entity Encoder**: Entity + local context processing

### Key Features:
- **Enhanced Entity-Context Attention**: 12-head cross-attention
- **Position-Aware Modeling**: Entity position information
- **Turkish Linguistic Features**: Ottoman Turkish specific patterns
- **Context-Aware Classification**: Formal/informal distinction
- **Adaptive Focal Loss**: Focus on difficult examples
- **R-Drop Regularization**: Consistency enforcement
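
To illustrate the focal loss idea the architecture builds on (down-weighting easy examples so training concentrates on hard ones), here is a minimal, dependency-free sketch. The function name and the `gamma` default are illustrative, not taken from the released training code:

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    probs  -- predicted class probabilities, e.g. [neg, neu, pos]
    target -- index of the gold class
    gamma  -- focusing parameter; gamma=0 recovers plain cross-entropy
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A confidently correct (easy) example contributes far less loss
# than a hard example, steering training toward difficult cases.
easy = focal_loss([0.05, 0.05, 0.90], target=2)
hard = focal_loss([0.40, 0.35, 0.25], target=2)
print(easy < hard)  # True
```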

## 🔬 Research Contributions

### 1. Cross-Individual Sentiment Analysis (CISA)
- **First application** of CISA to historical Turkish
- **Author perspective** focused sentiment analysis
- **Entity-based approach** for person-specific emotions

### 2. DECA-EBSA Methodology
- **Dual-Encoder** architecture
- **Context-Aware** modeling
- **Entity-Based** attention mechanisms

### 3. Historical Turkish NLP Contributions
- **1900-1950 period** specialized dataset
- **Ottoman Turkish** linguistic features
- **Formal/informal** context distinction

## 👥 Authors

**İzmir Institute of Technology - Digital Humanities and AI Laboratory**:
- **Dr. Mustafa İLTER** - İzmir Institute of Technology
- **Dr. Doğan EVECEN** - İzmir Institute of Technology
- **Dr. Buket ERŞAHİN** - İzmir Institute of Technology
- **Dr. Yasemin ÖZCAN GÖNÜLAL** - İzmir Institute of Technology
- **Assoc. Prof. Selma TEKİR** - İzmir Institute of Technology

**Pamukkale University**:
- **Assoc. Prof. Sezen KARABULUT** - Pamukkale University
- **İbrahim BERCİ** - Pamukkale University
- **Emre ONUÇ** - Pamukkale University

## 🏦 Funding & Acknowledgments

This work was supported by **The Scientific and Technological Research Council of Turkey (TÜBİTAK)** under project number **323K372**. We thank TÜBİTAK for their support.

## 📚 BERTurk Reference

This model uses [BERTurk](https://github.com/stefan-it/turkish-bert) developed by Stefan Schweter, a BERT model pre-trained on 35GB of Turkish text, optimized for Turkish natural language processing tasks.

## 📄 License and Usage Terms

This model is released under **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.

### ✅ Permitted Uses:
- **Academic research** (citation required)
- **Educational purposes**
- **Non-profit projects**
- **Personal experimental studies**

### ❌ Prohibited Uses:
- **Commercial applications**
- **Profit-driven projects**
- **Commercial product/service development**

### 📄 Citation Requirement:
When using this model, please cite as:

```bibtex
@misc{ilter2025cisa,
  author = {İlter, Mustafa and Evecen, Doğan and Erşahin, Buket and Özcan Gönülal, Yasemin and Karabulut, Sezen and Berci, İbrahim and Onuç, Emre and Tekir, Selma},
  title = {CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish},
  howpublished = {Deep Learning Model},
  publisher = {Hugging Face},
  url = {https://huggingface.co/dbbiyte/CISA-BERTurk-sentiment},
  doi = {10.57967/hf/6142},
  year = {2025},
}
```

## 🚨 Limitations

- Model is optimized specifically for **1900-1950 period Turkish texts**
- Performance may vary on **modern Turkish texts**
- **Historical spelling conventions** and **archaic vocabulary** should be considered
- Maximum sequence length is **256 tokens**
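
Because of the 256-token limit, longer documents should be windowed around the entity mention before prediction. A rough, tokenizer-agnostic sketch (whitespace-split words stand in for subword tokens; real code should window on the model tokenizer's offsets):

```python
def window_around_entity(words, ent_idx, max_len=256):
    """Keep at most max_len items, centered on the entity position.

    Returns the window and the entity's index within it.
    """
    half = max_len // 2
    start = max(0, ent_idx - half)
    end = min(len(words), start + max_len)
    start = max(0, end - max_len)  # re-expand leftward if we hit the right edge
    return words[start:end], ent_idx - start

words = [f"w{i}" for i in range(1000)]
window, new_idx = window_around_entity(words, ent_idx=700)
print(len(window), window[new_idx])  # 256 w700
```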

## 🏷️ Model Tags

`turkish` `sentiment-analysis` `historical-texts` `entity-based` `cross-individual` `berturk` `bert` `1900-1950` `pytorch` `safetensors`