---
license: cc-by-nc-4.0
---

# CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish

This model performs **Cross-Individual Sentiment Analysis (CISA)** on historical Turkish texts (1900-1950), analyzing the **author's sentiment toward specific individuals** mentioned in the text, rather than the overall text sentiment.

## 🎯 Model Details

- **Model Name**: CISA-BERTurk-Sentiment
- **Base Model**: [BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) (dbmdz/bert-base-turkish-cased)
- **Architecture**: DECA-EBSA (Dual-Encoder Context-Aware Entity-Based Sentiment Analysis)
- **Language**: Turkish
- **Period**: Historical Turkish texts (1900-1950)
- **Task**: Cross-Individual Sentiment Analysis
- **Classes**: 
  - 0: Negative
  - 1: Neutral
  - 2: Positive
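
The class indices above can be captured in a small mapping for post-processing predictions. This is a minimal sketch: `ID2LABEL`, `EntitySentiment`, and `decode_prediction` are illustrative names that do not come from the released code, and it assumes the model returns one score per class in index order.

```python
from dataclasses import dataclass
from typing import Sequence

# Class indices as listed in this model card.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


@dataclass(frozen=True)
class EntitySentiment:
    """Author's sentiment toward one person in a text (hypothetical container)."""
    entity: str
    label: str
    score: float


def decode_prediction(entity: str, class_scores: Sequence[float]) -> EntitySentiment:
    """Pick the highest-scoring class; assumes scores are ordered by class index."""
    if len(class_scores) != len(ID2LABEL):
        raise ValueError("expected one score per class")
    best = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return EntitySentiment(entity=entity, label=ID2LABEL[best], score=class_scores[best])
```

Keeping the entity name next to the label makes it explicit that a prediction belongs to a person, not to the whole text.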

## 🆚 CISA vs Standard Sentiment Analysis

### Example Comparison:
**Text**: *"Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti"* (Ali Bey's death had plunged us into great sorrow)

| Analysis Type | Result | Explanation |
|--------------|--------|-------------|
| **Standard SA** | ❌ Negative | Overall text tone is sad |
| **CISA** | ✅ Positive | Author's respect/love for Ali Bey |

### CISA Advantages:
- ✅ **Person-focused** sentiment detection
- ✅ **Author perspective** analysis
- ✅ **Entity-based** precision
- ✅ **Context-aware** evaluation

## 📊 Performance Metrics

| Metric | Value |
|--------|-------|
| **Accuracy** | **87.08%** |
| **Precision** | **87.07%** |
| **Recall** | **87.08%** |
| **F1-Score** | **87.05%** |
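
The card does not state how precision, recall, and F1 are averaged across the three classes. Recall being identical to accuracy (87.08%) is what support-weighted averaging produces, so the sketch below assumes that scheme. The function name and the confusion-matrix counts are hypothetical and serve only to show the computation.

```python
from typing import Dict, List


def weighted_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Accuracy and support-weighted precision, recall, and F1.

    confusion[i][j] = number of examples with true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        support = sum(confusion[c])  # true examples of class c
        predicted = sum(confusion[r][c] for r in range(n_classes))
        tp = confusion[c][c]
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": correct / total, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only; NOT the model's confusion matrix.
example = [[40, 6, 4], [5, 60, 5], [3, 7, 70]]
metrics = weighted_metrics(example)
```

With support weighting, each class's recall is multiplied by its share of the data, so the weighted sum collapses to correct predictions divided by total examples, which is accuracy.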

## 📈 Dataset Information

- **Total Texts**: 7,816
- **Total Entities**: 9,249
- **Average Entities per Text**: 1.18
- **Sentiment Distribution**:
  - Negative: 2,357 (25.5%)
  - Neutral: 3,563 (38.5%)
  - Positive: 3,329 (36.0%)
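
The percentages and the per-text average follow directly from the counts listed above. The short check below reproduces them, using only numbers from this section; note that the shares are taken over entities, not texts.

```python
# Entity-level label counts and text count from the dataset summary above.
counts = {"Negative": 2357, "Neutral": 3563, "Positive": 3329}
total_texts = 7816

total_entities = sum(counts.values())
shares = {label: round(100 * n / total_entities, 1) for label, n in counts.items()}
entities_per_text = round(total_entities / total_texts, 2)

print(total_entities)     # 9249
print(shares)             # {'Negative': 25.5, 'Neutral': 38.5, 'Positive': 36.0}
print(entities_per_text)  # 1.18
```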

## 🚀 Usage

**Note**: This model uses the custom DECA-EBSA architecture (enhanced attention mechanisms, Turkish linguistic features, and contextual encoding), so the snippet below only loads the tokenizer and downloads the weights. Running inference requires the complete model architecture from the training code.

### Model Loading
```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

REPO_ID = "dbbiyte/CISA-BERTurk-sentiment"

# Load the tokenizer published with the model
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

# Download the raw checkpoint; the custom architecture is needed to load it into a model
weights_path = hf_hub_download(repo_id=REPO_ID, filename="pytorch_model.bin")

print(f"Model weights downloaded to: {weights_path}")
print("For full CISA analysis, use the complete PositionAwareDualEncoderEBSA architecture from the training code.")
```

### Expected CISA Results
For the examples in our test set:

| Text | Entity | Standard SA | CISA Result |
|------|--------|-------------|-------------|
| "Ali Bey'in vefatı hepimizi hüzne boğmuştu" | Ali Bey | Negative | **Positive** |
| "Leyla Hanım'ın musiki resitalinde, nağmelerinin ruhuma işledi" | Leyla Hanım | Positive | **Positive** |

**CISA Key Insight**: The model analyzes the author's sentiment toward the mentioned person, not the overall text sentiment.

## 🏗️ DECA-EBSA Architecture

### Dual-Encoder Structure:
1. **Text Encoder**: Full text context processing
2. **Entity Encoder**: Entity + local context processing
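
The card does not specify how the entity encoder's "local context" is built. One plausible reading is a window of words around the entity mention, sketched below. The function name, the whitespace tokenization, and the default window of 5 words are all assumptions for illustration; the training code may do this differently.

```python
from typing import Tuple


def build_encoder_inputs(text: str, entity: str, window: int = 5) -> Tuple[str, str]:
    """Return (full_text, local_context) for the two encoders.

    local_context is the entity mention plus up to `window` whitespace-separated
    words on each side. Window size and tokenization are illustrative assumptions.
    """
    start = text.find(entity)
    if start == -1:
        raise ValueError(f"entity {entity!r} not found in text")
    end = start + len(entity)
    # Grow the span to word boundaries so suffixes ("Ali Bey'in") stay attached.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    left_words = text[:start].split()
    right_words = text[end:].split()
    left = left_words[-window:] if window > 0 else []
    right = right_words[:window]
    local_context = " ".join(left + [text[start:end]] + right)
    return text, local_context


full_text, local_context = build_encoder_inputs(
    "Ali Bey'in vefatı hepimizi hüzne boğmuştu", "Ali Bey", window=2
)
print(local_context)  # Ali Bey'in vefatı hepimizi
```

The first element would feed the text encoder and the second the entity encoder. Only the first occurrence of the entity is used here, which is a simplification.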

### Key Features:
- **Enhanced Entity-Context Attention**: 12-head cross-attention
- **Position-Aware Modeling**: Entity position information
- **Turkish Linguistic Features**: Ottoman Turkish specific patterns
- **Context-Aware Classification**: Formal/informal distinction
- **Adaptive Focal Loss**: Focus on difficult examples
- **R-Drop Regularization**: Consistency enforcement
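
The last two items describe training objectives. The sketch below shows the standard form of each for a single example in plain Python. It is not the authors' implementation: the "adaptive" part of the focal loss is not described in this card, so a fixed `gamma` is used, and the function names and the default `gamma=2.0` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-probabilities for one example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Standard focal loss: -(1 - p_t)^gamma * log(p_t).

    gamma = 0 reduces to cross-entropy; larger gamma down-weights easy examples.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    return -((1.0 - p_t) ** gamma) * log_p_t


def kl_divergence(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def r_drop_penalty(logits_a: Sequence[float], logits_b: Sequence[float]) -> float:
    """Symmetric KL between two forward passes of the same input.

    In R-Drop the two logit vectors come from the same model with different
    dropout masks; here they are simply passed in.
    """
    log_p, log_q = log_softmax(logits_a), log_softmax(logits_b)
    return 0.5 * (kl_divergence(log_p, log_q) + kl_divergence(log_q, log_p))
```

In training, the R-Drop penalty is typically added to the classification loss with a weighting coefficient. This card does not give the value used for this model.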

## 🔬 Research Contributions

### 1. Cross-Individual Sentiment Analysis (CISA)
- **First application** of CISA to historical Turkish
- **Author perspective** focused sentiment analysis
- **Entity-based approach** for person-specific emotions

### 2. DECA-EBSA Methodology
- **Dual-Encoder** architecture
- **Context-Aware** modeling
- **Entity-Based** attention mechanisms

### 3. Historical Turkish NLP Contributions
- **1900-1950 period** specialized dataset
- **Ottoman Turkish** linguistic features
- **Formal/informal** context distinction

## 👥 Authors

**İzmir Institute of Technology - Digital Humanities and AI Laboratory**:
- **Dr. Mustafa İLTER** - İzmir Institute of Technology
- **Dr. Doğan EVECEN** - İzmir Institute of Technology
- **Dr. Buket ERŞAHİN** - İzmir Institute of Technology
- **Dr. Yasemin ÖZCAN GÖNÜLAL** - İzmir Institute of Technology
- **Assoc. Prof. Selma TEKİR** - İzmir Institute of Technology

**Pamukkale University**:
- **Assoc. Prof. Sezen KARABULUT** - Pamukkale University
- **İbrahim BERCİ** - Pamukkale University
- **Emre ONUÇ** - Pamukkale University

## 🏦 Funding & Acknowledgments

This work was supported by **The Scientific and Technological Research Council of Turkey (TÜBİTAK)** under project number **323K372**. We thank TÜBİTAK for their support.

## 📚 BERTurk Reference

This model uses [BERTurk](https://github.com/stefan-it/turkish-bert) developed by Stefan Schweter, a BERT model pre-trained on 35GB of Turkish text, optimized for Turkish natural language processing tasks.

## 📄 License and Usage Terms

This model is released under the **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.

### ✅ Permitted Uses:
- **Academic research** (citation required)
- **Educational purposes**
- **Non-profit projects**
- **Personal experimental studies**

### ❌ Prohibited Uses:
- **Commercial applications**
- **Profit-driven projects**
- **Commercial product/service development**

### 📄 Citation Requirement:
When using this model, please cite as:

```bibtex
@misc{ilter2025cisa,
  author = {İlter, Mustafa and Evecen, Doğan and Erşahin, Buket and Özcan Gönülal, Yasemin and Karabulut, Sezen and Berci, İbrahim and Onuç, Emre and Tekir, Selma},
  title = {CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish},
  howpublished = {Deep Learning Model},
  publisher = {Hugging Face},
  url = {https://huggingface.co/dbbiyte/CISA-BERTurk-sentiment},
  doi = {10.57967/hf/6142},
  year = {2025},
}
```

## 🚨 Limitations

- Model is optimized specifically for **1900-1950 period Turkish texts**
- Performance may vary on **modern Turkish texts**
- **Historical spelling conventions** and **archaic vocabulary** should be considered
- Maximum sequence length is **256 tokens**

## 🏷️ Model Tags

`turkish` `sentiment-analysis` `historical-texts` `entity-based` `cross-individual` `berturk` `bert` `1900-1950` `pytorch` `safetensors`