---
license: cc-by-nc-4.0
---
# CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish
This model performs **Cross-Individual Sentiment Analysis (CISA)** on historical Turkish texts (1900-1950), analyzing the **author's sentiment toward specific individuals** mentioned in the text, rather than the overall text sentiment.
## 🎯 Model Details
- **Model Name**: CISA-BERTurk-Sentiment
- **Base Model**: [BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) (dbmdz/bert-base-turkish-cased)
- **Architecture**: DECA-EBSA (Dual-Encoder Context-Aware Entity-Based Sentiment Analysis)
- **Language**: Turkish
- **Period**: Historical Turkish texts (1900-1950)
- **Task**: Cross-Individual Sentiment Analysis
- **Classes**:
  - 0: Negative
  - 1: Neutral
  - 2: Positive
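The label ids can be written as a plain mapping (a convenience sketch; the names mirror the class list above):

```python
# Label ids used by CISA-BERTurk-Sentiment, as listed above
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
label2id = {name: i for i, name in id2label.items()}

print(id2label[2])           # Positive
print(label2id["Negative"])  # 0
```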
## 🆚 CISA vs Standard Sentiment Analysis
### Example Comparison:
**Text**: *"Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti"* (Ali Bey's death filled us all with sadness)
| Analysis Type | Result | Explanation |
|--------------|--------|-------------|
| **Standard SA** | ❌ Negative | Overall text tone is sad |
| **CISA** | ✅ Positive | Author's respect/love for Ali Bey |
### CISA Advantages:
- **Person-focused** sentiment detection
- **Author perspective** analysis
- **Entity-based** precision
- **Context-aware** evaluation
## 📊 Performance Metrics
| Metric | Value |
|--------|-------|
| **Accuracy** | **87.08%** |
| **Precision** | **87.07%** |
| **Recall** | **87.08%** |
| **F1-Score** | **87.05%** |
## 📈 Dataset Information
- **Total Texts**: 7,816
- **Total Entities**: 9,249
- **Average Entities per Text**: 1.18
- **Sentiment Distribution**:
  - Negative: 2,357 (25.5%)
  - Neutral: 3,563 (38.5%)
  - Positive: 3,329 (36.0%)
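The counts above are internally consistent; a quick sanity check (pure Python, no model needed):

```python
# Class counts from the dataset table above
counts = {"Negative": 2357, "Neutral": 3563, "Positive": 3329}

total_entities = sum(counts.values())
print(total_entities)                   # 9249, matching "Total Entities"
print(round(total_entities / 7816, 2))  # 1.18 entities per text

for name, n in counts.items():
    print(name, f"{100 * n / total_entities:.1f}%")
```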
## 🚀 Usage
**Note**: Because the DECA-EBSA architecture adds enhanced attention mechanisms, Turkish linguistic features, and contextual encoding on top of BERTurk, full CISA inference requires the complete model class from the training code. The snippet below loads only the tokenizer and the raw weights.
### Model Loading
```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("dbbiyte/CISA-BERTurk-sentiment")
# Download model weights
weights_path = hf_hub_download("dbbiyte/CISA-BERTurk-sentiment", "pytorch_model.bin")
print("Model weights downloaded successfully!")
print("For full CISA analysis, use the complete PositionAwareDualEncoderEBSA architecture from the training code.")
```
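Once the full architecture is assembled and produces logits, mapping them onto the three CISA labels is straightforward. The logits below are made-up placeholders for illustration; real logits come from the DECA-EBSA forward pass:

```python
import torch

# Map raw model logits to the three CISA labels.
# These logits are placeholders, not actual model output.
labels = ["Negative", "Neutral", "Positive"]
logits = torch.tensor([[-1.2, 0.3, 2.1]])

probs = torch.softmax(logits, dim=-1)
pred_id = int(probs.argmax(dim=-1))
print(labels[pred_id], round(float(probs[0, pred_id]), 2))  # Positive 0.83
```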
### Expected CISA Results
For the examples in our test set:
| Text | Entity | Standard SA | CISA Result |
|------|--------|-------------|-------------|
| "Ali Bey'in vefatı hepimizi hüzne boğmuştu" (Ali Bey's death plunged us all into sorrow) | Ali Bey | Negative | **Positive** |
| "Leyla Hanım'ın musiki resitalinde, nağmelerinin ruhuma işledi" (At Leyla Hanım's music recital, her melodies touched my soul) | Leyla Hanım | Positive | **Positive** |
**CISA Key Insight**: The model analyzes the author's sentiment toward the mentioned person, not the overall text sentiment.
## 🏗️ DECA-EBSA Architecture
### Dual-Encoder Structure:
1. **Text Encoder**: Full text context processing
2. **Entity Encoder**: Entity + local context processing
### Key Features:
- **Enhanced Entity-Context Attention**: 12-head cross-attention
- **Position-Aware Modeling**: Entity position information
- **Turkish Linguistic Features**: Ottoman Turkish specific patterns
- **Context-Aware Classification**: Formal/informal distinction
- **Adaptive Focal Loss**: Focus on difficult examples
- **R-Drop Regularization**: Consistency enforcement
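The dual-encoder idea can be sketched as a minimal PyTorch module. This is a structural illustration only, not the released PositionAwareDualEncoderEBSA: layer counts, pooling, and the positional/linguistic-feature inputs are placeholder assumptions; only the text-encoder/entity-encoder split and the 12-head cross-attention follow the description above.

```python
import torch
import torch.nn as nn

class DualEncoderSketch(nn.Module):
    """Hypothetical simplification of the dual-encoder + cross-attention idea."""

    def __init__(self, hidden=768, heads=12, num_labels=3):
        super().__init__()
        # One encoder stream for the full text, one for the entity span
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, heads, batch_first=True), num_layers=1)
        self.entity_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, heads, batch_first=True), num_layers=1)
        # 12-head cross-attention: entity representation attends to the text
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, text_emb, entity_emb):
        text_h = self.text_encoder(text_emb)      # (B, T, H)
        ent_h = self.entity_encoder(entity_emb)   # (B, E, H)
        fused, _ = self.cross_attn(ent_h, text_h, text_h)
        return self.classifier(fused.mean(dim=1))  # (B, num_labels)

model = DualEncoderSketch()
with torch.no_grad():
    logits = model(torch.randn(1, 32, 768), torch.randn(1, 4, 768))
print(logits.shape)  # torch.Size([1, 3])
```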
## 🔬 Research Contributions
### 1. Cross-Individual Sentiment Analysis (CISA)
- **First application** of CISA to historical Turkish
- **Author perspective** focused sentiment analysis
- **Entity-based approach** for person-specific emotions
### 2. DECA-EBSA Methodology
- **Dual-Encoder** architecture
- **Context-Aware** modeling
- **Entity-Based** attention mechanisms
### 3. Historical Turkish NLP Contributions
- **1900-1950 period** specialized dataset
- **Ottoman Turkish** linguistic features
- **Formal/informal** context distinction
## 👥 Authors
**İzmir Institute of Technology - Digital Humanities and AI Laboratory**:
- **Dr. Mustafa İLTER** - İzmir Institute of Technology
- **Dr. Doğan EVECEN** - İzmir Institute of Technology
- **Dr. Buket ERŞAHİN** - İzmir Institute of Technology
- **Dr. Yasemin ÖZCAN GÖNÜLAL** - İzmir Institute of Technology
- **Assoc. Prof. Selma TEKİR** - İzmir Institute of Technology
**Pamukkale University**:
- **Assoc. Prof. Sezen KARABULUT** - Pamukkale University
- **İbrahim BERCİ** - Pamukkale University
- **Emre ONUÇ** - Pamukkale University
## 🏦 Funding & Acknowledgments
This work was supported by **The Scientific and Technological Research Council of Turkey (TÜBİTAK)** under project number **323K372**. We thank TÜBİTAK for their support.
## 📚 BERTurk Reference
This model uses [BERTurk](https://github.com/stefan-it/turkish-bert) developed by Stefan Schweter, a BERT model pre-trained on 35GB of Turkish text, optimized for Turkish natural language processing tasks.
## 📄 License and Usage Terms
This model is released under **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.
### ✅ Permitted Uses:
- **Academic research** (citation required)
- **Educational purposes**
- **Non-profit projects**
- **Personal experimental studies**
### ❌ Prohibited Uses:
- **Commercial applications**
- **Profit-driven projects**
- **Commercial product/service development**
### 📄 Citation Requirement:
When using this model, please cite as:
```bibtex
@misc{ilter2025cisa,
  author       = {İlter, Mustafa and Evecen, Doğan and Erşahin, Buket and Özcan Gönülal, Yasemin and Karabulut, Sezen and Berci, İbrahim and Onuç, Emre and Tekir, Selma},
  title        = {CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish},
  howpublished = {Deep Learning Model},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/dbbiyte/CISA-BERTurk-sentiment},
  doi          = {10.57967/hf/6142},
  year         = {2025}
}
```
## 🚨 Limitations
- Model is optimized specifically for **1900-1950 period Turkish texts**
- Performance may vary on **modern Turkish texts**
- **Historical spelling conventions** and **archaic vocabulary** should be considered
- Maximum sequence length is **256 tokens**
## 🏷️ Model Tags
`turkish` `sentiment-analysis` `historical-texts` `entity-based` `cross-individual` `berturk` `bert` `1900-1950` `pytorch` `safetensors`