---
license: cc-by-nc-4.0
---

# CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish

This model performs **Cross-Individual Sentiment Analysis (CISA)** on historical Turkish texts (1900-1950). It analyzes the **author's sentiment toward specific individuals** mentioned in the text, rather than the overall sentiment of the text.

## 🎯 Model Details

- **Model Name**: CISA-BERTurk-Sentiment
- **Base Model**: [BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) (dbmdz/bert-base-turkish-cased)
- **Architecture**: DECA-EBSA (Dual-Encoder Context-Aware Entity-Based Sentiment Analysis)
- **Language**: Turkish
- **Period**: Historical Turkish texts (1900-1950)
- **Task**: Cross-Individual Sentiment Analysis
- **Classes**:
  - 0: Negative
  - 1: Neutral
  - 2: Positive

## 🆚 CISA vs Standard Sentiment Analysis

### Example Comparison:

**Text**: *"Ali Bey'in vefatı bizleri elem-i azîme sevk etmişti"* (Ali Bey's death filled us all with sadness)

| Analysis Type | Result | Explanation |
|---------------|--------|-------------|
| **Standard SA** | ❌ Negative | The overall tone of the text is sad |
| **CISA** | ✅ Positive | The grief expresses the author's respect and affection for Ali Bey |

### CISA Advantages:

- ✅ **Person-focused** sentiment detection
- ✅ **Author perspective** analysis
- ✅ **Entity-based** precision
- ✅ **Context-aware** evaluation

## 📊 Performance Metrics

| Metric | Value |
|--------|-------|
| **Accuracy** | **87.08%** |
| **Precision** | **87.07%** |
| **Recall** | **87.08%** |
| **F1-Score** | **87.05%** |
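
The card does not say how precision, recall, and F1 were averaged over the three classes. The values are consistent with support-weighted averaging (scikit-learn's `average="weighted"`). Under that scheme, weighted recall always equals accuracy, because Σ_c (n_c/N)(TP_c/n_c) = Σ_c TP_c/N. The plain-Python sketch below computes all four metrics under that assumption. The toy labels are invented for illustration and are not from the evaluation data.

```python
from collections import Counter


def weighted_metrics(gold, pred, classes=(0, 1, 2)):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging scheme)."""
    assert gold and len(gold) == len(pred), "need equal-length, non-empty label lists"
    n = len(gold)
    support = Counter(gold)  # number of gold labels per class
    accuracy = sum(g == p for g, p in zip(gold, pred)) / n
    prec_w = rec_w = f1_w = 0.0
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        predicted_c = sum(p == c for p in pred)
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / support[c] if support[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support[c] / n  # share of this class among the gold labels
        prec_w += weight * precision
        rec_w += weight * recall
        f1_w += weight * f1
    return {"accuracy": accuracy, "precision": prec_w, "recall": rec_w, "f1": f1_w}


# Toy labels for illustration only (0 = Negative, 1 = Neutral, 2 = Positive)
gold = [0, 0, 0, 1, 1, 1, 2, 2]
pred = [0, 0, 1, 1, 1, 2, 2, 2]
scores = weighted_metrics(gold, pred)
print({k: round(v, 4) for k, v in scores.items()})
```

In this toy run, precision differs from accuracy, but recall matches it exactly. This is the same pattern the table above shows.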

## 📈 Dataset Information

- **Total Texts**: 7,816
- **Total Entities**: 9,249
- **Average Entities per Text**: 1.18
- **Sentiment Distribution** (per entity):
  - Negative: 2,357 (25.5%)
  - Neutral: 3,563 (38.5%)
  - Positive: 3,329 (36.0%)
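
The figures above can be cross-checked with a few lines of Python. The three class counts sum to the 9,249 entities, which shows that the distribution is counted per entity rather than per text. The last line gives a majority-class baseline. Assuming the evaluation split follows the same distribution, always predicting Neutral would score about 38.5% accuracy, a useful reference point for the reported 87.08%.

```python
# Entity-level sentiment counts as listed in this card
counts = {"Negative": 2357, "Neutral": 3563, "Positive": 3329}
total_texts = 7816

total_entities = sum(counts.values())
avg_entities = total_entities / total_texts
shares = {label: 100 * n / total_entities for label, n in counts.items()}
majority = max(shares, key=shares.get)  # class with the largest share

print(total_entities)                               # 9249
print(round(avg_entities, 2))                       # 1.18
print({k: round(v, 1) for k, v in shares.items()})  # {'Negative': 25.5, 'Neutral': 38.5, 'Positive': 36.0}
print(majority, round(shares[majority], 1))         # Neutral 38.5
```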

## 🚀 Usage

**Note**: This model uses the DECA-EBSA architecture, which combines enhanced attention mechanisms, Turkish linguistic features, and contextual encoding. Running the model therefore requires the complete model architecture from the training code. The snippet below only loads the tokenizer and downloads the weights.

### Model Loading

```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("dbbiyte/CISA-BERTurk-sentiment")

# Download the model weights (returns the path of the local file)
weights_path = hf_hub_download("dbbiyte/CISA-BERTurk-sentiment", "pytorch_model.bin")

print("Model weights downloaded successfully!")
print("For full CISA analysis, use the complete PositionAwareDualEncoderEBSA architecture from the training code.")
```

### Expected CISA Results

For the following examples from our test set:

| Text | Entity | Standard SA | CISA Result |
|------|--------|-------------|-------------|
| "Ali Bey'in vefatı hepimizi hüzne boğmuştu" (Ali Bey's death had plunged us all into grief) | Ali Bey | Negative | **Positive** |
| "Leyla Hanım'ın musiki resitalinde, nağmelerinin ruhuma işledi" (At Leyla Hanım's music recital, her melodies touched my soul) | Leyla Hanım | Positive | **Positive** |

**CISA Key Insight**: The model analyzes the author's sentiment toward the mentioned person, not the overall sentiment of the text.
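
The prediction target is a (text, person) pair, so a text that mentions several people yields several classification instances. This is why the dataset lists more entities (9,249) than texts (7,816). The sketch below shows one hypothetical way to expand annotated texts into such instances. The record layout (`text`, `entities`), the `CisaInstance` class, and the `to_cisa_instances` helper are illustrative assumptions, not the authors' data format.

```python
from dataclasses import dataclass


@dataclass
class CisaInstance:
    text: str    # full passage (input to the text encoder)
    entity: str  # person whose treatment by the author is classified
    start: int   # character offset where the first mention begins
    end: int     # character offset just past the first mention


def to_cisa_instances(record):
    """Hypothetical helper: expand one annotated text into one instance per person."""
    instances = []
    for entity in record["entities"]:
        start = record["text"].find(entity)  # first verbatim mention only
        if start == -1:
            continue  # skip annotations that do not occur verbatim in the text
        instances.append(CisaInstance(record["text"], entity, start, start + len(entity)))
    return instances


# Toy record, invented for illustration: "Ali Bey and Leyla Hanım went to the concert."
record = {
    "text": "Ali Bey ile Leyla Hanım konsere gitti.",
    "entities": ["Ali Bey", "Leyla Hanım"],
}
for inst in to_cisa_instances(record):
    print(inst.entity, inst.start, inst.end)
# Ali Bey 0 7
# Leyla Hanım 12 23
```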

## 🏗️ DECA-EBSA Architecture

### Dual-Encoder Structure:

1. **Text Encoder**: Processes the full text for global context
2. **Entity Encoder**: Processes the entity together with its local context
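
Below is a minimal PyTorch sketch of how the outputs of the two encoders could be fused. It assumes that the entity encoder's token states serve as queries over the text encoder's states in a 12-head cross-attention layer. The head count is taken from this card's feature list, and 768 is the BERT-base hidden size. `EntityContextFusion` and its first-position pooling are hypothetical choices and do not reproduce `PositionAwareDualEncoderEBSA`. Random tensors stand in for real encoder outputs.

```python
import torch
import torch.nn as nn


class EntityContextFusion(nn.Module):
    """Hypothetical fusion head: entity-encoder states attend over text-encoder states."""

    def __init__(self, hidden=768, heads=12, num_classes=3):
        super().__init__()
        # 768 / 12 = 64 dimensions per head, as in BERT-base
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, entity_states, text_states, text_padding_mask=None):
        # entity_states: (batch, entity_len, hidden) from the entity encoder (queries)
        # text_states:   (batch, text_len, hidden) from the text encoder (keys/values)
        # text_padding_mask: (batch, text_len) bool, True marks padding to ignore
        attended, _ = self.cross_attn(
            entity_states, text_states, text_states,
            key_padding_mask=text_padding_mask,
        )
        entity_vec = entity_states[:, 0]  # first ([CLS]) position of the entity encoder
        context_vec = attended[:, 0]      # text context gathered for that position
        # Concatenate both views and map them to the three CISA classes
        return self.classifier(torch.cat([entity_vec, context_vec], dim=-1))


# Random tensors stand in for BERTurk encoder outputs (hidden size 768)
fusion = EntityContextFusion()
entity_states = torch.randn(2, 8, 768)
text_states = torch.randn(2, 64, 768)
padding_mask = torch.zeros(2, 64, dtype=torch.bool)
padding_mask[:, 48:] = True  # pretend the last 16 text positions are padding
logits = fusion(entity_states, text_states, padding_mask)
print(logits.shape)  # torch.Size([2, 3])
```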

### Key Features:

- **Enhanced Entity-Context Attention**: 12-head cross-attention
- **Position-Aware Modeling**: Uses information about where the entity occurs in the text
- **Turkish Linguistic Features**: Patterns specific to Ottoman Turkish
- **Context-Aware Classification**: Distinguishes formal from informal contexts
- **Adaptive Focal Loss**: Focuses training on difficult examples
- **R-Drop Regularization**: Enforces consistent predictions across dropout passes
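
The card does not define how the focal loss is made adaptive or how strongly the R-Drop term is weighted. For orientation, the PyTorch sketch below shows the standard formulations these two features build on. The first is multi-class focal loss, FL(p_t) = -(1 - p_t)^γ log p_t. The second is the R-Drop consistency term, a symmetric KL divergence between two forward passes of the same batch under different dropout masks. The values `gamma=2.0` and `alpha=1.0` and the toy linear model are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=2.0):
    """Standard (non-adaptive) multi-class focal loss: mean of -(1 - p_t)^gamma * log(p_t)."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-probability of the gold class
    pt = log_pt.exp()
    # (1 - p_t)^gamma shrinks the loss of confidently correct (easy) examples
    return (-((1 - pt) ** gamma) * log_pt).mean()


def r_drop_kl(logits_a, logits_b):
    """Symmetric KL divergence between the predictions of two dropout passes."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)


# Toy classifier with dropout, so two passes over the same batch give different logits
torch.manual_seed(0)
model = nn.Sequential(nn.Dropout(0.1), nn.Linear(768, 3))
model.train()
features = torch.randn(4, 768)
targets = torch.tensor([0, 1, 2, 1])
alpha = 1.0  # weight of the consistency term (illustrative value)

logits_a, logits_b = model(features), model(features)
task_loss = 0.5 * (focal_loss(logits_a, targets) + focal_loss(logits_b, targets))
loss = task_loss + alpha * r_drop_kl(logits_a, logits_b)
loss.backward()
print(float(loss))
```

With `gamma=0`, the focal loss reduces to ordinary cross-entropy. This gives a quick sanity check for any adaptive variant.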

## 🔬 Research Contributions

### 1. Cross-Individual Sentiment Analysis (CISA)

- **First application** of CISA to historical Turkish
- Sentiment analysis focused on the **author's perspective**
- **Entity-based approach** for person-specific emotions

### 2. DECA-EBSA Methodology

- **Dual-Encoder** architecture
- **Context-Aware** modeling
- **Entity-Based** attention mechanisms

### 3. Historical Turkish NLP Contributions

- **Specialized dataset** for the 1900-1950 period
- **Ottoman Turkish** linguistic features
- **Formal/informal** context distinction

## 👥 Authors

**İzmir Institute of Technology - Digital Humanities and AI Laboratory**:

- **Dr. Mustafa İLTER** - İzmir Institute of Technology
- **Dr. Doğan EVECEN** - İzmir Institute of Technology
- **Dr. Buket ERŞAHİN** - İzmir Institute of Technology
- **Dr. Yasemin ÖZCAN GÖNÜLAL** - İzmir Institute of Technology
- **Assoc. Prof. Selma TEKİR** - İzmir Institute of Technology

**Pamukkale University**:

- **Assoc. Prof. Sezen KARABULUT** - Pamukkale University
- **İbrahim BERCİ** - Pamukkale University
- **Emre ONUÇ** - Pamukkale University

## 🏦 Funding & Acknowledgments

This work was supported by **The Scientific and Technological Research Council of Turkey (TÜBİTAK)** under project number **323K372**. We thank TÜBİTAK for its support.

## 📚 BERTurk Reference

This model uses [BERTurk](https://github.com/stefan-it/turkish-bert), developed by Stefan Schweter. BERTurk is a BERT model pre-trained on 35GB of Turkish text and optimized for Turkish natural language processing tasks.

## 📄 License and Usage Terms

This model is released under the **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.

### ✅ Permitted Uses:

- **Academic research** (citation required)
- **Educational purposes**
- **Non-profit projects**
- **Personal experimental studies**

### ❌ Prohibited Uses:

- **Commercial applications**
- **Profit-driven projects**
- **Commercial product/service development**

### 📄 Citation Requirement:

When using this model, please cite it as follows:

```bibtex
@misc{ilter2025cisa,
  author = {İlter, Mustafa and Evecen, Doğan and Erşahin, Buket and Özcan Gönülal, Yasemin and Karabulut, Sezen and Berci, İbrahim and Onuç, Emre and Tekir, Selma},
  title = {CISA-BERTurk-Sentiment: Cross-Individual Sentiment Analysis for Historical Turkish},
  howpublished = {Deep Learning Model},
  publisher = {Hugging Face},
  url = {https://huggingface.co/dbbiyte/CISA-BERTurk-sentiment},
  doi = {10.57967/hf/6142},
  year = {2025},
}
```

## 🚨 Limitations

- The model is optimized specifically for **1900-1950 period Turkish texts**
- Performance may vary on **modern Turkish texts**
- **Historical spelling conventions** and **archaic vocabulary** should be taken into account
- The maximum sequence length is **256 tokens**
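
Inputs are capped at 256 tokens, so keeping only the beginning of a long passage can cut off the person being analyzed. A common workaround is to keep a window centered on the mention. The sketch below assumes that the entity's token span is already known. `entity_window` is a hypothetical helper, not part of the training code. The two positions it reserves for [CLS] and [SEP] follow standard BERT input formatting.

```python
def entity_window(tokens, ent_start, ent_end, max_len=256, num_special=2):
    """Hypothetical helper: cut a window that fits the length limit around an entity.

    `tokens` is a list of subword tokens; [ent_start, ent_end) is the entity's token span.
    Returns (window, new_start, new_end) with the span re-indexed into the window.
    """
    budget = max_len - num_special  # leave room for [CLS] and [SEP]
    if len(tokens) <= budget:
        return tokens, ent_start, ent_end
    span = ent_end - ent_start
    if span > budget:
        raise ValueError("the entity span alone exceeds the token budget")
    # Split the remaining budget evenly on both sides of the entity, then clamp to the text
    left = max(0, ent_start - (budget - span) // 2)
    left = min(left, len(tokens) - budget)
    return tokens[left:left + budget], ent_start - left, ent_end - left


tokens = [f"tok{i}" for i in range(1000)]  # stand-in for a long tokenized passage
window, start, end = entity_window(tokens, 600, 603)
print(len(window), window[start:end])  # 254 ['tok600', 'tok601', 'tok602']
```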

## 🏷️ Model Tags

`turkish` `sentiment-analysis` `historical-texts` `entity-based` `cross-individual` `berturk` `bert` `1900-1950` `pytorch` `safetensors`